Discussion: GiST index performance

From:
Matthew Wakeling
Date:

I have been doing some queries that are best answered with GiST indexes,
however I have found that their performance is a little lacking. I thought
I would do a direct comparison on a level playing field. Here are two
EXPLAIN ANALYSE results for the same query, with two different indexes.
The two indexes are identical except that one is btree and the other GiST.

Here is the query:

SELECT *
FROM
     location l1,
     location l2,
     gene,
     primer
WHERE
         l1.subjectid <> l2.subjectid
     AND l1.objectid = l2.objectid
     AND l1.subjectid = gene.id
     AND l2.subjectid = primer.id
     AND l2.intermine_start <= l1.intermine_start
     AND l2.intermine_end >= l1.intermine_start

Here is the btree index:

CREATE INDEX location_object_start ON location (objectid, intermine_start);

QUERY PLAN
----------------------------------------------------------------------
  Hash Join
    (cost=26213.16..135980894.76 rows=3155740824 width=484)
    (actual time=2799.260..14256.588 rows=2758 loops=1)
    Hash Cond: (l1.subjectid = gene.id)
    ->  Nested Loop
          (cost=0.00..4364485.01 rows=8891802645 width=324)
          (actual time=9.748..10418.807 rows=390695 loops=1)
          Join Filter: (l1.subjectid <> l2.subjectid)
          ->  Nested Loop
                (cost=0.00..446862.58 rows=572239 width=259)
                (actual time=9.720..4226.117 rows=211880 loops=1)
                ->  Seq Scan on primer
                      (cost=0.00..15358.80 rows=211880 width=194)
                      (actual time=9.678..579.877 rows=211880 loops=1)
                ->  Index Scan using location__key_all on location l2
                      (cost=0.00..2.00 rows=3 width=65)
                      (actual time=0.004..0.007 rows=1 loops=211880)
                      Index Cond: (l2.subjectid = primer.id)
          ->  Index Scan using location_object_start on location l1
                (cost=0.00..3.85 rows=150 width=65)
                (actual time=0.005..0.012 rows=3 loops=211880)
                Index Cond: ((l1.objectid = l2.objectid) AND (l2.intermine_start <= l1.intermine_start) AND
                  (l2.intermine_end >= l1.intermine_start))
    ->  Hash
          (cost=20496.96..20496.96 rows=457296 width=160)
          (actual time=2788.698..2788.698 rows=457296 loops=1)
          ->  Seq Scan on gene
                (cost=0.00..20496.96 rows=457296 width=160)
                (actual time=0.038..1420.604 rows=457296 loops=1)
  Total runtime: 14263.846 ms
(13 rows)


Here is the GiST index:

CREATE INDEX location_object_start_gist ON location USING gist (objectid, intermine_start);

QUERY PLAN
------------------------------------------------------------------------
  Hash Join
    (cost=26213.16..136159960.32 rows=3155740824 width=484)
    (actual time=2576.109..2300486.267 rows=2758 loops=1)
    Hash Cond: (l1.subjectid = gene.id)
    ->  Nested Loop
          (cost=0.00..4543550.56 rows=8891802645 width=324)
          (actual time=366.121..2296668.740 rows=390695 loops=1)
          Join Filter: (l1.subjectid <> l2.subjectid)
          ->  Nested Loop
                (cost=0.00..446862.58 rows=572239 width=259)
                (actual time=362.774..13423.443 rows=211880 loops=1)
                ->  Seq Scan on primer
                      (cost=0.00..15358.80 rows=211880 width=194)
                      (actual time=319.559..1296.907 rows=211880 loops=1)
                ->  Index Scan using location__key_all on location l2
                      (cost=0.00..2.00 rows=3 width=65)
                      (actual time=0.041..0.045 rows=1 loops=211880)
                      Index Cond: (l2.subjectid = primer.id)
          ->  Index Scan using location_object_start_gist on location l1
                (cost=0.00..4.16 rows=150 width=65)
                (actual time=3.354..10.757 rows=3 loops=211880)
                Index Cond: ((l1.objectid = l2.objectid) AND (l2.intermine_start <= l1.intermine_start) AND
                  (l2.intermine_end >= l1.intermine_start))
    ->  Hash
          (cost=20496.96..20496.96 rows=457296 width=160)
          (actual time=2157.914..2157.914 rows=457296 loops=1)
          ->  Seq Scan on gene
                (cost=0.00..20496.96 rows=457296 width=160)
                (actual time=3.904..1206.907 rows=457296 loops=1)
  Total runtime: 2300510.674 ms
(13 rows)

The query plans are identical except in the type of index used, but there
is a factor of a few hundred in execution time. Is this the kind of factor
that would be expected, or is there something amiss? Is this seen as
something that might be improved in the future?

Matthew

--
 "We have always been quite clear that Win95 and Win98 are not the systems to
 use if you are in a hostile security environment." "We absolutely do recognize
 that the Internet is a hostile environment." Paul Leach <>

From:
"Kevin Grittner"
Date:

Matthew Wakeling <> wrote:
> I have been doing some queries that are best answered with GiST
> indexes

For what definition of "best answered"?

Since an index is only a performance tuning feature (unless declared
UNIQUE), and should never alter the results (beyond possibly affecting
row order if that is unspecified), how is an index which performs
worse than an alternative the best answer?

-Kevin

From:
Matthew Wakeling
Date:

On Thu, 16 Apr 2009, Kevin Grittner wrote:
> Matthew Wakeling <> wrote:
>> I have been doing some queries that are best answered with GiST
>> indexes
>
> For what definition of "best answered"?
>
> Since an index is only a performance tuning feature (unless declared
> UNIQUE), and should never alter the results (beyond possibly affecting
> row order if that is unspecified), how is an index which performs
> worse than an alternative the best answer?

Don't be misled by my example using integers. I'm doing queries on the
bioseg data type, and the only index type for that is GiST. There isn't a
better alternative.

Matthew

--
 "Finger to spiritual emptiness underlying everything."
        -- How a foreign C manual referred to a "pointer to void."

From:
Tom Lane
Date:

"Kevin Grittner" <> writes:
> Matthew Wakeling <> wrote:
>> I have been doing some queries that are best answered with GiST
>> indexes

> For what definition of "best answered"?

> Since an index is only a performance tuning feature (unless declared
> UNIQUE), and should never alter the results (beyond possibly affecting
> row order if that is unspecified), how is an index which performs
> worse than an alternative the best answer?

The main point of GIST is to be able to index queries that simply are
not indexable in btree.  So I assume that Matthew is really worried
about some queries that are not btree-indexable.  One would fully
expect btree to beat out GIST for btree-indexable cases.  I think the
significant point here is that it's winning by a factor of a couple
hundred; that's pretty awful, and might point to some implementation
problem.

Matthew, can you put together a self-contained test case with a similar
slowdown?  Also, what are the physical sizes of the two indexes?
I notice that the inner nestloop join gets slower too, when it's not
changed at all --- that suggests that the overall I/O load is a lot
worse, so maybe the reason the query is falling off a performance cliff
is that the GIST index fails to fit in cache.

            regards, tom lane

From:
Matthew Wakeling
Date:

On Thu, 16 Apr 2009, dforum wrote:
> there is another performance problem with this request.
>
> If you analyse the query plan, you see that most of the time is lost
> during sequential scans, and you have 2 seq scans.

Nonsense. Sequential scans account for all of one or two seconds of
processing in these queries, which are 14 seconds and 38 minutes
respectively.

Matthew

--
 Doctor:  Are you okay? You appear to be injured.
 Neelix:  Aaaaaaah!
 Doctor:  It's okay, it looks superficial.
 Neelix:  Am I going to die?
 Doctor:  Not unless you are allergic to tomatoes. This appears to be a sauce
          of some kind.

From:
dforum
Date:

hello,

there is another performance problem with this request.

If you analyse the query plan, you see that most of the time is lost
during sequential scans, and you have 2 seq scans.

You have to create other indexes to match the request.

Postgres is totally dependent on indexes to reach its performance.

Regarding gist or btree, I personally had better performance with btree.

Regards

david

Matthew Wakeling wrote:
>
> I have been doing some queries that are best answered with GiST
> indexes, however I have found that their performance is a little
> lacking. I thought I would do a direct comparison on a level playing
> field.
> [...]
> The query plans are identical except in the type of index used, but
> there is a factor of a few hundred in execution time. Is this the kind
> of factor that would be expected, or is there something amiss? Is this
> seen as something that might be improved in the future?
>
> Matthew
>


From:
Tom Lane
Date:

dforum <> writes:
> If you analyse the query plan, you see that most of the time is lost
> during sequential scans, and you have 2 seq scans.

I think you missed the loops count.

>>          ->  Index Scan using location_object_start_gist on location l1
>>                (cost=0.00..4.16 rows=150 width=65)
>>                (actual time=3.354..10.757 rows=3 loops=211880)
>>                Index Cond: ((l1.objectid = l2.objectid) AND
>> (l2.intermine_start <= l1.intermine_start) AND (l2.intermine_end >=
>> l1.intermine_start))

This indexscan is accounting for 10.757 * 211880 msec, which is 99%
of the runtime.
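
Spelled out (a quick sketch using the figures from the plan quoted above):

```python
# Figures taken from the GiST EXPLAIN ANALYSE output quoted above.
per_loop_ms = 10.757      # per-iteration "actual time" of the index scan
loops = 211880            # loops count of that index scan
total_ms = 2300510.674    # reported Total runtime of the whole query

scan_ms = per_loop_ms * loops
share = scan_ms / total_ms
print(f"{scan_ms:.0f} ms, {share:.1%} of the runtime")
```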

            regards, tom lane

From:
Matthew Wakeling
Date:

On Thu, 16 Apr 2009, Tom Lane wrote:
> Matthew, can you put together a self-contained test case with a similar
> slowdown?

It isn't the smoking gun I thought it would be, but:

CREATE TABLE a AS SELECT a FROM generate_series(1,1000000) AS a(a);
CREATE TABLE b AS SELECT b FROM generate_series(1,1000000) AS b(b);

ANALYSE;

CREATE INDEX a_a ON a (a);

EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND b.b + 2;

DROP INDEX a_a;
CREATE INDEX a_a ON a USING gist (a);

EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND b.b + 2;


I see four seconds versus thirty seconds. The difference was much greater
in my real (non-test) case - I wonder if multi-column indexing has
something to do with it.

> Also, what are the physical sizes of the two indexes?

           relname           | pg_size_pretty
----------------------------+----------------
  location_object_start_gist | 193 MB
  location_object_start      | 75 MB
(2 rows)

> I notice that the inner nestloop join gets slower too, when it's not
> changed at all --- that suggests that the overall I/O load is a lot
> worse, so maybe the reason the query is falling off a performance cliff
> is that the GIST index fails to fit in cache.

Memory in the machine is 16GB.

Matthew

--
 [About NP-completeness] These are the problems that make efficient use of
 the Fairy Godmother.                    -- Computer Science Lecturer

From:
Tom Lane
Date:

Matthew Wakeling <> writes:
> On Thu, 16 Apr 2009, Tom Lane wrote:
>> Also, what are the physical sizes of the two indexes?

>   location_object_start_gist | 193 MB
>   location_object_start      | 75 MB

>> I notice that the inner nestloop join gets slower too, when it's not
>> changed at all --- that suggests that the overall I/O load is a lot
>> worse, so maybe the reason the query is falling off a performance cliff
>> is that the GIST index fails to fit in cache.

> Memory in the machine is 16GB.

Hmm, and what is shared_buffers set to?  How big are the tables and
other indexes used in the query?  We still have to explain why the
inner nestloop got slower, and it's hard to see that unless something
stopped fitting in cache.

            regards, tom lane

From:
Matthew Wakeling
Date:

On Thu, 16 Apr 2009, Tom Lane wrote:
> Hmm, and what is shared_buffers set to?  How big are the tables and
> other indexes used in the query?  We still have to explain why the
> inner nestloop got slower, and it's hard to see that unless something
> stopped fitting in cache.

I just noticed that someone has started running a big java program (6GB
RAM so far) on that machine. Maybe it was running during the bad run. I'll
see if I can re-run those two queries later on when the machine is idle.

shared_buffers = 500MB

Location table: 336 MB
Gene table:     124 MB
Primer table:   103 MB

location__key_all index: 334 MB

Matthew

--
 For those of you who are into writing programs that are as obscure and
 complicated as possible, there are opportunities for... real fun here
                                        -- Computer Science Lecturer

From:
Craig Ringer
Date:

dforum wrote:
> hello,
>
> there is another performance problem with this request.
>
> If you analyse the query plan, you see that most of the time is lost
> during sequential scans, and you have 2 seq scans.
>
> You have to create other indexes to match the request.
>
> Postgres is totally dependent on indexes to reach its performance.

That depends a lot on your queries. Sometimes a sequential scan is a
faster and better choice. It may also be faster for small tables.

I've usually found that when I (for performance testing purposes) force
the planner to an index scan instead of its preferred sequential scan,
the query runs slower than it did with a sequential scan.

Sure, there are queries that are horrifyingly slow without appropriate
indexes, but I wouldn't go so far as to say that Pg is totally dependent
on indexes to perform well. It depends a lot on the query.

--
Craig Ringer

From:
Matthew Wakeling
Date:

On Thu, 16 Apr 2009, Tom Lane wrote:
> Matthew, can you put together a self-contained test case with a similar
> slowdown?

I have done a bit of investigation, and I think I might have found the
smoking gun I was looking for. I just added a load of debug to the gist
consistent function on the bioseg type, and did a single overlap lookup in
the index.

The index contains values ranging from 1 to 28,000,000 or so.
The range I looked up was 23503297..23504738 (so a very small proportion).
The index contains 375154 entries.
The index returned 59 rows.
The consistent method was called 54022 times - 5828 times for branch
     (internal) index entries, and 48194 times for leaf entries.

Obviously this is a really bad index layout - scanning that many entries
for such a small output. In fact, I saw lots of overlapping branch index
entries, so the index isn't actually differentiating between the different
branches of the tree very well. This indicates a failure of the picksplit
or the penalty functions. I shall investigate this further next week.

I shall also investigate whether this is the exact same problem that I had
with the int4 gist system.

Matthew

--
So, given 'D' is undeclared too, with a default of zero, C++ is equal to D.
  mnw21, commenting on the "Surely the value of C++ is zero, but C is now 1"
  response to "No, C++ isn't equal to D. 'C' is undeclared [...] C++ should
  really be called 1" response to "C++ -- shouldn't it be called D?"

From:
Matthew Wakeling
Date:

On Fri, 17 Apr 2009, Matthew Wakeling wrote:
> I have done a bit of investigation, and I think I might have found the
> smoking gun I was looking for.

I have found a bug in the contrib package seg, which has been copied into
the bioseg data type as well. It causes the index to be created with
horribly bad unselective trees, so that when a search is performed many of
the branches of the tree need to be followed. This explanation does not
extend to btree_gist, so I will have to further investigate that. Apply
the following patch to contrib/seg/seg.c:

*** seg.c    2006-09-10 18:36:51.000000000 +0100
--- seg.c_new    2009-04-20 15:02:52.000000000 +0100
***************
*** 426,432 ****
           else
           {
               datum_r = union_dr;
!             size_r = size_alpha;
               *right++ = i;
               v->spl_nright++;
           }
--- 426,432 ----
           else
           {
               datum_r = union_dr;
!             size_r = size_beta;
               *right++ = i;
               v->spl_nright++;
           }
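
To see why the one-word change matters, here is a rough sketch of the
assignment step (illustrative Python with (lo, hi) tuples; the variable
names loosely follow seg.c, but this is not the actual C code):

```python
def add_to_group(entry, datum_l, size_l, datum_r, size_r):
    """One step of a Guttman-style split: put `entry` into whichever
    group's bounding range grows the least.  Ranges are (lo, hi) pairs."""
    union_dl = (min(datum_l[0], entry[0]), max(datum_l[1], entry[1]))
    union_dr = (min(datum_r[0], entry[0]), max(datum_r[1], entry[1]))
    size_alpha = union_dl[1] - union_dl[0]   # size of enlarged LEFT union
    size_beta = union_dr[1] - union_dr[0]    # size of enlarged RIGHT union
    if size_alpha - size_l < size_beta - size_r:
        return union_dl, size_alpha, datum_r, size_r
    # Entry goes right: track the RIGHT union's size.  The pre-patch code
    # stored size_alpha here, so size_r drifted away from the real right
    # union size and every later comparison was made against a bogus value.
    return datum_l, size_l, union_dr, size_beta
```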


Matthew

--
 The early bird gets the worm. If you want something else for breakfast, get
 up later.

From:
Tom Lane
Date:

Matthew Wakeling <> writes:
> I have found a bug in the contrib package seg, which has been copied into
> the bioseg data type as well. It causes the index to be created with
> horribly bad unselective trees, so that when a search is performed many of
> the branches of the tree need to be followed. This explanation does not
> extend to btree_gist, so I will have to further investigate that. Apply
> the following patch to contrib/seg/seg.c:

> *** seg.c    2006-09-10 18:36:51.000000000 +0100
> --- seg.c_new    2009-04-20 15:02:52.000000000 +0100
> ***************
> *** 426,432 ****
>            else
>            {
>                datum_r = union_dr;
> !             size_r = size_alpha;
>                *right++ = i;
>                v->spl_nright++;
>            }
> --- 426,432 ----
>            else
>            {
>                datum_r = union_dr;
> !             size_r = size_beta;
>                *right++ = i;
>                v->spl_nright++;
>            }

Looks like contrib/cube has the same error.  I don't see a similar code
pattern elsewhere though.  Oleg, Teodor, do you concur that this is a
correct patch?  Is it safe to back-patch (I think it should be)?

            regards, tom lane

From:
Matthew Wakeling
Date:

On Mon, 20 Apr 2009, Teodor Sigaev wrote:
>> Looks like contrib/cube has the same error.  I don't see a similar code
>> pattern elsewhere though.  Oleg, Teodor, do you concur that this is a
>> correct patch?  Is it safe to back-patch (I think it should be)?
> Yeah, good catch, and it doesn't touch any already-on-disk data, although
> the release notes should advise a REINDEX for the seg and cube opclasses.

Unfortunately, it seems there is another bug in the picksplit function.
My patch fixes a bug that reveals this new bug. The whole picksplit
algorithm is fundamentally broken, and needs to be rewritten completely,
which is what I am doing.

If you apply my patch, then index sizes will go up by a factor of ten or
so, because the picksplit function tends to split the set of 367 ranges
into one set of 366 and another set of 1, leading to a horribly unbalanced
tree. Before the patch, the different branches of the tree were
unselective, so new entries would just get stuffed in anywhere, leading to
a much more "balanced" tree.

I shall have a proper fix to this problem later today.

Matthew

--
 It's one of those irregular verbs - "I have an independent mind," "You are
 an eccentric," "He is round the twist."
                                      -- Bernard Woolly, Yes Prime Minister

From:
Matthew Wakeling
Date:

On Tue, 21 Apr 2009, Matthew Wakeling wrote:
> Unfortunately, it seems there is another bug in the picksplit function. My
> patch fixes a bug that reveals this new bug. The whole picksplit algorithm is
> fundamentally broken, and needs to be rewritten completely, which is what I
> am doing.

I have now rewritten the picksplit and penalty functions for the bioseg
data type, and they perform much better. The index size is now 164MB,
compared to 350MB or so originally, and 2400MB after my earlier bugfix.
Execution time of one of our queries (basically a nested loop join over
a sequential scan and an index lookup in this index type) has gone down
from 45 minutes to two minutes.

I have abandoned "Guttman's poly time split algorithm". A fundamental flaw
in the picksplit algorithm is that it would create two separate target
sets, and incrementally add entries to whichever one would grow the least
in range size. However, if the entries arrived in any sort of order, they
would all be added to the one set, growing it by a small amount each time.
This caused the picksplit algorithm to split a set of 367 entries into a
set of 366 and a set of one a high proportion of the time.

I have replaced the picksplit algorithm with a simple one. For each range
element, find the midpoint of the range. Then find the mean of all the
midpoints. All elements with a midpoint below the mean go in one set, and
the others go in the second set. This usually splits the entries in a
meaningful way.
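
In (hypothetical) Python rather than the actual C, the split described
above amounts to:

```python
def midpoint_picksplit(entries):
    """Split (lo, hi) ranges by comparing each range's midpoint to the
    mean of all midpoints -- a sketch of the approach described above,
    not the seg.c code itself (which would also need a fallback for
    degenerate input where one side ends up empty)."""
    midpoints = [(lo + hi) / 2.0 for lo, hi in entries]
    mean = sum(midpoints) / len(midpoints)
    left = [e for e, m in zip(entries, midpoints) if m < mean]
    right = [e for e, m in zip(entries, midpoints) if m >= mean]
    return left, right
```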

I have also changed the penalty function. Previously, the penalty was the
amount that the range would have to expand. So, if a new element fitted
inside the existing range, then the penalty is zero. I have changed it to
create a tie-break between multiple index pages that the element would fit
in without expanding the range - the element should be inserted into the
index page with the smallest range. This prevents large elements from
messing up the index by forcing a large index page range that sucks in all
the elements in the whole area into a non-selective group.
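
A sketch of that penalty (illustrative Python; the particular scaling is
my assumption - the real requirement is only that any zero-enlargement
tie-break stays below the cost of any genuine expansion):

```python
def penalty(page, entry):
    """Cost of inserting `entry` into the index page bounded by `page`.
    Real expansion dominates; among pages that already contain the entry
    (zero expansion), the smallest page range wins the tie-break."""
    (plo, phi), (elo, ehi) = page, entry
    grow = max(0.0, plo - elo) + max(0.0, ehi - phi)
    if grow > 0.0:
        return 1.0 + grow                 # any expansion costs at least 1
    width = phi - plo
    return width / (width + 1.0)          # tie-break value, always < 1
```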

I may experiment with improving these functions further. The main problem
with this index is the fact that I need to index ranges with a wide
variety of widths, and I have a couple more strategies yet to help with
that.

I will post a patch when I have ported my bioseg code over to the seg data
type.

Matthew

--
 Riker: Our memory pathways have become accustomed to your sensory input.
 Data:  I understand - I'm fond of you too, Commander. And you too Counsellor

From:
Matthew Wakeling
Date:

On Wed, 22 Apr 2009, Matthew Wakeling wrote:
> I will post a patch when I have ported my bioseg code over to the seg data
> type.

Here is my patch ported over to the seg contrib package, attached. Apply
it to seg.c and all should be well. A similar thing needs to be done to
cube, but I haven't looked at that.

Matthew

--
 An optimist sees the glass as half full, a pessimist as half empty,
 and an engineer as having redundant storage capacity.

From:
Tom Lane
Date:

Matthew Wakeling <> writes:
> Here is my patch ported over to the seg contrib package, attached. Apply
> it to seg.c and all should be well. A similar thing needs to be done to
> cube, but I haven't looked at that.

Teodor, Oleg, do you intend to review/apply this patch?

            regards, tom lane

From:
Oleg Bartunov
Date:

On Wed, 6 May 2009, Tom Lane wrote:

> Matthew Wakeling <> writes:
>> Here is my patch ported over to the seg contrib package, attached. Apply
>> it to seg.c and all should be well. A similar thing needs to be done to
>> cube, but I haven't looked at that.
>
> Teodor, Oleg, do you intend to review/apply this patch?

Tom,

I just returned from a trek around Annapurna and have just learned about
Matthew's experiments. Teodor is on holiday and will be available after
May 11, and then there is PGCon, so if it can wait, we could look at this
after PGCon.

Matthew, did you try various data? From our experience, there can be
various corner cases.



     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: , http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

From:
Matthew Wakeling
Date:

On Thu, 7 May 2009, Oleg Bartunov wrote:
> Did you try the Guttman quadratic split algorithm? We also found a
> linear split algorithm for R-tree.

The existing (bugfixed) seg split algorithm is the Guttman quadratic split
algorithm. Guttman did all his work on two-dimensional and above data,
dismissing one-dimensional data as being handled adequately by B-trees,
which is not true for segment overlaps. It turns out that the algorithm
has a weakness with certain types of data, and one-dimensional data is
almost certain to exercise that weakness. The greater the number of
dimensions, the less the weakness is exercised.

The problem is that the algorithm does not calculate a split pivot.
Instead it finds two suitable entries, and adds the remaining entries to
those two in turn. This can lead to the majority of the entries being
added to just one side. In fact, I saw lots of cases where 367 entries
were being split into two pages of 366 and one entry.

Guttman's linear split algorithm has the same weakness.
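
The degeneracy is easy to reproduce. Feeding sorted unit intervals
through an incremental Guttman-style assignment (sketched here with
simplified seeding - the two extreme entries rather than the quadratic
seed search) stuffs almost everything into one side:

```python
def merge(a, b):
    return (min(a[0], b[0]), max(a[1], b[1]))

def size(r):
    return r[1] - r[0]

def incremental_split(entries):
    """Guttman-style incremental assignment: each remaining entry joins
    whichever side's bounding range would grow the least."""
    left, right = [entries[0]], [entries[-1]]
    lbound, rbound = entries[0], entries[-1]
    for e in entries[1:-1]:
        lgrow = size(merge(lbound, e)) - size(lbound)
        rgrow = size(merge(rbound, e)) - size(rbound)
        if lgrow <= rgrow:
            left.append(e); lbound = merge(lbound, e)
        else:
            right.append(e); rbound = merge(rbound, e)
    return left, right

# 367 adjacent unit ranges in sorted order: every entry is cheapest to
# add to the ever-growing left side, giving the 366/1 split described above.
left, right = incremental_split([(i, i + 1) for i in range(367)])
```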

>> One thing I am seeing is a really big difference in performance between
>> Postgres/GiST and a Java implementation I have written, using the same
>> algorithms. Postgres takes three minutes to perform a set of index lookups
>> while java takes six seconds. The old version of bioseg took an hour. I
>> can't see anything in the GiST support code that could account for this.
>
> is the number of index lookups different, or is just the index lookup
> time very big?

Same number of index lookups. Same algorithms. I have a set of 681879
segments, and I load them all into the index. I then query the index for
overlaps for each one in turn. For some reason, GiST lookups seem to be
slow, even if they are using a good algorithm. I have seen that problem
with btree_gist on integers too. I can't see any reason for this in the
GiST code - it all seems pretty tight to me. We probably need to do some
profiling.

Matthew

--
 I suppose some of you have done a Continuous Maths course. Yes? Continuous
 Maths? <menacing stares from audience> Whoah, it was like that, was it!
                                        -- Computer Science Lecturer

From:
Bruce Momjian
Date:

Was this corrected?  I don't see any commits to seg.c.

---------------------------------------------------------------------------

Matthew Wakeling wrote:
> On Thu, 7 May 2009, Oleg Bartunov wrote:
> [...]
> Same number of index lookups. Same algorithms. [...] For some reason,
> GiST lookups seem to be slow, even if they are using a good algorithm.
> I have seen that problem with btree_gist on integers too. I can't see
> any reason for this in the GiST code - it all seems pretty tight to me.
> We probably need to do some profiling.
>
> Matthew

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +

From:
Robert Haas
Date:

On Thu, Feb 25, 2010 at 6:44 PM, Bruce Momjian <> wrote:
> Was this corrected?  I don't see any commits to seg.c.

I don't think this was ever reviewed.

It seems like a good patch but I'd be skeptical of committing it now
unless someone has the time to review it carefully.  If not, let's add
it to the next CF so we don't lose it again.

...Robert


From:
Bruce Momjian
Date:

Robert Haas wrote:
> On Thu, Feb 25, 2010 at 6:44 PM, Bruce Momjian <> wrote:
> > Was this corrected?  I don't see any commits to seg.c.
>
> I don't think this was ever reviewed.
>
> It seems like a good patch but I'd be skeptical of committing it now
> unless someone has the time to review it carefully.  If not, let's add
> it to the next CF so we don't lose it again.

I have asked Oleg and Teodor to work on it.
