Re: GiST index performance

От: Matthew Wakeling
Тема: Re: GiST index performance
Дата: ,
Msg-id: alpine.DEB.2.00.0905071250290.2341@aragorn.flymine.org
(см: обсуждение, исходный текст)
Ответ на: GiST index performance  (Matthew Wakeling)
Ответы: Re: GiST index performance  (Bruce Momjian)
Список: pgsql-performance

Скрыть дерево обсуждения

GiST index performance  (Matthew Wakeling, )
 Re: GiST index performance  ("Kevin Grittner", )
  Re: GiST index performance  (Matthew Wakeling, )
  Re: GiST index performance  (Tom Lane, )
   Re: GiST index performance  (Matthew Wakeling, )
    Re: GiST index performance  (Tom Lane, )
     Re: GiST index performance  (Matthew Wakeling, )
   Re: GiST index performance  (Matthew Wakeling, )
    Re: GiST index performance  (Matthew Wakeling, )
     Re: GiST index performance  (Tom Lane, )
 Re: GiST index performance  (Matthew Wakeling, )
 Re: GiST index performance  (dforum, )
  Re: GiST index performance  (Tom Lane, )
  Re: GiST index performance  (Craig Ringer, )
 Re: GiST index performance  (Matthew Wakeling, )
  Re: GiST index performance  (Matthew Wakeling, )
   Re: GiST index performance  (Matthew Wakeling, )
    Re: GiST index performance  (Tom Lane, )
     Re: GiST index performance  (Oleg Bartunov, )
 Re: GiST index performance  (Matthew Wakeling, )
  Re: GiST index performance  (Bruce Momjian, )
   Re: GiST index performance  (Robert Haas, )
    Re: GiST index performance  (Bruce Momjian, )

On Thu, 7 May 2009, Oleg Bartunov wrote:
> Did you try Guttman quadratic split algorithm ? We also found linear
> split algorithm for Rtree.

The existing (bugfixed) seg split algorithm is the Guttman quadratic split
algorithm. Guttman did all his work on two-dimensional and above data,
dismissing one-dimensional data as being handled adequately by B-trees,
which is not true for segment overlaps. It turns out that the algorithm
has a weakness with certain types of data, and one-dimensional data is
almost certain to exercise that weakness. The greater the number of
dimensions, the less the weakness is exercised.

The problem is that the algorithm does not calculate a split pivot.
Instead it finds two suitable entries, and adds the remaining entries to
those two in turn. This can lead to the majority of the entries being
added to just one side. In fact, I saw lots of cases where 367 entries
were being split into two pages of 366 and one entry.

Guttman's linear split algorithm has the same weakness.

>> One thing I am seeing is a really big difference in performance between
>> Postgres/GiST and a Java implementation I have written, using the same
>> algorithms. Postgres takes three minutes to perform a set of index lookups
>> while java takes six seconds. The old version of bioseg took an hour. I
>> can't see anything in the GiST support code that could account for this.
>
> is the number of index lookups different, or just index lookup time is very
> big ?

Same number of index lookups. Same algorithms. I have a set of 681879
segments, and I load them all into the index. I then query the index for
overlaps for each one in turn. For some reason, GiST lookups seem to be
slow, even if they are using a good algorithm. I have seen that problem
with btree_gist on integers too. I can't see any reason for this is the
GiST code - it all seems pretty tight to me. We probably need to do some
profiling.

Matthew

--
 I suppose some of you have done a Continuous Maths course. Yes? Continuous
 Maths? <menacing stares from audience> Whoah, it was like that, was it!
                                        -- Computer Science Lecturer


В списке pgsql-performance по дате сообщения:

От: Dimitri
Дата:
Сообщение: Re: Any better plan for this query?..
От: Simon Riggs
Дата:
Сообщение: Re: Any better plan for this query?..