Re: [PATCH] we have added support for box type in SP-GiST index

Поиск
Список
Период
Сортировка
От Oleg Bartunov
Тема Re: [PATCH] we have added support for box type in SP-GiST index
Дата
Msg-id CAF4Au4zxd2XOV0A__FU7xoHxSiwJzm1z2xhs-FFaT1DzB9ub3Q@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] we have added support for box type in SP-GiST index  (Teodor Sigaev <teodor@sigaev.ru>)
Список pgsql-hackers


On Mon, Feb 15, 2016 at 6:29 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
It's very pity but author is not able to continue work on this patch, and I would like to raise this flag.

I'd like to add some comments about patches:

traversalValue patch adds arbitrary value assoсiated with branch in SP-GiST tree walk. Unlike to recostructedValue it could be just pointer, it havn't to be a regular pgsql type. Also, it could be used independly from reconstructedValue. This patch is used both following attached patches.

range patch just switchs using reconstructedValue to traversalValue in range opclasses. reconstructedValue was used just because it was an acceptable workaround in case of range type. Range opclase stores a full value in leafs and doesn't need to use reconstructedValue to return tuple in index only scan.
See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us

q4d patch implements index over boxes using SP-GiST. Basic idea was an observation, range opclass thinks about one-dimensional ranges as 2D points.
Following this idea, we can think that 2D box (what is 2 points or 4 numbers) could represent a 4D point. We hoped that this approach will be much more effective than traditional R-Tree in case of many overlaps in box's collection.
Performance results are coming shortly.

I made some benchmarks using two real-life datasets. Streets data set is consists of 3691620 MBRs of  russian streets with not much overlaps. As expected, spgist is slower than  rtree (gist), because of  much bigger number of blocks readed. Build time, however, is faster for spgist than rtree (~ 1.3 times).
 
Bounds data set is consists of 7803499 MBRs of some objects and are highly overlapped,  see attached picture (crop with center in Moscow) of boxes.
Create index time (ms):
spgist: 47036.051
rtree:   68491.242

Size:
spgist: 505 MB
rtree:   663 MB
heap:   871 MB

Let's consider a small box in downtown of Moscow:  box  '(37.607164,55.756489), (37.607797,55.756725)'.
&& оператор (returns 23958 boxes) :
spgist:     14.649
rtree:        23.715
seqscan: 924.344

@> operator (returns 23868 boxes):

spgist: 14.113
rtree: 26.853
seqscan: 909.147

<@ operator (returns 0 boxes):
spgist: 0..268 ms
rtree: 12.563

My conclusion is that for "spagetti" data new opclass for spgist is good to have - it's smaller and faster (both, created index and search) that rtree (gist based).
 

In near future (say, tommorrow) I hope to suggest a opclass over polygons based on this approach.


Yes, it'd be interested.

 

Alexander Lebedev wrote:
Hello, Hacker.

* [PATCH] add a box index to sp-gist

   We have extended sp-gist with an index that keeps track of boxes

   We have used ideas underlying sp-gist range implementation to
   represent 2D boxes as points in 4D space. We use quad tree
   analogue, but in 4-dimensional space. We call this tree q4d. Each
   node of this tree is a box (a point in 4D space) which splits space
   in 16 hyperrectangles.

   Rationale: r-tree assumes that boxes we're trying to index don't
   overlap much. When this assumption fails, r-tree performs badly,
   while our proposal to represent a rectangle as a point in 4D space
   solves this problem.

   NB: the index uses traversalValue introduced in a separate patch.

* [PATCH] add traversalValue in sp-gist

   During implementation of box index for sp-gist we saw that we only
   keep rectangles, but to determine traversal direction we may need
   to know boundaries of a hyperrectangle. So we calculate them
   on the fly and store them in traversalValue, available
   when traversing child nodes, because otherwise we would't be able to
   calculate them from inside the inner_consistent function call.

   This patch was written by Teodor Sigaev.

* [PATCH] change reconstructValue -> traversalValue in range_spgist.c

   During traversal, local context is
   insufficient to pick traversal direction. As of now, this is worked
   around with the help of reconstructValue. reconstructValue only
   stores data of the same type as a tree node, that is, a range.

   We have written a patch that calculates auxillary values and makes
   them accessible during traversal.

   We then use traversalValue in a new box index and change
   range_spgist.c to use it in place of reconstructValue.

   NB: apply this patch after traversalValue patch.





--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [COMMITTERS] pgsql: Add isolationtester spec for old heapam.c bug
Следующее
От: Petr Jelinek
Дата:
Сообщение: Re: Proposal: Generic WAL logical messages