Обсуждение: [PATCH] we have added support for box type in SP-GiST index

Поиск
Список
Период
Сортировка

[PATCH] we have added support for box type in SP-GiST index

От
Alexander Lebedev
Дата:
Hello, Hacker.

* [PATCH] add a box index to sp-gist

   We have extended sp-gist with an index that keeps track of boxes

   We have used ideas underlying sp-gist range implementation to
   represent 2D boxes as points in 4D space. We use quad tree
   analogue, but in 4-dimensional space. We call this tree q4d. Each
   node of this tree is a box (a point in 4D space) which splits space
   in 16 hyperrectangles.

   Rationale: r-tree assumes that boxes we're trying to index don't
   overlap much. When this assumption fails, r-tree performs badly,
   while our proposal to represent a rectangle as a point in 4D space
   solves this problem.

   NB: the index uses traversalValue introduced in a separate patch.

* [PATCH] add traversalValue in sp-gist

   During implementation of box index for sp-gist we saw that we only
   keep rectangles, but to determine traversal direction we may need
   to know boundaries of a hyperrectangle. So we calculate them
   on the fly and store them in traversalValue, available
   when traversing child nodes, because otherwise we would't be able to
   calculate them from inside the inner_consistent function call.

   This patch was written by Teodor Sigaev.

* [PATCH] change reconstructValue -> traversalValue in range_spgist.c

   During traversal, local context is
   insufficient to pick traversal direction. As of now, this is worked
   around with the help of reconstructValue. reconstructValue only
   stores data of the same type as a tree node, that is, a range.

   We have written a patch that calculates auxillary values and makes
   them accessible during traversal.

   We then use traversalValue in a new box index and change
   range_spgist.c to use it in place of reconstructValue.

   NB: apply this patch after traversalValue patch.


Вложения

Re: [PATCH] we have added support for box type in SP-GiST index

От
Oleg Bartunov
Дата:


On Sat, Oct 31, 2015 at 9:49 PM, Alexander Lebedev <a.lebedev@postgrespro.ru> wrote:
Hello, Hacker.

* [PATCH] add a box index to sp-gist

  We have extended sp-gist with an index that keeps track of boxes

  We have used ideas underlying sp-gist range implementation to
  represent 2D boxes as points in 4D space. We use quad tree
  analogue, but in 4-dimensional space. We call this tree q4d. Each
  node of this tree is a box (a point in 4D space) which splits space
  in 16 hyperrectangles.

  Rationale: r-tree assumes that boxes we're trying to index don't
  overlap much. When this assumption fails, r-tree performs badly,
  while our proposal to represent a rectangle as a point in 4D space
  solves this problem.

  NB: the index uses traversalValue introduced in a separate patch.

* [PATCH] add traversalValue in sp-gist

  During implementation of box index for sp-gist we saw that we only
  keep rectangles, but to determine traversal direction we may need
  to know boundaries of a hyperrectangle. So we calculate them
  on the fly and store them in traversalValue, available
  when traversing child nodes, because otherwise we would't be able to
  calculate them from inside the inner_consistent function call.

  This patch was written by Teodor Sigaev.

* [PATCH] change reconstructValue -> traversalValue in range_spgist.c

  During traversal, local context is
  insufficient to pick traversal direction. As of now, this is worked
  around with the help of reconstructValue. reconstructValue only
  stores data of the same type as a tree node, that is, a range.

  We have written a patch that calculates auxillary values and makes
  them accessible during traversal.

  We then use traversalValue in a new box index and change
  range_spgist.c to use it in place of reconstructValue.

  NB: apply this patch after traversalValue patch.


Did you forget to show us performance numbers ?

 


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] we have added support for box type in SP-GiST index

От
Alvaro Herrera
Дата:
CC'ing Teodor because he's author of one of the patches.

Alexander Lebedev wrote:
> Hello, Hacker.

So, can we have a more thorough explanation of what is the point of this
patch?  If it is supposed to improve the performance of searching for
boxes, can we see measurements from some benchmark?  Do the patches
still apply, or do you need to rebase?  If so, please post an updated
version.

I'm setting this patch as Waiting on Author.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] we have added support for box type in SP-GiST index

От
Alvaro Herrera
Дата:
Alexander Lebedev wrote:
> Hello, Hacker.
> 
> * [PATCH] add a box index to sp-gist

I closed this patch as "returned with feedback" because the author
didn't reply in three months.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] we have added support for box type in SP-GiST index

От
Teodor Sigaev
Дата:
It's very pity but author is not able to continue work on this patch, and I
would like to raise this flag.

I'd like to add some comments about patches:

traversalValue patch adds arbitrary value assoсiated with branch in SP-GiST tree
walk. Unlike to recostructedValue it could be just pointer, it havn't to be a
regular pgsql type. Also, it could be used independly from reconstructedValue.
This patch is used both following attached patches.

range patch just switchs using reconstructedValue to traversalValue in range
opclasses. reconstructedValue was used just because it was an acceptable
workaround in case of range type. Range opclase stores a full value in leafs and
doesn't need to use reconstructedValue to return tuple in index only scan.
See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us

q4d patch implements index over boxes using SP-GiST. Basic idea was an
observation, range opclass thinks about one-dimensional ranges as 2D points.
Following this idea, we can think that 2D box (what is 2 points or 4 numbers)
could represent a 4D point. We hoped that this approach will be much more
effective than traditional R-Tree in case of many overlaps in box's collection.
Performance results are coming shortly.

In near future (say, tommorrow) I hope to suggest a opclass over polygons based
on this approach.

Alexander Lebedev wrote:
> Hello, Hacker.
>
> * [PATCH] add a box index to sp-gist
>
>    We have extended sp-gist with an index that keeps track of boxes
>
>    We have used ideas underlying sp-gist range implementation to
>    represent 2D boxes as points in 4D space. We use quad tree
>    analogue, but in 4-dimensional space. We call this tree q4d. Each
>    node of this tree is a box (a point in 4D space) which splits space
>    in 16 hyperrectangles.
>
>    Rationale: r-tree assumes that boxes we're trying to index don't
>    overlap much. When this assumption fails, r-tree performs badly,
>    while our proposal to represent a rectangle as a point in 4D space
>    solves this problem.
>
>    NB: the index uses traversalValue introduced in a separate patch.
>
> * [PATCH] add traversalValue in sp-gist
>
>    During implementation of box index for sp-gist we saw that we only
>    keep rectangles, but to determine traversal direction we may need
>    to know boundaries of a hyperrectangle. So we calculate them
>    on the fly and store them in traversalValue, available
>    when traversing child nodes, because otherwise we would't be able to
>    calculate them from inside the inner_consistent function call.
>
>    This patch was written by Teodor Sigaev.
>
> * [PATCH] change reconstructValue -> traversalValue in range_spgist.c
>
>    During traversal, local context is
>    insufficient to pick traversal direction. As of now, this is worked
>    around with the help of reconstructValue. reconstructValue only
>    stores data of the same type as a tree node, that is, a range.
>
>    We have written a patch that calculates auxillary values and makes
>    them accessible during traversal.
>
>    We then use traversalValue in a new box index and change
>    range_spgist.c to use it in place of reconstructValue.
>
>    NB: apply this patch after traversalValue patch.
>
>
>
>

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [PATCH] we have added support for box type in SP-GiST index

От
Oleg Bartunov
Дата:


On Mon, Feb 15, 2016 at 6:29 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
It's very pity but author is not able to continue work on this patch, and I would like to raise this flag.

I'd like to add some comments about patches:

traversalValue patch adds arbitrary value assoсiated with branch in SP-GiST tree walk. Unlike to recostructedValue it could be just pointer, it havn't to be a regular pgsql type. Also, it could be used independly from reconstructedValue. This patch is used both following attached patches.

range patch just switchs using reconstructedValue to traversalValue in range opclasses. reconstructedValue was used just because it was an acceptable workaround in case of range type. Range opclase stores a full value in leafs and doesn't need to use reconstructedValue to return tuple in index only scan.
See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us

q4d patch implements index over boxes using SP-GiST. Basic idea was an observation, range opclass thinks about one-dimensional ranges as 2D points.
Following this idea, we can think that 2D box (what is 2 points or 4 numbers) could represent a 4D point. We hoped that this approach will be much more effective than traditional R-Tree in case of many overlaps in box's collection.
Performance results are coming shortly.

I made some benchmarks using two real-life datasets. Streets data set is consists of 3691620 MBRs of  russian streets with not much overlaps. As expected, spgist is slower than  rtree (gist), because of  much bigger number of blocks readed. Build time, however, is faster for spgist than rtree (~ 1.3 times).
 
Bounds data set is consists of 7803499 MBRs of some objects and are highly overlapped,  see attached picture (crop with center in Moscow) of boxes.
Create index time (ms):
spgist: 47036.051
rtree:   68491.242

Size:
spgist: 505 MB
rtree:   663 MB
heap:   871 MB

Let's consider a small box in downtown of Moscow:  box  '(37.607164,55.756489), (37.607797,55.756725)'.
&& оператор (returns 23958 boxes) :
spgist:     14.649
rtree:        23.715
seqscan: 924.344

@> operator (returns 23868 boxes):

spgist: 14.113
rtree: 26.853
seqscan: 909.147

<@ operator (returns 0 boxes):
spgist: 0..268 ms
rtree: 12.563

My conclusion is that for "spagetti" data new opclass for spgist is good to have - it's smaller and faster (both, created index and search) that rtree (gist based).
 

In near future (say, tommorrow) I hope to suggest a opclass over polygons based on this approach.


Yes, it'd be interested.

 

Alexander Lebedev wrote:
Hello, Hacker.

* [PATCH] add a box index to sp-gist

   We have extended sp-gist with an index that keeps track of boxes

   We have used ideas underlying sp-gist range implementation to
   represent 2D boxes as points in 4D space. We use quad tree
   analogue, but in 4-dimensional space. We call this tree q4d. Each
   node of this tree is a box (a point in 4D space) which splits space
   in 16 hyperrectangles.

   Rationale: r-tree assumes that boxes we're trying to index don't
   overlap much. When this assumption fails, r-tree performs badly,
   while our proposal to represent a rectangle as a point in 4D space
   solves this problem.

   NB: the index uses traversalValue introduced in a separate patch.

* [PATCH] add traversalValue in sp-gist

   During implementation of box index for sp-gist we saw that we only
   keep rectangles, but to determine traversal direction we may need
   to know boundaries of a hyperrectangle. So we calculate them
   on the fly and store them in traversalValue, available
   when traversing child nodes, because otherwise we would't be able to
   calculate them from inside the inner_consistent function call.

   This patch was written by Teodor Sigaev.

* [PATCH] change reconstructValue -> traversalValue in range_spgist.c

   During traversal, local context is
   insufficient to pick traversal direction. As of now, this is worked
   around with the help of reconstructValue. reconstructValue only
   stores data of the same type as a tree node, that is, a range.

   We have written a patch that calculates auxillary values and makes
   them accessible during traversal.

   We then use traversalValue in a new box index and change
   range_spgist.c to use it in place of reconstructValue.

   NB: apply this patch after traversalValue patch.





--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Вложения

Re: [PATCH] we have added support for box type in SP-GiST index

От
David Steele
Дата:
On 2/15/16 10:29 AM, Teodor Sigaev wrote:

> It's very pity but author is not able to continue work on this patch,
> and I would like to raise this flag.
>
> I'd like to add some comments about patches:
>
> traversalValue patch adds arbitrary value assoсiated with branch in
> SP-GiST tree walk. Unlike to recostructedValue it could be just pointer,
> it havn't to be a regular pgsql type. Also, it could be used independly
> from reconstructedValue. This patch is used both following attached
> patches.
>
> range patch just switchs using reconstructedValue to traversalValue in
> range opclasses. reconstructedValue was used just because it was an
> acceptable workaround in case of range type. Range opclase stores a full
> value in leafs and doesn't need to use reconstructedValue to return
> tuple in index only scan.
> See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us
>
> q4d patch implements index over boxes using SP-GiST. Basic idea was an
> observation, range opclass thinks about one-dimensional ranges as 2D
> points.
> Following this idea, we can think that 2D box (what is 2 points or 4
> numbers) could represent a 4D point. We hoped that this approach will be
> much more effective than traditional R-Tree in case of many overlaps in
> box's collection.
> Performance results are coming shortly.

It appears that the issues raised in this thread have been addressed but 
the patch still has not gone though a real review.

Anybody out there willing to take a crack at a review?  All three 
patches apply (with whitespace issues).

Thanks,
-- 
-David
david@pgmasters.net



Re: [PATCH] we have added support for box type in SP-GiST index

От
Oleg Bartunov
Дата:


On Mon, Mar 14, 2016 at 9:01 PM, David Steele <david@pgmasters.net> wrote:
On 2/15/16 10:29 AM, Teodor Sigaev wrote:

It's very pity but author is not able to continue work on this patch,
and I would like to raise this flag.

I'd like to add some comments about patches:

traversalValue patch adds arbitrary value assoсiated with branch in
SP-GiST tree walk. Unlike to recostructedValue it could be just pointer,
it havn't to be a regular pgsql type. Also, it could be used independly
from reconstructedValue. This patch is used both following attached
patches.

range patch just switchs using reconstructedValue to traversalValue in
range opclasses. reconstructedValue was used just because it was an
acceptable workaround in case of range type. Range opclase stores a full
value in leafs and doesn't need to use reconstructedValue to return
tuple in index only scan.
See http://www.postgresql.org/message-id/5399.1343512250@sss.pgh.pa.us

q4d patch implements index over boxes using SP-GiST. Basic idea was an
observation, range opclass thinks about one-dimensional ranges as 2D
points.
Following this idea, we can think that 2D box (what is 2 points or 4
numbers) could represent a 4D point. We hoped that this approach will be
much more effective than traditional R-Tree in case of many overlaps in
box's collection.
Performance results are coming shortly.

It appears that the issues raised in this thread have been addressed but the patch still has not gone though a real review.

Anybody out there willing to take a crack at a review?  All three patches apply (with whitespace issues).

Emre Hasegeli will review the patch.
 

Thanks,
--
-David
david@pgmasters.net

Re: [PATCH] we have added support for box type in SP-GiST index

От
Emre Hasegeli
Дата:
Here are my first comments.  I haven't read the actual index
implementation, yet.

I think traversal value is a useful addition.  It is nice that the
implementation for the range types is also changed.  My questions
about them are:

Do reconstructedValues is required now?  Wouldn't it be cleaner to use
the new field on the prefix tree implementation, too?

We haven't had specific memory context for reconstructedValues.  Why
is it required for the new field?

> +       Memory for <structfield>traversalValues</> should be allocated in
> +       the default context, but make sure to switch to
> +       <structfield>traversalMemoryContext</> before allocating memory
> +       for pointers themselves.

This sentence is not understandable.  Shouldn't it say "the memory
should *not* be allocated in the default context"?  What are the
"pointers themselves"?

The box opclass is broken because of the floating point arithmetics of
the box type.  You can reproduce it with the attached script.  We
really need to fix the geometric types, before building more on them.
They fail to serve the only purpose they are useful for, demonstrating
features.

It think the opclass should support the cross data type operators like
box @> point.  Ideally we should do it by using multiple opclasses in
an opfamily.  The floating problem will bite us again in here, because
some of the operators are not using FP macros.

The tests check not supported operator "@".  It should be "<@".  It is
nice that the opclass doesn't support long deprecated operators.

We needs tests for the remaining operators and for adding new values
to the index.  I am not sure create_index.sql is the best place for
them.  Maybe we should use the test scripts of "box".

> + #define LT      -1
> + #define GT       1
> + #define EQ       0

Using these numbers is a very well established pattern.  I don't think
macros make the code any more readable.

> + /* SP-GiST API functions */
> + Datum       spg_box_quad_config(PG_FUNCTION_ARGS);
> + Datum       spg_box_quad_choose(PG_FUNCTION_ARGS);
> + Datum       spg_box_quad_picksplit(PG_FUNCTION_ARGS);
> + Datum       spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
> + Datum       spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);

I guess they should go to the header file.

Вложения

Re: [PATCH] we have added support for box type in SP-GiST index

От
Teodor Sigaev
Дата:
> Do reconstructedValues is required now?  Wouldn't it be cleaner to use
> the new field on the prefix tree implementation, too?
reconstructedValues is needed to reconctruct full indexed value and should match
to type info indexed column

>
> We haven't had specific memory context for reconstructedValues.  Why
> is it required for the new field?
because pg knows type of reconstructedValues (see above why) but traversalValue
just a void*, SP-GiST core doesn't knnow how to free memory of allocated for it.


>
>> +       Memory for <structfield>traversalValues</> should be allocated in
>> +       the default context, but make sure to switch to
>> +       <structfield>traversalMemoryContext</> before allocating memory
>> +       for pointers themselves.
>
> This sentence is not understandable.  Shouldn't it say "the memory
> should *not* be allocated in the default context"?  What are the
> "pointers themselves"?
Clarified, I hope


>
> The box opclass is broken because of the floating point arithmetics of
> the box type.  You can reproduce it with the attached script.  We
hope, fixed. At least, seems, syncronized with seqscan

> really need to fix the geometric types, before building more on them.
> They fail to serve the only purpose they are useful for, demonstrating
> features.
agree, but it's not a deal for this patch

>
> It think the opclass should support the cross data type operators like
> box @> point.  Ideally we should do it by using multiple opclasses in
> an opfamily.  The floating problem will bite us again in here, because
> some of the operators are not using FP macros.
Again, agree. But I suggest to do it by separate patch.

>
> The tests check not supported operator "@".  It should be "<@".  It is
> nice that the opclass doesn't support long deprecated operators.
fixed


>> + #define LT      -1
>> + #define GT       1
>> + #define EQ       0
>
> Using these numbers is a very well established pattern.  I don't think
> macros make the code any more readable.
fixed

>
>> + /* SP-GiST API functions */
>> + Datum       spg_box_quad_config(PG_FUNCTION_ARGS);
>> + Datum       spg_box_quad_choose(PG_FUNCTION_ARGS);
>> + Datum       spg_box_quad_picksplit(PG_FUNCTION_ARGS);
>> + Datum       spg_box_quad_inner_consistent(PG_FUNCTION_ARGS);
>> + Datum       spg_box_quad_leaf_consistent(PG_FUNCTION_ARGS);
>
> I guess they should go to the header file.
Remove them because they are presented in auto-generated file
./src/include/utils/builtins.h

range patch is unchanged, but still attached to keep them altogether.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [PATCH] we have added support for box type in SP-GiST index

От
Emre Hasegeli
Дата:
Here are my comments about the operator class implementation:

> + *    implementation of quad-4d tree over boxes for SP-GiST.

Isn't the whole thing actually 3D?

> +  * For example consider the case of intersection.

No need for a new line after this.

> +  * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner nodes.

I couldn't get the term 4D point.  Maybe it means that we are using
box datatype as the prefix, but we are not treating them as boxes.

> + * We use traversalValue to calculate quadrant bounds from parent's quadrant
> + * bounds.

I couldn't understand this sentence.  We are using the parent's prefix
and the quadrant to generate the traversalValue.  I think this is the
crucial part of the opclass.  It would be nice to explain it more
clearly.

> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group

2016.

> + *          src/backend/utils/adt/boxtype_spgist.c

Maybe we should name this file as geo_spgist.c to support other types
in the future.

> + #include "utils/builtins.h";

Extra ;.

> + #include "utils/datum.h"

I think this is unnecessary.

> + /* InfR type implements doubles and +- infinity */
> + typedef struct
> + {
> +    int         infFlag;
> +    double      val;
> + }   InfR;

Why do we need this?  Can't we store infinity in "double"?

> + /* wrap double to InfR */
> + static InfR
> + toInfR(double v, InfR * r)
> + {
> +    r->infFlag = NotInf;
> +    r->val = v;
> + }

This isn't returning the value.

> + typedef struct
> + {
> +    Range       range_x;
> +    Range       range_y;
> + }   Rectangle;

Wouldn't it be simpler to just using BOX instead of this?

> + static void
> + evalIRangeBox(const IRangeBox *range_box, const Range *range, const int half1,
> +              const int half2, IRangeBox *new_range_box)
>
> + static void
> + evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
> +             const uint8 quadrant, IRectBox * new_rect_box)
>
> + inline static void
> + initializeUnboundedBox(IRectBox * rect_box)

Wouldn't it be simpler to palloc and return the value on those functions?

> + static int
> + intersect2D(const Range * range, const IRangeBox * range_box)

Wouldn't it be better to return "bool" on those functions.

> +    return ((p1 >= 0) && (p2 <= 0));

Extra parentheses.

> + i    const spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
> + i    spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);

Many variables are defined with "const".  I am not sure they are all
that doesn't change.  If it applies to the pointer, "out" should also
not change in here.  Am I wrong?

> +    /*
> +     * Begin. This block evaluates the median of coordinates of boxes
> +     */

I would rather explain what the function does on the function header.

> +             memcpy(new_rect_box, rect_box, sizeof(IRectBox));

Do we really need to copy the traversalValues on allTheSame case.
Wouldn't it work if just the same value is passed for all of them.
The search shouldn't continue after allTheSame case.

> +         for (i = 0; flag && i < in->nkeys && flag; i++)

It checks flag two times.

> +            boxPointerToRectangle(
> +                DatumGetBoxP(in->scankeys[i].sk_argument),
> +                p_query_rect
> +            );

I don't think this matches the project coding style.

> +     int         flag = 1,

Wouldn't it be better as "bool"?

The regression tests for the remaining operators are still missing.



Re: [PATCH] we have added support for box type in SP-GiST index

От
Teodor Sigaev
Дата:
>> + *    implementation of quad-4d tree over boxes for SP-GiST.
> Isn't the whole thing actually 3D?
No. The idea if this work is a representation of 2d box as 4d point. Quad means
quadrant of 2d plane. Originally such kind of tree was developed for 2d point,
and each node of tree splits plane on 4 quadrant. For 3d tree each node splits
space for 8 "quadrants", 4d - 16 ones.

>> +  * For example consider the case of intersection.
> No need for a new line after this.
fixed

>> +  * A quadrant has bounds, but sp-gist keeps only 4-d point (box) in inner nodes.
> I couldn't get the term 4D point.  Maybe it means that we are using
> box datatype as the prefix, but we are not treating them as boxes.
exactly, we treat boxes as 4-dimentional points.

>
>> + * We use traversalValue to calculate quadrant bounds from parent's quadrant
>> + * bounds.
> I couldn't understand this sentence.  We are using the parent's prefix
> and the quadrant to generate the traversalValue.  I think this is the
> crucial part of the opclass.  It would be nice to explain it more
> clearly.

I'll try to explain with two-dimensional example over points. ASCII-art:
            |
            |
     1      |      2
            |
-----------+-------------
            |P
      3     |      4
            |
            |

'+' with 'A' represents a centroid or, other words, point which splits plane for
4 quadrants and it strorend in parent node. 1,2,3,4 are labels of quadrants,
each labeling will be the same for all pictures and all centriods, and i will
not show them in pictures later to prevent too complicated images. Let we add C
- child node (and again, it will split plane for 4 quads):


            |         |
        ----+---------+---
      X     |B        |C
            |         |
-----------+---------+---
            |P        |A
            |         |
            |
            |

A and B are points of intersection of lines. So, box PBCAis a bounding box for
points contained in 3-rd (see labeling above). For example X labeled point is
not a descendace of child node with centroid  C because it must be in branch of
1-st quad of parent node. So, each node (except root) will have a limitation in
its quadrant. To transfer that limitation the traversalValue is used.

To keep formatting I attached a txt file with this fragment.

>
>> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
> 2016.
fixed

>
>> + *          src/backend/utils/adt/boxtype_spgist.c
> Maybe we should name this file as geo_spgist.c to support other types
> in the future.
done

>> + #include "utils/builtins.h";
> Extra ;.
fixed

>
>> + #include "utils/datum.h"
> I think this is unnecessary.
removed

>
>> + /* InfR type implements doubles and +- infinity */
>> + typedef struct
>> + {
>> +    int         infFlag;
>> +    double      val;
>> + }   InfR;
> Why do we need this?  Can't we store infinity in "double"?
Hmm, you are right. fixed.

> This isn't returning the value.
removed

>
>> + typedef struct
>> + {
>> +    Range       range_x;
>> +    Range       range_y;
>> + }   Rectangle;
>
> Wouldn't it be simpler to just using BOX instead of this?
Too much the same names in RectBox

>
>> + static void
>> + evalIRangeBox(const IRangeBox *range_box, const Range *range, const int half1,
>> +              const int half2, IRangeBox *new_range_box)
>>
>> + static void
>> + evalIRectBox(const IRectBox *rect_box, const Rectangle *centroid,
>> +             const uint8 quadrant, IRectBox * new_rect_box)
>>
>> + inline static void
>> + initializeUnboundedBox(IRectBox * rect_box)
>
> Wouldn't it be simpler to palloc and return the value on those functions?
evalRangeBox() initializes part of RectBox, so, it could not palloc its result.
Other methods use the same signature just for consistency.

>
>> + static int
>> + intersect2D(const Range * range, const IRangeBox * range_box)
>
> Wouldn't it be better to return "bool" on those functions.
done

>
>> +    return ((p1 >= 0) && (p2 <= 0));
> Extra parentheses.
removed

>
>> + i    const spgChooseIn *in = (spgChooseIn *) PG_GETARG_POINTER(0);
>> + i    spgChooseOut *out = (spgChooseOut *) PG_GETARG_POINTER(1);
>
> Many variables are defined with "const".  I am not sure they are all
> that doesn't change.  If it applies to the pointer, "out" should also
> not change in here.  Am I wrong?
No, changes

>
>> +    /*
>> +     * Begin. This block evaluates the median of coordinates of boxes
>> +     */
> I would rather explain what the function does on the function header.
fixed

>
>> +             memcpy(new_rect_box, rect_box, sizeof(IRectBox));
> Do we really need to copy the traversalValues on allTheSame case.
> Wouldn't it work if just the same value is passed for all of them.
> The search shouldn't continue after allTheSame case.
Seems, yes. spgist tree could contain a locng branches with allTheSame.

>
>> +         for (i = 0; flag && i < in->nkeys && flag; i++)
> It checks flag two times.
fixed

>
>> +            boxPointerToRectangle(
>> +                DatumGetBoxP(in->scankeys[i].sk_argument),
>> +                p_query_rect
>> +            );
> I don't think this matches the project coding style.
fixed

>
>> +     int         flag = 1,
> Wouldn't it be better as "bool"?
fixed.

> The regression tests for the remaining operators are still missing.
Ooops, will be soon.

Attached all patches  to simplify review. Thank you very much!

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [PATCH] we have added support for box type in SP-GiST index

От
Kevin Grittner
Дата:
On Mon, Mar 21, 2016 at 11:57 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:

>> I couldn't get the term 4D point.  Maybe it means that we are
>> using box datatype as the prefix, but we are not treating them
>> as boxes.
>
> exactly, we treat boxes as 4-dimentional points.

I'm not entirely sure I understand the terminology either, since I
tend to think of a *point* as having *zero* dimensions.  Would it
perhaps be more accurate to say we are treating a 2-dimensional box
as a point in 4-dimensional space?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] we have added support for box type in SP-GiST index

От
Stas Kelvich
Дата:
> On 21 Mar 2016, at 22:38, Kevin Grittner <kgrittn@gmail.com> wrote:
>
> On Mon, Mar 21, 2016 at 11:57 AM, Teodor Sigaev <teodor@sigaev.ru> wrote:
>
>>> I couldn't get the term 4D point.  Maybe it means that we are
>>> using box datatype as the prefix, but we are not treating them
>>> as boxes.
>>
>> exactly, we treat boxes as 4-dimentional points.
>
> I'm not entirely sure I understand the terminology either, since I
> tend to think of a *point* as having *zero* dimensions.  Would it
> perhaps be more accurate to say we are treating a 2-dimensional box
> as a point in 4-dimensional space?

Or just say 4-d vector instead of 4-d point. Look like it will be enough rigorous.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PATCH] we have added support for box type in SP-GiST index

От
Jim Nasby
Дата:
On 3/21/16 11:57 AM, Teodor Sigaev wrote:
> A and B are points of intersection of lines. So, box PBCAis a bounding
> box for points contained in 3-rd (see labeling above). For example X
> labeled point is not a descendace of child node with centroid  C because
> it must be in branch of 1-st quad of parent node. So, each node (except
> root) will have a limitation in its quadrant. To transfer that
> limitation the traversalValue is used.

Isn't this basically the same thing that the cube contrib module does? 
(Which has the added benefit of kNN-capable operators).

If that's true then ISTM it'd be better to work on pulling cube's 
features into box?

If it's not true, I'm still wondering if there's enough commonality here 
that we should pull cube into core...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [PATCH] we have added support for box type in SP-GiST index

От
Stas Kelvich
Дата:
> On 22 Mar 2016, at 01:47, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
>
> On 3/21/16 11:57 AM, Teodor Sigaev wrote:
>> A and B are points of intersection of lines. So, box PBCAis a bounding
>> box for points contained in 3-rd (see labeling above). For example X
>> labeled point is not a descendace of child node with centroid  C because
>> it must be in branch of 1-st quad of parent node. So, each node (except
>> root) will have a limitation in its quadrant. To transfer that
>> limitation the traversalValue is used.
>
> Isn't this basically the same thing that the cube contrib module does? (Which has the added benefit of kNN-capable
operators).

Cube and postgres own geometric types are indexed by building R-tree. While this is one of the most popular solutions
it
degrades almost to linear search when bounding boxes for each index node overlaps a lot. This problem will arise when
onewill 
try to index streets in the city with grid system — a lot of street's bounding boxes will just coincide with bounding
boxfor whole city. 

But that patch use totally different strategy for building index and do not suffer from above problem.

>
> If that's true then ISTM it'd be better to work on pulling cube's features into box?
>
> If it's not true, I'm still wondering if there's enough commonality here that we should pull cube into core…

There is also intarray module with very common functionality, but it also addressing different data pattern.

Optimal indexing and kNN strategy are very sensible to the data itself and if we want some general multidimensional
searchroutines, 
then I think none of the mentioned extensions is general enough. Cube barely applicable when dimensions number is
higherthan 10-20, 
intarray performs bad on data with big sets of possible coordinates, this patch is also intended to help with specific,
nicheproblem. 

While people tends to use machine learning and regressions models more and more it is interesting to have some general
n-dimindexing with kNN, 
but I think it is different problem and should be solved in a different way.

Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [PATCH] we have added support for box type in SP-GiST index

От
Jim Nasby
Дата:
On 3/21/16 7:41 PM, Stas Kelvich wrote:
> While people tends to use machine learning and regressions models more and more it is interesting to have some
generaln-dim indexing with kNN,
 
> but I think it is different problem and should be solved in a different way.

I think one of the issues here is it's not very clear to someone without 
a good amount of ML knowledge how these things relate. I hear "box' and 
'cube' and think it's just a 2D vs 3D issue, and intarray isn't even on 
the radar.

Maybe what's needed are actual vector and matrix types?

In any case, if you've got a good reason why box and cube should stay 
separate then further discussion should happen in another thread.

BTW, if you haven't seen it, take a look at http://madlib.apache.org/
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: [PATCH] we have added support for box type in SP-GiST index

От
Teodor Sigaev
Дата:
> tend to think of a *point* as having *zero* dimensions.  Would it
> perhaps be more accurate to say we are treating a 2-dimensional box
> as a point in 4-dimensional space?
Exactly, sorry for ambiguity.
-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [PATCH] we have added support for box type in SP-GiST index

От
Teodor Sigaev
Дата:
> Isn't this basically the same thing that the cube contrib module does? (Which
> has the added benefit of kNN-capable operators).
No, cube module introduces new type - N-dimensional box. And adds an index 
support for it.

Current patch suggests non-traditional indexing technique for 2D boxes by 
treating them as point in 4D space. With such representation it's became 
possible to use quad-tree technique which:
1 supports only points to index
2 already supported by SP-GiST

Such technique provides some benefits in comparance with traditional R-Tree 
which implemented in pg with a help GiST. See Oleg's message.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [PATCH] we have added support for box type in SP-GiST index

От
Emre Hasegeli
Дата:
> +  * boxtype_spgist.c

The names on the file header need to be changed, too.

> I'll try to explain with two-dimensional example over points. ASCII-art:
>            |
>            |
>     1      |      2
>            |
> -----------+-------------
>            |P
>      3     |      4
>            |
>            |
>
> '+' with 'A' represents a centroid or, other words, point which splits plane
> for 4 quadrants and it strorend in parent node. 1,2,3,4 are labels of
> quadrants, each labeling will be the same for all pictures and all
> centriods, and i will not show them in pictures later to prevent too
> complicated images. Let we add C - child node (and again, it will split
> plane for 4 quads):
>
>
>            |         |
>        ----+---------+---
>      X     |B        |C
>            |         |
> -----------+---------+---
>            |P        |A
>            |         |
>            |
>            |
>
> A and B are points of intersection of lines. So, box PBCAis a bounding box
> for points contained in 3-rd (see labeling above). For example X labeled
> point is not a descendace of child node with centroid  C because it must be
> in branch of 1-st quad of parent node. So, each node (except root) will have
> a limitation in its quadrant. To transfer that limitation the traversalValue
> is used.
>
> To keep formatting I attached a txt file with this fragment.

Thank you for the explanation.  Should we incorporate this with the patch.

>>> +  * Portions Copyright (c) 1996-2015, PostgreSQL Global Development
>>> Group
>>
>> 2016.
>
> fixed

Not on the patch.

> + cmp_double(const double a, const double b)

Does this function necessary?  We can just compare the doubles.

> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)

The "rectangle" variable in here should be renamed.

> +     Assert(is_infinite(b) == 0);

This is failing on my test.  You can reproduce with the script I have sent.

>> Wouldn't it be simpler to palloc and return the value on those functions?
>
> evalRangeBox() initializes part of RectBox, so, it could not palloc its
> result.
> Other methods use the same signature just for consistency.

I couldn't understand it.  evalRangeBox() can palloc and return the
result.  evalRectBox() can return the result palloc'ed by
evalRangeBox().  The caller wouldn't need to palloc anything.

>> Many variables are defined with "const".  I am not sure they are all
>> that doesn't change.  If it applies to the pointer, "out" should also
>> not change in here.  Am I wrong?
>
> No, changes

I now read about "const".  I am sorry for not being experienced in C.
The new version of the patch marks them as "const" by mistake.

I went through all other variables:

> +     int r = is_infinite(a);
>
> +     double      x = *(double *) a;
> +     double      y = *(double *) b;
>
> +     spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
>
> +     spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
>
> +     BOX        *leafBox = DatumGetBoxP(in->leafDatum);

Shouldn't they be "const", too?

>>> +    /*
>>> +     * Begin. This block evaluates the median of coordinates of boxes
>>> +     */
>>
>> I would rather explain what the function does on the function header.
>
> fixed

The "end" part of it is still there.

>> Do we really need to copy the traversalValues on allTheSame case.
>> Wouldn't it work if just the same value is passed for all of them.
>> The search shouldn't continue after allTheSame case.
>
> Seems, yes. spgist tree could contain a locng branches with allTheSame.

Does SP-GiST allows any node under allTheSame to not being allTheSame?Not setting traversalValues for allTheSame worked
finewith my test.
 

> +     if (in->allTheSame)

Most of the things happening before this check is not used in there.
Shouldn't we move this to the top of the function?

> +     out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);

This could go before allTheSame block.



Re: [PATCH] we have added support for box type in SP-GiST index

От
Teodor Sigaev
Дата:
>> +  * boxtype_spgist.c
> The names on the file header need to be changed, too.
Oops. fixed



>> I'll try to explain with two-dimensional example over points. ASCII-art:
> Thank you for the explanation.  Should we incorporate this with the patch.
added


>> + cmp_double(const double a, const double b)
> Does this function necessary?  We can just compare the doubles.
Yes, it compares with limited precision as it does by geometry operations.
Renamed to point actual arguments.

>
>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
> The "rectangle" variable in here should be renamed.
fixed

>
>> +     Assert(is_infinite(b) == 0);
> This is failing on my test.  You can reproduce with the script I have sent.
I didn't know:
# select '(1,inf)'::point;
   point
---------
  (1,inf)

fixed

>
>>> Wouldn't it be simpler to palloc and return the value on those functions?
>> evalRangeBox() initializes part of RectBox, so, it could not palloc its
>> result.
>> Other methods use the same signature just for consistency.
>
> I couldn't understand it.  evalRangeBox() can palloc and return the
> result.  evalRectBox() can return the result palloc'ed by
> evalRangeBox().  The caller wouldn't need to palloc anything.
evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
evalRangeBox() will palloc its result then we need to copy its result into
RangeBox struct and free. Let both fucntion use the same interface.

> I went through all other variables:
>> +     int r = is_infinite(a);
>> +     double      x = *(double *) a;
>> +     double      y = *(double *) b;
>> +     spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
>> +     spgLeafConsistentIn *in = (spgLeafConsistentIn *) PG_GETARG_POINTER(0);
>> +     BOX        *leafBox = DatumGetBoxP(in->leafDatum);
> Shouldn't they be "const", too?
They could. But it doesn't required. To be in consistent state I've removed
const modifier where possible


>
>>>> +    /*
>>>> +     * Begin. This block evaluates the median of coordinates of boxes
>>>> +     */
>>>
>>> I would rather explain what the function does on the function header.
>>
>> fixed
> The "end" part of it is still there.
Oops again, fixed

>
>>> Do we really need to copy the traversalValues on allTheSame case.
>>> Wouldn't it work if just the same value is passed for all of them.
>>> The search shouldn't continue after allTheSame case.
>>
>> Seems, yes. spgist tree could contain a locng branches with allTheSame.
>
> Does SP-GiST allows any node under allTheSame to not being allTheSame?
No, SP-GiST doesn't allow that
>   Not setting traversalValues for allTheSame worked fine with my test.
it works until allthesame branch contains only one inner node. Otherwise
traversalValue will be freeed several times, see spgWalk().
# select i as id, '1,2,3,4'::box as b into x from generate_series(1,1000000) i;
# create index ix on i using spgist (b);
# select count(*) from x where b && '1,2,3,4'::box; -- coredump
gdb:
#0  0x000000080143564a in thr_kill () from /lib/libc.so.7
#1  0x0000000801435636 in raise () from /lib/libc.so.7
#2  0x00000008014355b9 in abort () from /lib/libc.so.7
#3  0x0000000000a80739 in in_error_recursion_trouble () at elog.c:199
#4  0x0000000000abb748 in pfree (pointer=0x801e90868) at mcxt.c:1016
#5  0x000000000053330c in freeScanStackEntry (so=0x801e8d358,
stackEntry=0x801e935d8) at spgscan.c:47
#6  0x0000000000532cdb in spgWalk (index=0x801f1c588, so=0x801e8d358,
scanWholeIndex=1 '\001', storeRes=0x532d10 <storeBitm
...

>
>> +     if (in->allTheSame)
> Most of the things happening before this check is not used in there.
> Shouldn't we move this to the top of the function?
yep, fixed

>
>> +     out->nodeNumbers = (int *) palloc(sizeof(int) * in->nNodes);
> This could go before allTheSame block.
yep, fixed

I've attached all patches again. Thank you very much!

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Вложения

Re: [PATCH] we have added support for box type in SP-GiST index

От
Emre Hasegeli
Дата:
>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>
>> Thank you for the explanation.  Should we incorporate this with the patch.
>
> added

I have worked on the comments of the patch.  It is attached.  I hope
it looks more clear than it was before.

>>> + cmp_double(const double a, const double b)
>>
>> Does this function necessary?  We can just compare the doubles.
>
> Yes, it compares with limited precision as it does by geometry operations.
> Renamed to point actual arguments.

I meant that we could use FP macros directly instead of this function.
Doing so is also more correct.  I haven't tried to produce the
problem, but this function is not same as using the macros directly.
For differences smaller that the epsilon, it can return different
results.  I have removed it on the attached version.

>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>
>> The "rectangle" variable in here should be renamed.
>
> fixed

I found a bunch of those too and renamed for clarity.  I have also
reordered the arguments of the helper functions to keep them
consistent.

> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
> evalRangeBox() will palloc its result then we need to copy its result into
> RangeBox struct and free. Let both fucntion use the same interface.

I found it nicer to copy and edit the existing value than creating it
in two steps like this.  It is also attached.

> it works until allthesame branch contains only one inner node. Otherwise
> traversalValue will be freeed several times, see spgWalk().

It just works, when traversalValues is not set.  It is also attached.

I have also added the missing regression tests.  While doing that I
noticed that some operators are missing and also added support for
them.  It is also attached with the updated version of my test script.

I hope I haven't changed the patch too much.  I have also pushed the
changes here:

https://github.com/hasegeli/postgres/commits/box-spgist

Вложения

Re: [PATCH] we have added support for box type in SP-GiST index

От
Teodor Sigaev
Дата:
Thank you, pushed

Emre Hasegeli wrote:
>>>> I'll try to explain with two-dimensional example over points. ASCII-art:
>>>
>>> Thank you for the explanation.  Should we incorporate this with the patch.
>>
>> added
>
> I have worked on the comments of the patch.  It is attached.  I hope
> it looks more clear than it was before.
>
>>>> + cmp_double(const double a, const double b)
>>>
>>> Does this function necessary?  We can just compare the doubles.
>>
>> Yes, it compares with limited precision as it does by geometry operations.
>> Renamed to point actual arguments.
>
> I meant that we could use FP macros directly instead of this function.
> Doing so is also more correct.  I haven't tried to produce the
> problem, but this function is not same as using the macros directly.
> For differences smaller that the epsilon, it can return different
> results.  I have removed it on the attached version.
>
>>>> + boxPointerToRangeBox(BOX *box, RangeBox * rectangle)
>>>
>>> The "rectangle" variable in here should be renamed.
>>
>> fixed
>
> I found a bunch of those too and renamed for clarity.  I have also
> reordered the arguments of the helper functions to keep them
> consistent.
>
>> evalRangeBox() is used to initialize fields of RangeBox in evalRectBox(). If
>> evalRangeBox() will palloc its result then we need to copy its result into
>> RangeBox struct and free. Let both fucntion use the same interface.
>
> I found it nicer to copy and edit the existing value than creating it
> in two steps like this.  It is also attached.
>
>> it works until allthesame branch contains only one inner node. Otherwise
>> traversalValue will be freeed several times, see spgWalk().
>
> It just works, when traversalValues is not set.  It is also attached.
>
> I have also added the missing regression tests.  While doing that I
> noticed that some operators are missing and also added support for
> them.  It is also attached with the updated version of my test script.
>
> I hope I haven't changed the patch too much.  I have also pushed the
> changes here:
>
> https://github.com/hasegeli/postgres/commits/box-spgist
>

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: [PATCH] we have added support for box type in SP-GiST index

От
Emre Hasegeli
Дата:
> Thank you, pushed

Thank you for working on this.

I noticed that have overlooked this:
static RectBox *
-initRectBox()
+initRectBox(void){



Re: [PATCH] we have added support for box type in SP-GiST index

От
Bruce Momjian
Дата:
On Thu, Mar 31, 2016 at 05:46:41PM +0200, Emre Hasegeli wrote:
> > Thank you, pushed
> 
> Thank you for working on this.
> 
> I noticed that have overlooked this:
> 
>  static RectBox *
> -initRectBox()
> +initRectBox(void)
>  {

Done, thanks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+                     Ancient Roman grave inscription +