[HACKERS] Merge join for GiST

Поиск
Список
Период
Сортировка
От Andrew Borodin
Тема [HACKERS] Merge join for GiST
Дата
Msg-id CAJEAwVE5z5_mE-0SUBjNjNY=CPYf8vq0N9pGHiJckWqE5-8H3A@mail.gmail.com
обсуждение исходный текст
Ответы Re: [HACKERS] Merge join for GiST
Re: [HACKERS] Merge join for GiST
Список pgsql-hackers
Hello, hackers!

==Spatial joins==
Scientific papers from the dawn of R-trees and multidimensional
indexes feature a lot of algorithms for spatial joins.
I.e. you have two sets of geometries s1 and s2, you need to produce
all colliding pairs (p1,p2) where p1 in s1 and p2 in s2. For 2 R-trees
of equal heights with roots R and S most straightforward[1] algorithm
look like:

SpatialJoin (R,S: Node)
{ FOR (all E in  S)   FOR (all ER in R with ER.rect intersecting  E.rect)      IF (R is a leaf page)      {
OutputER,ES      }      ELSE      {        SpatialJoin (ER.ref, ES.ref)      } 
}


==Motivation==
I’m mostly interested in OLAP-style aggregates like:

SELECT rows.Id, sum(data.attribute1), sum(data.attribute2), sum(data.attribute3)
FROM rows LEFT JOIN data ON rows.IndexedAttribute && data.IndexedAttribute
GROUP BY 1

When we have two GiST for rows.IndexedAttribute and data.IndexedAttribute.
Currently, this query produces plans with Nested Loop Join, so for
every row from “rows” there is one GiST scan into “data”.


==Proposal==
Obviously, for this scenario, we can use GiST-based join algorithm
identical to the SpatialJoin algorithm above.

This algorithm will work iif (key1 && key2)  ==> (union(key1,key3) &&
key2 ) and (union(key2,key3) && key1 ) for any key3. This is quite
straightforward for && operator, while I’m not sure this will work for
<@ and @> and others.

I can try to implement this kind of join as an extension or try to
embed this into the planner.

I think this idea is somewhat related to this patch [2], but as for
now cannot describe how exactly GiST merge and Range Merge features
relate.


How do you think:
1. Has this idea been considered before? I had not found anything on
actual spatial join of GiSTS.
2. What is the easiest way to implement this feature?
3. What kind of operators may be spatial-joined without intervention
to existing GiST scan strategies API?

Many thanks for reading.

Best regards, Andrey Borodin.

[1] Brinkhoff, Thomas, Hans-Peter Kriegel, and Bernhard Seeger.
Efficient processing of spatial joins using R-trees. Vol. 22. No. 2.
ACM, 1993.
[2]
https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg%40mail.gmail.com#CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg@mail.gmail.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kuntal Ghosh
Дата:
Сообщение: Re: [HACKERS] strange parallel query behavior after OOM crashes
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Compiler warning in costsize.c