Обсуждение: Re: [HACKERS] Range Merge Join v1

Поиск
Список
Период
Сортировка

Re: [HACKERS] Range Merge Join v1

От
Alexander Kuzmenkov
Дата:
Hi Jeff,

Fast range joins are very nice to have, thank you for working on this.

I've been doing a somewhat related thing recently, trying to teach the 
merge join executor to perform full joins on comparison clauses [1]. I 
have some comments after reading your patch:

* At the moment, "mergejoinable clause" and "equality clause" mean the 
same thing to the planner, and those clauses are used both to create 
equivalence classes and to perform merge joins. If we start performing 
merge joins for different kinds of clauses, such as comparison or range 
intersection, it makes sense to separate these two meanings. I tried 
this in my patch and it didn't require many changes. I use 
RestrictInfo.equivopfamilies to build equivalence classes, and 
RestrictInfo.mergeopfamilies for mergejoinable clauses.

* Semantically, MJCompare() is a function that determines whether you 
should emit a join result or advance one of the pointers. This meaning 
is not explicit in the code and is not well encapsulated: the function 
returns and int and 'compareNoMatch' flag, and the calling code combines 
them in various ways to derive the final result. This meaning can be 
made explicit by making MJCompare return enum values {Join, NextInner, 
NextOuter}, and putting inside it all the logic that decides whether we 
join or advance. ExecMergeJoin looks intimidating already, and I think 
this change would help readability. Again, you can find an illustration 
in my patch.

I hope you find these points helpful.

[1] 

https://www.postgresql.org/message-id/flat/1d23ad41-a9d9-1d1e-1d79-83b002d6f776%40postgrespro.ru#1d23ad41-a9d9-1d1e-1d79-83b002d6f776@postgrespro.ru

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] Range Merge Join v1

От
Jeff Davis
Дата:
On Fri, Aug 25, 2017 at 10:19 AM, Alexander Kuzmenkov
<a.kuzmenkov@postgrespro.ru> wrote:
> Hi Jeff,

Hi,

Thank you for the review and suggestions!

> * At the moment, "mergejoinable clause" and "equality clause" mean the same
> thing to the planner, and those clauses are used both to create equivalence
> classes and to perform merge joins. If we start performing merge joins for
> different kinds of clauses, such as comparison or range intersection, it
> makes sense to separate these two meanings. I tried this in my patch and it
> didn't require many changes. I use RestrictInfo.equivopfamilies to build
> equivalence classes, and RestrictInfo.mergeopfamilies for mergejoinable
> clauses.

I like the idea. I will look in more detail and I can either change my
patch or piggyback on yours.

> * Semantically, MJCompare() is a function that determines whether you should
> emit a join result or advance one of the pointers. This meaning is not
> explicit in the code and is not well encapsulated: the function returns and
> int and 'compareNoMatch' flag, and the calling code combines them in various
> ways to derive the final result. This meaning can be made explicit by making
> MJCompare return enum values {Join, NextInner, NextOuter}, and putting
> inside it all the logic that decides whether we join or advance.
> ExecMergeJoin looks intimidating already, and I think this change would help
> readability. Again, you can find an illustration in my patch.

I actually tried doing something similar in my patch, but there are
four cases to consider in EXEC_MJ_SKIP_TEST:

1. Overlaps: mark and then join the tuples
2. left-of: SKIPOUTER_ADVANCE
3. right-of: SKIPINNER_ADVANCE
4. None of the above: mark and NEXTINNER

However, you are right that the current code is ugly, and I should
refactor into 4 explicit cases. I don't think I will call them by the
same names as the join states, though, because there's not a direct
mapping.

Updated patch attached. Changelog:

* Rebased
* Changed MJCompare to return an enum as suggested, but it has 4
possible values rather than 3.
* Added support for joining on contains or contained by (@> or <@) and
updated tests.

Regards,
    Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

Re: [HACKERS] Range Merge Join v1

От
Jeff Davis
Дата:
On Thu, Aug 31, 2017 at 1:52 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> Updated patch attached. Changelog:
>
> * Rebased
> * Changed MJCompare to return an enum as suggested, but it has 4
> possible values rather than 3.
> * Added support for joining on contains or contained by (@> or <@) and
> updated tests.

The current patch does not work well with multiple keys, and I believe
it's important to solve because it will make it usable for
multi-dimension spatial joins.

The problem is this: the algorithm for a single key demands that the
input is sorted by (lower(r) NULLS FIRST, upper(r) NULLS LAST). That's
easy, because that's also the sort order for the range operator class,
so everything just works.

For multiple keys, the order of the input is:  lower(r1) NULLS FIRST, lower(r2) NULLS FIRST,  upper(r1) NULLS LAST,
upper(r2)NULLS LAST
 

But that can't match up with any opclass, because an opclass can only
order one attribute at a time. In this case, the lower bound of r2 is
more significant than the upper bound of r1.

It's easy enough to adapt the execution to make this work as long as
the input is properly sorted. The challenge is making the optimizer
choose the sort keys properly.

I have tried a few approaches. The current approach that I'm using is:
* have a new, special range operator family with no operator classes
* in check_mergejoinable(), detect the && operator and set the
mergeopfamilies to contain only the special operator family
* in try_mergejoin_path(), it will convert the pathkeys using that
operator class into pathkeys over a "lower" expression over the same
EC; and then add on additional pathkeys for the "upper" expressions
onto the end of the pathkey list

Any comments or alternative suggestions welcome. This will probably
take a few days at least, so I put the patch in "waiting on author"
state.

Regards,    Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Range Merge Join v1

От
Michael Paquier
Дата:
On Mon, Sep 18, 2017 at 2:24 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> Any comments or alternative suggestions welcome. This will probably
> take a few days at least, so I put the patch in "waiting on author"
> state.

This did not receive an update for two months. I am marking it as
returned with feedback.
-- 
Michael