Обсуждение: Re: [HACKERS] Range Merge Join v1
Hi Jeff, Fast range joins are very nice to have, thank you for working on this. I've been doing a somewhat related thing recently, trying to teach the merge join executor to perform full joins on comparison clauses [1]. I have some comments after reading your patch: * At the moment, "mergejoinable clause" and "equality clause" mean the same thing to the planner, and those clauses are used both to create equivalence classes and to perform merge joins. If we start performing merge joins for different kinds of clauses, such as comparison or range intersection, it makes sense to separate these two meanings. I tried this in my patch and it didn't require many changes. I use RestrictInfo.equivopfamilies to build equivalence classes, and RestrictInfo.mergeopfamilies for mergejoinable clauses. * Semantically, MJCompare() is a function that determines whether you should emit a join result or advance one of the pointers. This meaning is not explicit in the code and is not well encapsulated: the function returns and int and 'compareNoMatch' flag, and the calling code combines them in various ways to derive the final result. This meaning can be made explicit by making MJCompare return enum values {Join, NextInner, NextOuter}, and putting inside it all the logic that decides whether we join or advance. ExecMergeJoin looks intimidating already, and I think this change would help readability. Again, you can find an illustration in my patch. I hope you find these points helpful. [1] https://www.postgresql.org/message-id/flat/1d23ad41-a9d9-1d1e-1d79-83b002d6f776%40postgrespro.ru#1d23ad41-a9d9-1d1e-1d79-83b002d6f776@postgrespro.ru -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, Aug 25, 2017 at 10:19 AM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > Hi Jeff, Hi, Thank you for the review and suggestions! > * At the moment, "mergejoinable clause" and "equality clause" mean the same > thing to the planner, and those clauses are used both to create equivalence > classes and to perform merge joins. If we start performing merge joins for > different kinds of clauses, such as comparison or range intersection, it > makes sense to separate these two meanings. I tried this in my patch and it > didn't require many changes. I use RestrictInfo.equivopfamilies to build > equivalence classes, and RestrictInfo.mergeopfamilies for mergejoinable > clauses. I like the idea. I will look in more detail and I can either change my patch or piggyback on yours. > * Semantically, MJCompare() is a function that determines whether you should > emit a join result or advance one of the pointers. This meaning is not > explicit in the code and is not well encapsulated: the function returns and > int and 'compareNoMatch' flag, and the calling code combines them in various > ways to derive the final result. This meaning can be made explicit by making > MJCompare return enum values {Join, NextInner, NextOuter}, and putting > inside it all the logic that decides whether we join or advance. > ExecMergeJoin looks intimidating already, and I think this change would help > readability. Again, you can find an illustration in my patch. I actually tried doing something similar in my patch, but there are four cases to consider in EXEC_MJ_SKIP_TEST: 1. Overlaps: mark and then join the tuples 2. left-of: SKIPOUTER_ADVANCE 3. right-of: SKIPINNER_ADVANCE 4. None of the above: mark and NEXTINNER However, you are right that the current code is ugly, and I should refactor into 4 explicit cases. I don't think I will call them by the same names as the join states, though, because there's not a direct mapping. Updated patch attached. Changelog: * Rebased * Changed MJCompare to return an enum as suggested, but it has 4 possible values rather than 3. * Added support for joining on contains or contained by (@> or <@) and updated tests. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Вложения
On Thu, Aug 31, 2017 at 1:52 AM, Jeff Davis <pgsql@j-davis.com> wrote: > Updated patch attached. Changelog: > > * Rebased > * Changed MJCompare to return an enum as suggested, but it has 4 > possible values rather than 3. > * Added support for joining on contains or contained by (@> or <@) and > updated tests. The current patch does not work well with multiple keys, and I believe it's important to solve because it will make it usable for multi-dimension spatial joins. The problem is this: the algorithm for a single key demands that the input is sorted by (lower(r) NULLS FIRST, upper(r) NULLS LAST). That's easy, because that's also the sort order for the range operator class, so everything just works. For multiple keys, the order of the input is: lower(r1) NULLS FIRST, lower(r2) NULLS FIRST, upper(r1) NULLS LAST, upper(r2)NULLS LAST But that can't match up with any opclass, because an opclass can only order one attribute at a time. In this case, the lower bound of r2 is more significant than the upper bound of r1. It's easy enough to adapt the execution to make this work as long as the input is properly sorted. The challenge is making the optimizer choose the sort keys properly. I have tried a few approaches. The current approach that I'm using is: * have a new, special range operator family with no operator classes * in check_mergejoinable(), detect the && operator and set the mergeopfamilies to contain only the special operator family * in try_mergejoin_path(), it will convert the pathkeys using that operator class into pathkeys over a "lower" expression over the same EC; and then add on additional pathkeys for the "upper" expressions onto the end of the pathkey list Any comments or alternative suggestions welcome. This will probably take a few days at least, so I put the patch in "waiting on author" state. Regards, Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Sep 18, 2017 at 2:24 AM, Jeff Davis <pgsql@j-davis.com> wrote: > Any comments or alternative suggestions welcome. This will probably > take a few days at least, so I put the patch in "waiting on author" > state. This did not receive an update for two months. I am marking it as returned with feedback. -- Michael