Обсуждение: ERROR: left and right pathkeys do not match in mergejoin

Поиск
Список
Период
Сортировка

ERROR: left and right pathkeys do not match in mergejoin

От
Alexander Kuzmenkov
Дата:
Hi hackers,

I found a bug related to the planning of merge joins. The steps to 
reproduce are as follows:
```
CREATE TABLE J1_TBL (
     i integer,
     j integer,
     t text
);
CREATE TABLE J2_TBL (
     i integer,
     k integer
);
set enable_hashjoin to off;
explain select * from j1_tbl full join (select * from j2_tbl order by 
j2_tbl.i desc, j2_tbl.k) j2_tbl on j1_tbl.i = j2_tbl.i and j1_tbl.i = 
j2_tbl.k;

ERROR:  left and right pathkeys do not match in mergejoin
```

It can be reproduced on the latest 9.6, 10 and devel versions. 
Basically, j2_tbl is used as the outer relation in merge join. We try to 
use the existing order of j2_tbl, that is, 'i2 desc' and 'k asc'. As a 
result, there is no order for i1 that would allow it to be merge-joined 
to both i2 and k. The corresponding code path begins in 
`generate_mergejoin_pathkeys()`, when it is first called by 
`match_unsorted_outer()`. `find_mergeclauses_for_pathkeys()` selects 
both mergeclauses for the given outer pathkeys, and then 
`make_inner_pathkeys_for_merge()` generates inner pathkey for the first 
mergeclause. Inner pathkey for the second mergeclause is skipped as 
redundant. It looks suspicious already, because this new pathkey has 
different direction and is more like conflicting than redundant. 
Finally, `create_mergejoin_plan()` detects that the inner pathkey 
doesn't match the second mergeclause and throws the error.

I found this while working on the inequality merge join patch. I don't 
know yet how to fix this, so any help is welcome.

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: ERROR: left and right pathkeys do not match in mergejoin

От
Tom Lane
Дата:
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
> explain select * from j1_tbl full join (select * from j2_tbl order by 
> j2_tbl.i desc, j2_tbl.k) j2_tbl on j1_tbl.i = j2_tbl.i and j1_tbl.i = 
> j2_tbl.k;
> ERROR:  left and right pathkeys do not match in mergejoin

Nice example.  There are several places that we could consider trying
to fix this:

The first possibility is to teach find_mergeclauses_for_pathkeys that
it should not select both of the join clauses in a case like this,
perhaps based on noting that the associated pathkeys have the same
eclass but different sort orders.  However, that seems like a loser
for the reason specified in the comment in that function: we need to
select a maximal list of mergeclauses, not a minimal list, because
if it's a full join and we aren't able to include all the clauses as
mergeclauses then we won't have a valid plan.  (At the time this code
was written, that could have resulted in failure to produce a plan
at all.  Now we could fall back to a hash full join ... but that might
be inefficient, or we might not have hashable operators.)

The next possibility is to fix it in make_inner_pathkeys_for_merge,
by having it not suppress lower-order pathkeys unless they match earlier
ones in all details (not just eclass).  In an example like this, that
means emitting a redundant inner pathkey list, equivalent to "ORDER BY
j1_tbl.i DESC, j1_tbl.i ASC".  That's kind of annoying.  It seems to
work in a simple test, but it implies doing useless comparisons during
the sort (since we'd only reach comparing the second sort column if
the first sort column compares equal, whereupon the second must too).
And it means representing the inner sort order by a noncanonical
pathkey list, which is not nice.  For example, even if we have an inner
path available that delivers rows ordered by "j1_tbl.i DESC", we'd
still think we have to physically sort it to add the extra sort key.
(And no, I don't want to change pathkey comparison rules to avoid that.)

The third possibility is to decide that create_mergejoin_plan is being
overly paranoid and it's okay to extract merge details from a "redundant"
path key even though it specifies the opposite sort order from what the
current merge clause seems to need.  This is scary at first glance, but
it seems like it should work.  We'd essentially be lying to the executor
about the sort ordering of the lower-order merge column, and trusting that
it won't take any wrong actions as a result because it should never reach
comparison of that lower-order column except when the higher column(s)
are equal.  I can't see a reason for that to go wrong; it's basically a
restatement of the argument why the lower-order sort key can be considered
redundant in the first place.  But it doesn't leave a warm feeling in the
pit of the stomach, especially for a bug fix that needs to be back-patched
a long way.

On balance, though, this last choice seems like the thing to do.  There
are clear downsides to the first two, in terms of failing to construct a
plan or constructing a needlessly inefficient plan.

            regards, tom lane


Re: ERROR: left and right pathkeys do not match in mergejoin

От
Tom Lane
Дата:
I wrote:
> The third possibility is to decide that create_mergejoin_plan is being
> overly paranoid and it's okay to extract merge details from a "redundant"
> path key even though it specifies the opposite sort order from what the
> current merge clause seems to need.  This is scary at first glance, but
> it seems like it should work.

BTW, while working through this, I realized that there's an additional
problem in the same area, which can be demonstrated thus in the
regression database:

explain select * from
  (select * from tenk1 a order by a.unique2) a
  right join
  (select * from tenk1 b order by b.thousand, b.twothousand, b.fivethous) b
  on a.unique2 = b.thousand and a.hundred = b.twothousand and a.unique2 = b.fivethous;
ERROR:  outer pathkeys do not match mergeclauses

The problem here is that find_mergeclauses_for_pathkeys has an API
designed on the assumption that the outer and inner pathkeys have
identical relationships to the mergeclauses, ie, if you want to find
the mergeclauses to use given a particular available sort ordering,
it works basically the same way for outer or inner pathkeys.  But per
this discussion, tain't so.  What really happens is that initially
we choose a list of mergeclauses that line up with the outer pathkeys,
and then we identify inner pathkeys that make sense given that list
of mergeclauses; but, in the presence of redundancy, those inner pathkeys
are not necessarily one-for-one with the mergeclauses.  All fine so
far.  But then, generate_mergejoin_paths looks at inner paths that have
available sort keys that are truncations of the original inner pathkey
list, and it uses find_mergeclauses_for_pathkeys to decide which
mergeclauses still make sense for that truncated pathkey list.  That
doesn't work.  In my example, we consider the pre-ordered sub-select
output for B as the outer side, and we choose all three mergeclauses
in the order the example has them, and we arrive at the initial inner
pathkey list of ({a.unique2}, {a.hundred}), dropping the redundant
second appearance of {a.unique2}.  We can successfully make a plan
from that.  But then we consider the truncated inner pathkey list
({a.unique2}), which this example is set up to ensure will win since
it avoids an extra sort step.  As the code stands, the mergeclause list
that's extracted to use with that pair of input paths is
    a.unique2 = b.thousand and a.unique2 = b.fivethous
(from the find_mergeclauses_for_pathkeys call with outer_keys = false).
That's just wrong, because it doesn't match the already-determined
outer path order, and create_mergejoin_plan is quite right to whine.

So it seems like find_mergeclauses_for_pathkeys should be renamed to
find_mergeclauses_for_outer_pathkeys, and then we need a second
function for the task of truncating the outer mergeclauses --- not
selecting them from scratch --- given a truncated inner pathkey list.
In this example, we could keep a.unique2 = b.thousand, but we'd have
to drop a.hundred = b.twothousand and then everything after it, since
the inner path doesn't have the sort order needed for that mergeclause.

            regards, tom lane


Re: ERROR: left and right pathkeys do not match in mergejoin

От
Alexander Kuzmenkov
Дата:
El 22/02/18 a las 21:03, Tom Lane escribió:
> The third possibility is to decide that create_mergejoin_plan is being
> overly paranoid and it's okay to extract merge details from a "redundant"
> path key even though it specifies the opposite sort order from what the
> current merge clause seems to need.

This check looks necessary to me, because merge join can't work with 
such a combination of outer pathkeys and merge clauses. The order of 
outer and inner column must be the same. Here, two outer columns with 
different orders are compared to the same inner column, so there will be 
mismatch one way or another. Consider the original query with some 
sample data:

     outer     inner
   desc  asc   desc
    i2   k2     i1
------------------
-> 1    0      1 <-
    1    1      0
    1    2
    0

The merge join starts at tuples marked with arrows. It finds that the 
outer tuple is less than the inner, so it advances the inner pointer, 
and loses the tuple with i1 = 1.

So, if we can't do a merge join on these clauses with these pathkeys 
anyway, it is OK to discard some of them in 
find_mergeclauses_for_pathkey. This helps the case when we use the 
existing order of outer relation. Also, we should take care not to 
create such combinations in select_outer_pathkeys_for_merge, when it 
tries to match root->query_pathkeys.

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: ERROR: left and right pathkeys do not match in mergejoin

От
Tom Lane
Дата:
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
>> The third possibility is to decide that create_mergejoin_plan is being
>> overly paranoid and it's okay to extract merge details from a "redundant"
>> path key even though it specifies the opposite sort order from what the
>> current merge clause seems to need.

> This check looks necessary to me, because merge join can't work with 
> such a combination of outer pathkeys and merge clauses.

No, I think it's okay, for exactly the same reason that the pathkey got
thrown away as redundant in the first place.  That is, imagine that we
are mergejoining on "o.a = i.x AND o.b = i.x", and we know that the sort
ordering of the outer relation is "a ASC, b DESC".  It would appear that
this means the inner relation needs to be ordered like "x ASC, x DESC",
and certainly if we were to play dumb and sort it that way, you'd expect
everything to work.  The reason that we're having trouble is that the
pathkey machinery knows that "x ASC, x DESC" is stupid, and it throws
away the second column.  It would not matter whether we considered the
sort ordering of the inner relation to be "x ASC, x DESC", or just
"x ASC", or indeed "x ASC, x ASC": all of those describe exactly the same
row ordering, because we only get to comparing the second sort column for
rows having equal values in the first sort column, and then our conclusion
must still be that the rows' values are equal.

This logic still goes through if the sort column contents are not simply
duplicate variables but distinct variables that have gotten put into the
same eclass, that is "SELECT ... WHERE x = y ORDER BY x ASC, y DESC".
Equal is equal.  (We go to some pains to be sure this is true, ie that
the equality operator's semantics agree with the ordering operator's.)

So essentially, we can lie to the executor and say that we sorted the
inner rel as "x ASC, x DESC", even though we really did no such thing,
because there will be no difference in the actual input row order from
what it would be if we had done that.

Attached is a completed patch that fixes this and also deals with the
overall topic that the inner and outer pathkeys aren't so interchangeable
as some parts of the code thought.  Maybe the comments could use further
improvement but I think it's OK otherwise.  I haven't started on
back-porting this, but the bugs are demonstrable back to 8.3, so that
needs to be done.

            regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 396ee27..a2777ea 100644
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
*************** sort_inner_and_outer(PlannerInfo *root,
*** 1009,1018 ****
              outerkeys = all_pathkeys;    /* no work at first one... */

          /* Sort the mergeclauses into the corresponding ordering */
!         cur_mergeclauses = find_mergeclauses_for_pathkeys(root,
!                                                           outerkeys,
!                                                           true,
!                                                           extra->mergeclause_list);

          /* Should have used them all... */
          Assert(list_length(cur_mergeclauses) == list_length(extra->mergeclause_list));
--- 1009,1017 ----
              outerkeys = all_pathkeys;    /* no work at first one... */

          /* Sort the mergeclauses into the corresponding ordering */
!         cur_mergeclauses = find_mergeclauses_for_outer_pathkeys(root,
!                                                                 outerkeys,
!                                                                 extra->mergeclause_list);

          /* Should have used them all... */
          Assert(list_length(cur_mergeclauses) == list_length(extra->mergeclause_list));
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1102,1111 ****
          jointype = JOIN_INNER;

      /* Look for useful mergeclauses (if any) */
!     mergeclauses = find_mergeclauses_for_pathkeys(root,
!                                                   outerpath->pathkeys,
!                                                   true,
!                                                   extra->mergeclause_list);

      /*
       * Done with this outer path if no chance for a mergejoin.
--- 1101,1109 ----
          jointype = JOIN_INNER;

      /* Look for useful mergeclauses (if any) */
!     mergeclauses = find_mergeclauses_for_outer_pathkeys(root,
!                                                         outerpath->pathkeys,
!                                                         extra->mergeclause_list);

      /*
       * Done with this outer path if no chance for a mergejoin.
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1228,1237 ****
              if (sortkeycnt < num_sortkeys)
              {
                  newclauses =
!                     find_mergeclauses_for_pathkeys(root,
!                                                    trialsortkeys,
!                                                    false,
!                                                    mergeclauses);
                  Assert(newclauses != NIL);
              }
              else
--- 1226,1234 ----
              if (sortkeycnt < num_sortkeys)
              {
                  newclauses =
!                     trim_mergeclauses_for_inner_pathkeys(root,
!                                                          mergeclauses,
!                                                          trialsortkeys);
                  Assert(newclauses != NIL);
              }
              else
*************** generate_mergejoin_paths(PlannerInfo *ro
*** 1272,1281 ****
                      if (sortkeycnt < num_sortkeys)
                      {
                          newclauses =
!                             find_mergeclauses_for_pathkeys(root,
!                                                            trialsortkeys,
!                                                            false,
!                                                            mergeclauses);
                          Assert(newclauses != NIL);
                      }
                      else
--- 1269,1277 ----
                      if (sortkeycnt < num_sortkeys)
                      {
                          newclauses =
!                             trim_mergeclauses_for_inner_pathkeys(root,
!                                                                  mergeclauses,
!                                                                  trialsortkeys);
                          Assert(newclauses != NIL);
                      }
                      else
diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c
index ef58cff..689bd68 100644
*** a/src/backend/optimizer/path/pathkeys.c
--- b/src/backend/optimizer/path/pathkeys.c
*************** update_mergeclause_eclasses(PlannerInfo
*** 981,994 ****
  }

  /*
!  * find_mergeclauses_for_pathkeys
   *      This routine attempts to find a set of mergeclauses that can be
!  *      used with a specified ordering for one of the input relations.
   *      If successful, it returns a list of mergeclauses.
   *
!  * 'pathkeys' is a pathkeys list showing the ordering of an input path.
!  * 'outer_keys' is true if these keys are for the outer input path,
!  *            false if for inner.
   * 'restrictinfos' is a list of mergejoinable restriction clauses for the
   *            join relation being formed.
   *
--- 981,992 ----
  }

  /*
!  * find_mergeclauses_for_outer_pathkeys
   *      This routine attempts to find a set of mergeclauses that can be
!  *      used with a specified ordering for the join's outer relation.
   *      If successful, it returns a list of mergeclauses.
   *
!  * 'pathkeys' is a pathkeys list showing the ordering of an outer-rel path.
   * 'restrictinfos' is a list of mergejoinable restriction clauses for the
   *            join relation being formed.
   *
*************** update_mergeclause_eclasses(PlannerInfo
*** 1000,1009 ****
   * usable mergeclauses (represented as a list of their restrictinfo nodes).
   */
  List *
! find_mergeclauses_for_pathkeys(PlannerInfo *root,
!                                List *pathkeys,
!                                bool outer_keys,
!                                List *restrictinfos)
  {
      List       *mergeclauses = NIL;
      ListCell   *i;
--- 998,1006 ----
   * usable mergeclauses (represented as a list of their restrictinfo nodes).
   */
  List *
! find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
!                                      List *pathkeys,
!                                      List *restrictinfos)
  {
      List       *mergeclauses = NIL;
      ListCell   *i;
*************** find_mergeclauses_for_pathkeys(PlannerIn
*** 1044,1062 ****
           *
           * It's possible that multiple matching clauses might have different
           * ECs on the other side, in which case the order we put them into our
!          * result makes a difference in the pathkeys required for the other
!          * input path.  However this routine hasn't got any info about which
           * order would be best, so we don't worry about that.
           *
           * It's also possible that the selected mergejoin clauses produce
!          * a noncanonical ordering of pathkeys for the other side, ie, we
           * might select clauses that reference b.v1, b.v2, b.v1 in that
           * order.  This is not harmful in itself, though it suggests that
!          * the clauses are partially redundant.  Since it happens only with
!          * redundant query conditions, we don't bother to eliminate it.
!          * make_inner_pathkeys_for_merge() has to delete duplicates when
!          * it constructs the canonical pathkeys list, and we also have to
!          * deal with the case in create_mergejoin_plan().
           *----------
           */
          foreach(j, restrictinfos)
--- 1041,1060 ----
           *
           * It's possible that multiple matching clauses might have different
           * ECs on the other side, in which case the order we put them into our
!          * result makes a difference in the pathkeys required for the inner
!          * input rel.  However this routine hasn't got any info about which
           * order would be best, so we don't worry about that.
           *
           * It's also possible that the selected mergejoin clauses produce
!          * a noncanonical ordering of pathkeys for the inner side, ie, we
           * might select clauses that reference b.v1, b.v2, b.v1 in that
           * order.  This is not harmful in itself, though it suggests that
!          * the clauses are partially redundant.  Since the alternative is
!          * to omit mergejoin clauses and thereby possibly fail to generate a
!          * plan altogether, we live with it.  make_inner_pathkeys_for_merge()
!          * has to delete duplicates when it constructs the inner pathkeys
!          * list, and we also have to deal with such cases specially in
!          * create_mergejoin_plan().
           *----------
           */
          foreach(j, restrictinfos)
*************** find_mergeclauses_for_pathkeys(PlannerIn
*** 1064,1075 ****
              RestrictInfo *rinfo = (RestrictInfo *) lfirst(j);
              EquivalenceClass *clause_ec;

!             if (outer_keys)
!                 clause_ec = rinfo->outer_is_left ?
!                     rinfo->left_ec : rinfo->right_ec;
!             else
!                 clause_ec = rinfo->outer_is_left ?
!                     rinfo->right_ec : rinfo->left_ec;
              if (clause_ec == pathkey_ec)
                  matched_restrictinfos = lappend(matched_restrictinfos, rinfo);
          }
--- 1062,1069 ----
              RestrictInfo *rinfo = (RestrictInfo *) lfirst(j);
              EquivalenceClass *clause_ec;

!             clause_ec = rinfo->outer_is_left ?
!                 rinfo->left_ec : rinfo->right_ec;
              if (clause_ec == pathkey_ec)
                  matched_restrictinfos = lappend(matched_restrictinfos, rinfo);
          }
*************** make_inner_pathkeys_for_merge(PlannerInf
*** 1351,1358 ****
                                               opathkey->pk_nulls_first);

          /*
!          * Don't generate redundant pathkeys (can happen if multiple
!          * mergeclauses refer to same EC).
           */
          if (!pathkey_is_redundant(pathkey, pathkeys))
              pathkeys = lappend(pathkeys, pathkey);
--- 1345,1357 ----
                                               opathkey->pk_nulls_first);

          /*
!          * Don't generate redundant pathkeys (which can happen if multiple
!          * mergeclauses refer to the same EC).  Because we do this, the output
!          * pathkey list isn't necessarily ordered like the mergeclauses, which
!          * complicates life for create_mergejoin_plan().  But if we didn't,
!          * we'd have a noncanonical sort key list, which would be bad; for one
!          * reason, it certainly wouldn't match any available sort order for
!          * the input relation.
           */
          if (!pathkey_is_redundant(pathkey, pathkeys))
              pathkeys = lappend(pathkeys, pathkey);
*************** make_inner_pathkeys_for_merge(PlannerInf
*** 1361,1366 ****
--- 1360,1457 ----
      return pathkeys;
  }

+ /*
+  * trim_mergeclauses_for_inner_pathkeys
+  *      This routine trims a list of mergeclauses to include just those that
+  *      work with a specified ordering for the join's inner relation.
+  *
+  * 'mergeclauses' is a list of RestrictInfos for mergejoin clauses for the
+  *            join relation being formed, in an order known to work for the
+  *            currently-considered sort ordering of the join's outer rel.
+  * 'pathkeys' is a pathkeys list showing the ordering of an inner-rel path;
+  *            it should be equal to, or a truncation of, the result of
+  *            make_inner_pathkeys_for_merge for these mergeclauses.
+  *
+  * What we return will be a prefix of the given mergeclauses list.
+  *
+  * We need this logic because make_inner_pathkeys_for_merge's result isn't
+  * necessarily in the same order as the mergeclauses.  That means that if we
+  * consider an inner-rel pathkey list that is a truncation of that result,
+  * we might need to drop mergeclauses even though they match a surviving inner
+  * pathkey.  This happens when they are to the right of a mergeclause that
+  * matches a removed inner pathkey.
+  *
+  * The mergeclauses must be marked (via outer_is_left) to show which side
+  * of each clause is associated with the current outer path.  (See
+  * select_mergejoin_clauses())
+  */
+ List *
+ trim_mergeclauses_for_inner_pathkeys(PlannerInfo *root,
+                                      List *mergeclauses,
+                                      List *pathkeys)
+ {
+     List       *new_mergeclauses = NIL;
+     PathKey    *pathkey;
+     EquivalenceClass *pathkey_ec;
+     bool        matched_pathkey;
+     ListCell   *lip;
+     ListCell   *i;
+
+     /* No pathkeys => no mergeclauses (though we don't expect this case) */
+     if (pathkeys == NIL)
+         return NIL;
+     /* Initialize to consider first pathkey */
+     lip = list_head(pathkeys);
+     pathkey = (PathKey *) lfirst(lip);
+     pathkey_ec = pathkey->pk_eclass;
+     lip = lnext(lip);
+     matched_pathkey = false;
+
+     /* Scan mergeclauses to see how many we can use */
+     foreach(i, mergeclauses)
+     {
+         RestrictInfo *rinfo = (RestrictInfo *) lfirst(i);
+         EquivalenceClass *clause_ec;
+
+         /* Assume we needn't do update_mergeclause_eclasses again here */
+
+         /* Check clause's inner-rel EC against current pathkey */
+         clause_ec = rinfo->outer_is_left ?
+             rinfo->right_ec : rinfo->left_ec;
+
+         /* If we don't have a match, attempt to advance to next pathkey */
+         if (clause_ec != pathkey_ec)
+         {
+             /* If we had no clauses matching this inner pathkey, must stop */
+             if (!matched_pathkey)
+                 break;
+
+             /* Advance to next inner pathkey, if any */
+             if (lip == NULL)
+                 break;
+             pathkey = (PathKey *) lfirst(lip);
+             pathkey_ec = pathkey->pk_eclass;
+             lip = lnext(lip);
+             matched_pathkey = false;
+         }
+
+         /* If mergeclause matches current inner pathkey, we can use it */
+         if (clause_ec == pathkey_ec)
+         {
+             new_mergeclauses = lappend(new_mergeclauses, rinfo);
+             matched_pathkey = true;
+         }
+         else
+         {
+             /* Else, no hope of adding any more mergeclauses */
+             break;
+         }
+     }
+
+     return new_mergeclauses;
+ }
+
+
  /****************************************************************************
   *        PATHKEY USEFULNESS CHECKS
   *
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index da0cc7f..65ca47d 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3791,3796 ****
--- 3791,3798 ----
      Oid           *mergecollations;
      int           *mergestrategies;
      bool       *mergenullsfirst;
+     PathKey    *opathkey;
+     EquivalenceClass *opeclass;
      int            i;
      ListCell   *lc;
      ListCell   *lop;
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3909,3915 ****
       * Compute the opfamily/collation/strategy/nullsfirst arrays needed by the
       * executor.  The information is in the pathkeys for the two inputs, but
       * we need to be careful about the possibility of mergeclauses sharing a
!      * pathkey (compare find_mergeclauses_for_pathkeys()).
       */
      nClauses = list_length(mergeclauses);
      Assert(nClauses == list_length(best_path->path_mergeclauses));
--- 3911,3918 ----
       * Compute the opfamily/collation/strategy/nullsfirst arrays needed by the
       * executor.  The information is in the pathkeys for the two inputs, but
       * we need to be careful about the possibility of mergeclauses sharing a
!      * pathkey, as well as the possibility that the inner pathkeys are not in
!      * an order matching the mergeclauses.
       */
      nClauses = list_length(mergeclauses);
      Assert(nClauses == list_length(best_path->path_mergeclauses));
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3918,3923 ****
--- 3921,3928 ----
      mergestrategies = (int *) palloc(nClauses * sizeof(int));
      mergenullsfirst = (bool *) palloc(nClauses * sizeof(bool));

+     opathkey = NULL;
+     opeclass = NULL;
      lop = list_head(outerpathkeys);
      lip = list_head(innerpathkeys);
      i = 0;
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3926,3936 ****
          RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
          EquivalenceClass *oeclass;
          EquivalenceClass *ieclass;
!         PathKey    *opathkey;
!         PathKey    *ipathkey;
!         EquivalenceClass *opeclass;
!         EquivalenceClass *ipeclass;
!         ListCell   *l2;

          /* fetch outer/inner eclass from mergeclause */
          if (rinfo->outer_is_left)
--- 3931,3939 ----
          RestrictInfo *rinfo = lfirst_node(RestrictInfo, lc);
          EquivalenceClass *oeclass;
          EquivalenceClass *ieclass;
!         PathKey    *ipathkey = NULL;
!         EquivalenceClass *ipeclass = NULL;
!         bool        first_inner_match = false;

          /* fetch outer/inner eclass from mergeclause */
          if (rinfo->outer_is_left)
*************** create_mergejoin_plan(PlannerInfo *root,
*** 3947,4050 ****
          Assert(ieclass != NULL);

          /*
!          * For debugging purposes, we check that the eclasses match the paths'
!          * pathkeys.  In typical cases the merge clauses are one-to-one with
!          * the pathkeys, but when dealing with partially redundant query
!          * conditions, we might have clauses that re-reference earlier path
!          * keys.  The case that we need to reject is where a pathkey is
!          * entirely skipped over.
           *
!          * lop and lip reference the first as-yet-unused pathkey elements;
!          * it's okay to match them, or any element before them.  If they're
!          * NULL then we have found all pathkey elements to be used.
           */
!         if (lop)
          {
              opathkey = (PathKey *) lfirst(lop);
              opeclass = opathkey->pk_eclass;
!             if (oeclass == opeclass)
!             {
!                 /* fast path for typical case */
!                 lop = lnext(lop);
!             }
!             else
!             {
!                 /* redundant clauses ... must match something before lop */
!                 foreach(l2, outerpathkeys)
!                 {
!                     if (l2 == lop)
!                         break;
!                     opathkey = (PathKey *) lfirst(l2);
!                     opeclass = opathkey->pk_eclass;
!                     if (oeclass == opeclass)
!                         break;
!                 }
!                 if (oeclass != opeclass)
!                     elog(ERROR, "outer pathkeys do not match mergeclauses");
!             }
!         }
!         else
!         {
!             /* redundant clauses ... must match some already-used pathkey */
!             opathkey = NULL;
!             opeclass = NULL;
!             foreach(l2, outerpathkeys)
!             {
!                 opathkey = (PathKey *) lfirst(l2);
!                 opeclass = opathkey->pk_eclass;
!                 if (oeclass == opeclass)
!                     break;
!             }
!             if (l2 == NULL)
                  elog(ERROR, "outer pathkeys do not match mergeclauses");
          }

          if (lip)
          {
              ipathkey = (PathKey *) lfirst(lip);
              ipeclass = ipathkey->pk_eclass;
              if (ieclass == ipeclass)
              {
!                 /* fast path for typical case */
                  lip = lnext(lip);
!             }
!             else
!             {
!                 /* redundant clauses ... must match something before lip */
!                 foreach(l2, innerpathkeys)
!                 {
!                     if (l2 == lip)
!                         break;
!                     ipathkey = (PathKey *) lfirst(l2);
!                     ipeclass = ipathkey->pk_eclass;
!                     if (ieclass == ipeclass)
!                         break;
!                 }
!                 if (ieclass != ipeclass)
!                     elog(ERROR, "inner pathkeys do not match mergeclauses");
              }
          }
!         else
          {
!             /* redundant clauses ... must match some already-used pathkey */
!             ipathkey = NULL;
!             ipeclass = NULL;
              foreach(l2, innerpathkeys)
              {
                  ipathkey = (PathKey *) lfirst(l2);
                  ipeclass = ipathkey->pk_eclass;
                  if (ieclass == ipeclass)
                      break;
              }
!             if (l2 == NULL)
                  elog(ERROR, "inner pathkeys do not match mergeclauses");
          }

!         /* pathkeys should match each other too (more debugging) */
          if (opathkey->pk_opfamily != ipathkey->pk_opfamily ||
!             opathkey->pk_eclass->ec_collation != ipathkey->pk_eclass->ec_collation ||
!             opathkey->pk_strategy != ipathkey->pk_strategy ||
!             opathkey->pk_nulls_first != ipathkey->pk_nulls_first)
              elog(ERROR, "left and right pathkeys do not match in mergejoin");

          /* OK, save info for executor */
--- 3950,4043 ----
          Assert(ieclass != NULL);

          /*
!          * We must identify the pathkey elements associated with this clause
!          * by matching the eclasses (which should give a unique match, since
!          * the pathkey lists should be canonical).  In typical cases the merge
!          * clauses are one-to-one with the pathkeys, but when dealing with
!          * partially redundant query conditions, things are more complicated.
           *
!          * lop and lip reference the first as-yet-unmatched pathkey elements.
!          * If they're NULL then we have found all pathkey elements to be
!          * matched.
!          *
!          * The ordering of the outer pathkeys should match the mergeclauses,
!          * by construction (see find_mergeclauses_for_outer_pathkeys()). There
!          * could be more than one mergeclause for the same outer path key, but
!          * no path key may be entirely skipped over.
           */
!         if (oeclass != opeclass)    /* multiple matches are not interesting */
          {
+             /* doesn't match the current opathkey, so must match the next */
+             if (lop == NULL)
+                 elog(ERROR, "outer pathkeys do not match mergeclauses");
              opathkey = (PathKey *) lfirst(lop);
              opeclass = opathkey->pk_eclass;
!             lop = lnext(lop);
!             if (oeclass != opeclass)
                  elog(ERROR, "outer pathkeys do not match mergeclauses");
          }

+         /*
+          * The inner pathkeys likewise should not have skipped-over keys, but
+          * it's possible for a mergeclause to reference some earlier inner
+          * pathkey if we had redundant pathkeys.  For example we might have
+          * mergeclauses like "o.a = i.x AND o.b = i.y AND o.c = i.x".  The
+          * implied inner ordering is then "ORDER BY x, y, x", but the pathkey
+          * drops the second sort by x as redundant, and this code must cope.
+          *
+          * It's also possible for the implied inner-rel ordering to be like
+          * "ORDER BY x, y, x DESC".  We still drop the second instance of x as
+          * redundant; but this means that the sort ordering of a redundant
+          * inner pathkey should not be considered significant.
+          */
          if (lip)
          {
              ipathkey = (PathKey *) lfirst(lip);
              ipeclass = ipathkey->pk_eclass;
              if (ieclass == ipeclass)
              {
!                 /* successful first match to this inner pathkey */
                  lip = lnext(lip);
!                 first_inner_match = true;
              }
          }
!         if (!first_inner_match)
          {
!             /* redundant clause ... must match something before lip */
!             ListCell   *l2;
!
              foreach(l2, innerpathkeys)
              {
+                 if (l2 == lip)
+                     break;
                  ipathkey = (PathKey *) lfirst(l2);
                  ipeclass = ipathkey->pk_eclass;
                  if (ieclass == ipeclass)
                      break;
              }
!             if (ieclass != ipeclass)
                  elog(ERROR, "inner pathkeys do not match mergeclauses");
          }

!         /*
!          * The pathkeys should always match each other as to opfamily and
!          * collation (which affect equality), but if we're considering a
!          * redundant inner pathkey, its sort ordering might not match.  In
!          * such cases we may ignore the inner pathkey's sort ordering and use
!          * the outer's.  (In effect, we're lying to the executor about the
!          * sort direction of this inner column, but it does not matter since
!          * the run-time row comparisons would only reach this column when
!          * there's equality for the earlier column containing the same eclass.
!          * IOW, there could be only one value in this column for the range of
!          * inner rows having a given value in the earlier column, so it does
!          * not matter in which direction we imagine those rows to be ordered.)
!          */
          if (opathkey->pk_opfamily != ipathkey->pk_opfamily ||
!             opathkey->pk_eclass->ec_collation != ipathkey->pk_eclass->ec_collation)
!             elog(ERROR, "left and right pathkeys do not match in mergejoin");
!         if (first_inner_match &&
!             (opathkey->pk_strategy != ipathkey->pk_strategy ||
!              opathkey->pk_nulls_first != ipathkey->pk_nulls_first))
              elog(ERROR, "left and right pathkeys do not match in mergejoin");

          /* OK, save info for executor */
diff --git a/src/include/optimizer/paths.h b/src/include/optimizer/paths.h
index c9e4431..b2b4353 100644
*** a/src/include/optimizer/paths.h
--- b/src/include/optimizer/paths.h
*************** extern void initialize_mergeclause_eclas
*** 216,231 ****
                                  RestrictInfo *restrictinfo);
  extern void update_mergeclause_eclasses(PlannerInfo *root,
                              RestrictInfo *restrictinfo);
! extern List *find_mergeclauses_for_pathkeys(PlannerInfo *root,
!                                List *pathkeys,
!                                bool outer_keys,
!                                List *restrictinfos);
  extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
                                  List *mergeclauses,
                                  RelOptInfo *joinrel);
  extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
                                List *mergeclauses,
                                List *outer_pathkeys);
  extern List *truncate_useless_pathkeys(PlannerInfo *root,
                            RelOptInfo *rel,
                            List *pathkeys);
--- 216,233 ----
                                  RestrictInfo *restrictinfo);
  extern void update_mergeclause_eclasses(PlannerInfo *root,
                              RestrictInfo *restrictinfo);
! extern List *find_mergeclauses_for_outer_pathkeys(PlannerInfo *root,
!                                      List *pathkeys,
!                                      List *restrictinfos);
  extern List *select_outer_pathkeys_for_merge(PlannerInfo *root,
                                  List *mergeclauses,
                                  RelOptInfo *joinrel);
  extern List *make_inner_pathkeys_for_merge(PlannerInfo *root,
                                List *mergeclauses,
                                List *outer_pathkeys);
+ extern List *trim_mergeclauses_for_inner_pathkeys(PlannerInfo *root,
+                                      List *mergeclauses,
+                                      List *pathkeys);
  extern List *truncate_useless_pathkeys(PlannerInfo *root,
                            RelOptInfo *rel,
                            List *pathkeys);
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index c50a206..4d5931d 100644
*** a/src/test/regress/expected/join.out
--- b/src/test/regress/expected/join.out
*************** where b.f1 = t.thousand and a.f1 = b.f1
*** 2291,2296 ****
--- 2291,2376 ----
  (0 rows)

  --
+ -- check a case where we formerly got confused by conflicting sort orders
+ -- in redundant merge join path keys
+ --
+ explain (costs off)
+ select * from
+   j1_tbl full join
+   (select * from j2_tbl order by j2_tbl.i desc, j2_tbl.k asc) j2_tbl
+   on j1_tbl.i = j2_tbl.i and j1_tbl.i = j2_tbl.k;
+                            QUERY PLAN
+ -----------------------------------------------------------------
+  Merge Full Join
+    Merge Cond: ((j2_tbl.i = j1_tbl.i) AND (j2_tbl.k = j1_tbl.i))
+    ->  Sort
+          Sort Key: j2_tbl.i DESC, j2_tbl.k
+          ->  Seq Scan on j2_tbl
+    ->  Sort
+          Sort Key: j1_tbl.i DESC
+          ->  Seq Scan on j1_tbl
+ (8 rows)
+
+ select * from
+   j1_tbl full join
+   (select * from j2_tbl order by j2_tbl.i desc, j2_tbl.k asc) j2_tbl
+   on j1_tbl.i = j2_tbl.i and j1_tbl.i = j2_tbl.k;
+  i | j |   t   | i | k
+ ---+---+-------+---+----
+    |   |       |   |  0
+    |   |       |   |
+    | 0 | zero  |   |
+    |   | null  |   |
+  8 | 8 | eight |   |
+  7 | 7 | seven |   |
+  6 | 6 | six   |   |
+    |   |       | 5 | -5
+    |   |       | 5 | -5
+  5 | 0 | five  |   |
+  4 | 1 | four  |   |
+    |   |       | 3 | -3
+  3 | 2 | three |   |
+  2 | 3 | two   | 2 |  2
+    |   |       | 2 |  4
+    |   |       | 1 | -1
+    |   |       | 0 |
+  1 | 4 | one   |   |
+  0 |   | zero  |   |
+ (19 rows)
+
+ --
+ -- a different check for handling of redundant sort keys in merge joins
+ --
+ explain (costs off)
+ select count(*) from
+   (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x
+   left join
+   (select * from tenk1 y order by y.unique2) y
+   on x.thousand = y.unique2 and x.twothousand = y.hundred and x.fivethous = y.unique2;
+                                     QUERY PLAN
+ ----------------------------------------------------------------------------------
+  Aggregate
+    ->  Merge Left Join
+          Merge Cond: (x.thousand = y.unique2)
+          Join Filter: ((x.twothousand = y.hundred) AND (x.fivethous = y.unique2))
+          ->  Sort
+                Sort Key: x.thousand, x.twothousand, x.fivethous
+                ->  Seq Scan on tenk1 x
+          ->  Materialize
+                ->  Index Scan using tenk1_unique2 on tenk1 y
+ (9 rows)
+
+ select count(*) from
+   (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x
+   left join
+   (select * from tenk1 y order by y.unique2) y
+   on x.thousand = y.unique2 and x.twothousand = y.hundred and x.fivethous = y.unique2;
+  count
+ -------
+  10000
+ (1 row)
+
+ --
  -- Clean up
  --
  DROP TABLE t1;
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index fc84237..30dfde2 100644
*** a/src/test/regress/sql/join.sql
--- b/src/test/regress/sql/join.sql
*************** select a.f1, b.f1, t.thousand, t.tenthou
*** 417,422 ****
--- 417,453 ----
    (select sum(f1) as f1 from int4_tbl i4b) b
  where b.f1 = t.thousand and a.f1 = b.f1 and (a.f1+b.f1+999) = t.tenthous;

+ --
+ -- check a case where we formerly got confused by conflicting sort orders
+ -- in redundant merge join path keys
+ --
+ explain (costs off)
+ select * from
+   j1_tbl full join
+   (select * from j2_tbl order by j2_tbl.i desc, j2_tbl.k asc) j2_tbl
+   on j1_tbl.i = j2_tbl.i and j1_tbl.i = j2_tbl.k;
+
+ select * from
+   j1_tbl full join
+   (select * from j2_tbl order by j2_tbl.i desc, j2_tbl.k asc) j2_tbl
+   on j1_tbl.i = j2_tbl.i and j1_tbl.i = j2_tbl.k;
+
+ --
+ -- a different check for handling of redundant sort keys in merge joins
+ --
+ explain (costs off)
+ select count(*) from
+   (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x
+   left join
+   (select * from tenk1 y order by y.unique2) y
+   on x.thousand = y.unique2 and x.twothousand = y.hundred and x.fivethous = y.unique2;
+
+ select count(*) from
+   (select * from tenk1 x order by x.thousand, x.twothousand, x.fivethous) x
+   left join
+   (select * from tenk1 y order by y.unique2) y
+   on x.thousand = y.unique2 and x.twothousand = y.hundred and x.fivethous = y.unique2;
+

  --
  -- Clean up

Re: ERROR: left and right pathkeys do not match in mergejoin

От
Alexander Kuzmenkov
Дата:
Ah, I understand now. We can lie to the executor about the order, 
because when we are moving based on the second outer column, we have a 
stretch of equal values in the inner column, so we can consider them to 
be sorted whatever way we need.

The patch applies cleanly, make check-world passes.

In create_mergejoin_plan:
     * implied inner ordering is then "ORDER BY x, y, x", but the pathkey
     * drops the second sort by x as redundant, and this code must cope.
-- should this read "the pathkey machinery drops"?

-- 
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: ERROR: left and right pathkeys do not match in mergejoin

От
Tom Lane
Дата:
Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> writes:
> In create_mergejoin_plan:
>      * implied inner ordering is then "ORDER BY x, y, x", but the pathkey
>      * drops the second sort by x as redundant, and this code must cope.
> -- should this read "the pathkey machinery drops"?

I changed that and improved some other comments, and pushed it.
Thanks for reviewing!

            regards, tom lane