Re: wrong query result due to wang plan

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: wrong query result due to wang plan
Дата
Msg-id 3313769.1676833273@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: wrong query result due to wang plan  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: wrong query result due to wang plan  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers
I wrote:
> So that leads to a conclusion that we could just forget the whole
> thing and always use the syntactic qualscope here.  I tried that
> and it doesn't break any existing regression tests, which admittedly
> doesn't prove a lot in this area :-(.

Hmm, I thought it worked, but when I tried it again just now I see a
failure (well, a less efficient plan) for one of the recently added
test cases.  I also tried the idea of stripping off lower outer joins,
but that doesn't work in distribute_qual_to_rels, because we haven't
yet formed the SpecialJoinInfos for all the outer joins that are below
the top of the join domain.

So for now I agree with your solution in distribute_qual_to_rels.
It's about equivalent to what we did in older branches, and it's not
real clear that cases with pseudoconstant quals in lower join domains
are common enough to be worth sweating over.

We still have to fix process_implied_equality though, and in that
context we do have all the SpecialJoinInfos, so the strip-the-outer-joins
fix seems to work.  See attached.

            regards, tom lane

diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index 9132ce235f..faabe4d065 100644
--- a/src/backend/optimizer/path/equivclass.c
+++ b/src/backend/optimizer/path/equivclass.c
@@ -1248,11 +1248,11 @@ generate_base_implied_equalities_no_const(PlannerInfo *root,
             /*
              * The expressions aren't constants, so the passed qualscope will
              * never be used to place the generated clause.  We just need to
-             * be sure it covers both expressions, so ec_relids will serve.
+             * be sure it covers both expressions, which em_relids should do.
              */
             rinfo = process_implied_equality(root, eq_op, ec->ec_collation,
                                              prev_em->em_expr, cur_em->em_expr,
-                                             ec->ec_relids,
+                                             cur_em->em_relids,
                                              ec->ec_min_security,
                                              false);

diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c
index fe4c8c63c9..8452720a18 100644
--- a/src/backend/optimizer/plan/initsplan.c
+++ b/src/backend/optimizer/plan/initsplan.c
@@ -125,6 +125,7 @@ static void distribute_qual_to_rels(PlannerInfo *root, Node *clause,
                                     bool is_clone,
                                     List **postponed_oj_qual_list);
 static bool check_redundant_nullability_qual(PlannerInfo *root, Node *clause);
+static Relids get_join_domain_min_rels(PlannerInfo *root, Relids domain_relids);
 static void check_mergejoinable(RestrictInfo *restrictinfo);
 static void check_hashjoinable(RestrictInfo *restrictinfo);
 static void check_memoizable(RestrictInfo *restrictinfo);
@@ -2250,8 +2251,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
      * RestrictInfo lists for the moment, but eventually createplan.c will
      * pull it out and make a gating Result node immediately above whatever
      * plan node the pseudoconstant clause is assigned to.  It's usually best
-     * to put a gating node as high in the plan tree as possible, which we can
-     * do by assigning it the full relid set of the current JoinDomain.
+     * to put a gating node as high in the plan tree as possible.
      */
     if (bms_is_empty(relids))
     {
@@ -2269,8 +2269,19 @@ distribute_qual_to_rels(PlannerInfo *root, Node *clause,
         }
         else
         {
-            /* eval at join domain level */
-            relids = bms_copy(jtitem->jdomain->jd_relids);
+            /*
+             * If we are in the top-level join domain, we can push the qual to
+             * the top of the plan tree.  Otherwise, be conservative and eval
+             * it at original syntactic level.  (Ideally we'd push it to the
+             * top of the current join domain in all cases, but that causes
+             * problems if we later rearrange outer-join evaluation order.
+             * Pseudoconstant quals below the top level are a pretty odd case,
+             * so it's not clear that it's worth working hard on.)
+             */
+            if (jtitem->jdomain == (JoinDomain *) linitial(root->join_domains))
+                relids = bms_copy(jtitem->jdomain->jd_relids);
+            else
+                relids = bms_copy(qualscope);
             /* mark as gating qual */
             pseudoconstant = true;
             /* tell createplan.c to check for gating quals */
@@ -2734,12 +2745,13 @@ process_implied_equality(PlannerInfo *root,
     /*
      * If the clause is variable-free, our normal heuristic for pushing it
      * down to just the mentioned rels doesn't work, because there are none.
-     * Apply it as a gating qual at the given qualscope.
+     * Apply it as a gating qual at the appropriate level (see comments for
+     * get_join_domain_min_rels).
      */
     if (bms_is_empty(relids))
     {
-        /* eval at join domain level */
-        relids = bms_copy(qualscope);
+        /* eval at join domain's safe level */
+        relids = get_join_domain_min_rels(root, qualscope);
         /* mark as gating qual */
         pseudoconstant = true;
         /* tell createplan.c to check for gating quals */
@@ -2856,6 +2868,45 @@ build_implied_join_equality(PlannerInfo *root,
     return restrictinfo;
 }

+/*
+ * get_join_domain_min_rels
+ *      Remove any lower outer joins from a JoinDomain's relid set.
+ *
+ * When we derive a pseudoconstant (Var-free) clause from an EquivalenceClass,
+ * we'd ideally apply the clause at the top level of the EC's join domain.
+ * However, if there are any outer joins inside that domain that get commuted
+ * with joins outside it, that leads to not finding a correct place to apply
+ * the clause.  Instead, remove any lower outer joins from the relid set,
+ * and apply the clause to just the remaining rels.  This still results in a
+ * correct answer, since if the clause produces FALSE then the LHS of these
+ * joins will be empty leading to an empty join result.
+ *
+ * Note: it's tempting to use this in distribute_qual_to_rels where it's
+ * dealing with pseudoconstant quals; but we can't because the necessary
+ * SpecialJoinInfos aren't all formed at that point.
+ *
+ * The result is always freshly palloc'd; we do not modify domain_relids.
+ */
+static Relids
+get_join_domain_min_rels(PlannerInfo *root, Relids domain_relids)
+{
+    Relids        result = bms_copy(domain_relids);
+    ListCell   *lc;
+
+    foreach(lc, root->join_info_list)
+    {
+        SpecialJoinInfo *sjinfo = (SpecialJoinInfo *) lfirst(lc);
+
+        if (sjinfo->jointype == JOIN_LEFT &&
+            bms_is_member(sjinfo->ojrelid, result))
+        {
+            result = bms_del_member(result, sjinfo->ojrelid);
+            result = bms_del_members(result, sjinfo->syn_righthand);
+        }
+    }
+    return result;
+}
+

 /*
  * match_foreign_keys_to_quals
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 16318d9da2..5a2756b333 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -5119,6 +5119,67 @@ from int8_tbl t1
          ->  Seq Scan on onek t4
 (13 rows)

+-- More tests of correct placement of pseudoconstant quals
+-- simple constant-false condition
+explain (costs off)
+select * from int8_tbl t1 left join
+  (int8_tbl t2 inner join int8_tbl t3 on false
+   left join int8_tbl t4 on t2.q2 = t4.q2)
+on t1.q1 = t2.q1;
+              QUERY PLAN
+--------------------------------------
+ Hash Left Join
+   Hash Cond: (t1.q1 = q1)
+   ->  Seq Scan on int8_tbl t1
+   ->  Hash
+         ->  Result
+               One-Time Filter: false
+(6 rows)
+
+-- deduce constant-false from an EquivalenceClass
+explain (costs off)
+select * from int8_tbl t1 left join
+  (int8_tbl t2 inner join int8_tbl t3 on (t2.q1-t3.q2) = 0 and (t2.q1-t3.q2) = 1
+   left join int8_tbl t4 on t2.q2 = t4.q2)
+on t1.q1 = t2.q1;
+              QUERY PLAN
+--------------------------------------
+ Hash Left Join
+   Hash Cond: (t1.q1 = q1)
+   ->  Seq Scan on int8_tbl t1
+   ->  Hash
+         ->  Result
+               One-Time Filter: false
+(6 rows)
+
+-- pseudoconstant based on an outer-level Param
+explain (costs off)
+select exists(
+  select * from int8_tbl t1 left join
+    (int8_tbl t2 inner join int8_tbl t3 on x0.f1 = 1
+     left join int8_tbl t4 on t2.q2 = t4.q2)
+  on t1.q1 = t2.q1
+) from int4_tbl x0;
+                             QUERY PLAN
+---------------------------------------------------------------------
+ Seq Scan on int4_tbl x0
+   SubPlan 1
+     ->  Nested Loop Left Join
+           Join Filter: (t2.q2 = t4.q2)
+           ->  Nested Loop Left Join
+                 Join Filter: (t1.q1 = t2.q1)
+                 ->  Seq Scan on int8_tbl t1
+                 ->  Materialize
+                       ->  Result
+                             One-Time Filter: (x0.f1 = 1)
+                             ->  Nested Loop
+                                   ->  Seq Scan on int8_tbl t2
+                                   ->  Materialize
+                                         ->  Seq Scan on int8_tbl t3
+           ->  Materialize
+                 ->  Seq Scan on int8_tbl t4
+(16 rows)
+
 -- check that join removal works for a left join when joining a subquery
 -- that is guaranteed to be unique by its GROUP BY clause
 explain (costs off)
diff --git a/src/test/regress/sql/join.sql b/src/test/regress/sql/join.sql
index 2b2e094955..a38034bce7 100644
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@@ -1846,6 +1846,31 @@ from int8_tbl t1
   left join onek t4
     on t2.q2 < t3.unique2;

+-- More tests of correct placement of pseudoconstant quals
+
+-- simple constant-false condition
+explain (costs off)
+select * from int8_tbl t1 left join
+  (int8_tbl t2 inner join int8_tbl t3 on false
+   left join int8_tbl t4 on t2.q2 = t4.q2)
+on t1.q1 = t2.q1;
+
+-- deduce constant-false from an EquivalenceClass
+explain (costs off)
+select * from int8_tbl t1 left join
+  (int8_tbl t2 inner join int8_tbl t3 on (t2.q1-t3.q2) = 0 and (t2.q1-t3.q2) = 1
+   left join int8_tbl t4 on t2.q2 = t4.q2)
+on t1.q1 = t2.q1;
+
+-- pseudoconstant based on an outer-level Param
+explain (costs off)
+select exists(
+  select * from int8_tbl t1 left join
+    (int8_tbl t2 inner join int8_tbl t3 on x0.f1 = 1
+     left join int8_tbl t4 on t2.q2 = t4.q2)
+  on t1.q1 = t2.q1
+) from int4_tbl x0;
+
 -- check that join removal works for a left join when joining a subquery
 -- that is guaranteed to be unique by its GROUP BY clause
 explain (costs off)

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Add LZ4 compression in pg_dump
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: Add LZ4 compression in pg_dump