Обсуждение: Disallow pullup of a subquery with a subquery in its targetlist?

Поиск
Список
Период
Сортировка

Disallow pullup of a subquery with a subquery in its targetlist?

От
Tom Lane
Дата:
Back at
http://www.postgresql.org/message-id/520D221E.2060008@gmail.com
there was a complaint about strange behavior of a query that looks
basically like this:

SELECT ...
FROM    (SELECT ... ,            ( SELECT ...                ORDER BY random()                LIMIT 1            ) AS
color_id    FROM ...    ) s    LEFT JOIN ...
 

The problem is that the planner decides it can "pull up" the subquery s,
or flatten it into the outer query.  This entails substituting the
subqury's targetlist expressions for outer-query Vars referencing s,
and there's more than one reference to s.color_id.  So we get multiple
copies of the inner subquery, and they will produce different results
at runtime due to the use of random().  This results in inconsistent
behavior.

We decided long ago that we should forbid pullup of subqueries that
contain volatile functions in their targetlists, because of what's
basically the same hazard: you might get more evaluations of the
volatile functions than you expected, yielding inconsistent results
and/or unwanted side-effects.

I first wondered why the instance of random() didn't prevent pullup
in this example.  That's because contain_volatile_functions() does
not recurse into SubLinks, which maybe is the wrong thing; but
I'm hesitant to change it without detailed analysis of all the
(many) call sites.

However, I think that a good case could also be made for fixing this
by deciding that we shouldn't pull up if there are SubLinks in the
subquery targetlist, period.  Even without any volatile functions,
multiple copies of a subquery seem like a probable loser cost-wise.

Thoughts?  If we do change this, should we back-patch it?
        regards, tom lane



Re: Disallow pullup of a subquery with a subquery in its targetlist?

От
Tom Lane
Дата:
I wrote:
> Back at
> http://www.postgresql.org/message-id/520D221E.2060008@gmail.com
> there was a complaint about strange behavior of a query that looks
> basically like this:

> SELECT ...
> FROM
>      (SELECT ... ,
>              ( SELECT ...
>                  ORDER BY random()
>                  LIMIT 1
>              ) AS color_id
>       FROM ...
>      ) s
>      LEFT JOIN ...

> ...

> I first wondered why the instance of random() didn't prevent pullup
> in this example.  That's because contain_volatile_functions() does
> not recurse into SubLinks, which maybe is the wrong thing; but
> I'm hesitant to change it without detailed analysis of all the
> (many) call sites.

> However, I think that a good case could also be made for fixing this
> by deciding that we shouldn't pull up if there are SubLinks in the
> subquery targetlist, period.  Even without any volatile functions,
> multiple copies of a subquery seem like a probable loser cost-wise.

I experimented with the latter behavior and decided it was too invasive;
in particular, it changes the plans chosen for some queries in the
regression tests, and I think it's actually breaking those tests, in the
sense that the queries no longer exercise the planner code paths they
were designed to test.

So I went back to the first idea of allowing contain_volatile_functions()
to recurse into sub-selects.  It turns out that this is not as big a deal
as I feared, because recursing into SubLinks will only affect behavior
before the planner has converted SubLinks to SubPlans, and most of the
existing calls to contain_volatile_functions/contain_mutable_functions
are after that and so don't need individual analysis.  I've pretty well
convinced myself that looking into sub-selects is a good idea in the
places that examine volatility before that.  Accordingly, I propose the
attached patch, which fixes the complained-of behavior but doesn't
change any existing regression test results.

            regards, tom lane

diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 4fa73a9..add29f5 100644
*** a/src/backend/optimizer/util/clauses.c
--- b/src/backend/optimizer/util/clauses.c
*************** contain_subplans_walker(Node *node, void
*** 833,840 ****
   * mistakenly think that something like "WHERE random() < 0.5" can be treated
   * as a constant qualification.
   *
!  * XXX we do not examine sub-selects to see if they contain uses of
!  * mutable functions.  It's not real clear if that is correct or not...
   */
  bool
  contain_mutable_functions(Node *clause)
--- 833,840 ----
   * mistakenly think that something like "WHERE random() < 0.5" can be treated
   * as a constant qualification.
   *
!  * We will recursively look into Query nodes (i.e., SubLink sub-selects)
!  * but not into SubPlans.  See comments for contain_volatile_functions().
   */
  bool
  contain_mutable_functions(Node *clause)
*************** contain_mutable_functions_walker(Node *n
*** 931,936 ****
--- 931,943 ----
          }
          /* else fall through to check args */
      }
+     else if (IsA(node, Query))
+     {
+         /* Recurse into subselects */
+         return query_tree_walker((Query *) node,
+                                  contain_mutable_functions_walker,
+                                  context, 0);
+     }
      return expression_tree_walker(node, contain_mutable_functions_walker,
                                    context);
  }
*************** contain_mutable_functions_walker(Node *n
*** 945,955 ****
   *      Recursively search for volatile functions within a clause.
   *
   * Returns true if any volatile function (or operator implemented by a
!  * volatile function) is found. This test prevents invalid conversions
!  * of volatile expressions into indexscan quals.
   *
!  * XXX we do not examine sub-selects to see if they contain uses of
!  * volatile functions.    It's not real clear if that is correct or not...
   */
  bool
  contain_volatile_functions(Node *clause)
--- 952,969 ----
   *      Recursively search for volatile functions within a clause.
   *
   * Returns true if any volatile function (or operator implemented by a
!  * volatile function) is found. This test prevents, for example,
!  * invalid conversions of volatile expressions into indexscan quals.
   *
!  * We will recursively look into Query nodes (i.e., SubLink sub-selects)
!  * but not into SubPlans.  This is a bit odd, but intentional.    If we are
!  * looking at a SubLink, we are probably deciding whether a query tree
!  * transformation is safe, and a contained sub-select should affect that;
!  * for example, duplicating a sub-select containing a volatile function
!  * would be bad.  However, once we've got to the stage of having SubPlans,
!  * subsequent planning need not consider volatility within those, since
!  * the executor won't change its evaluation rules for a SubPlan based on
!  * volatility.
   */
  bool
  contain_volatile_functions(Node *clause)
*************** contain_volatile_functions_walker(Node *
*** 1047,1052 ****
--- 1061,1073 ----
          }
          /* else fall through to check args */
      }
+     else if (IsA(node, Query))
+     {
+         /* Recurse into subselects */
+         return query_tree_walker((Query *) node,
+                                  contain_volatile_functions_walker,
+                                  context, 0);
+     }
      return expression_tree_walker(node, contain_volatile_functions_walker,
                                    context);
  }