Обсуждение: Avoiding deeply nested AND/OR trees in the parser

Поиск
Список
Период
Сортировка

Avoiding deeply nested AND/OR trees in the parser

От
Tom Lane
Дата:
Over in
http://www.postgresql.org/message-id/BAY176-W382A9DE827EBC8E602B7BBC5860@phx.gbl
there's a complaint about getting "stack depth limit exceeded" from a
query of the form

select count(*) from gui_die_summary where (x_coord, y_coord) in
((25,5),(41,13),(25,7),(28,3),(25,8),(34,7),(26,6),(21,10), ...);

when there's a few thousand pairs in the IN list.  The reason for this
is that transformAExprIn() generates a tree of nested OR expressions,
and then recursion in assign_collations_walker() runs out of stack space.
(If assign_collations_walker() hadn't failed, something else probably
would later, so it's wrong to blame that function in particular.)

I reproduced this problem in the regression database using generated
queries like so:

print "explain select * from tenk1 where (thousand, tenthous) in (\n";
for ($i = 0; $i < 10000; $i++) {
  print ",\n" if $i > 0;
  print "($i,$i)";
}
print ");\n";

On my machine, HEAD fails at about 9000 pairs with this test case.

While I commented in the pgsql-novice thread that there are better ways
to do this, it still seems like a limitation we probably ought to fix
if it's not too hard.  In the case of transformAExprIn, we could generate
an N-argument OR or AND node instead of a nest; this is already done
for example in make_row_comparison_op().  The attached quick-hack patch
does so, and I verified that the system could handle a million pairs
with this in place.  (It takes about 20 seconds and 20GB of memory to
plan such a case, so I still say it's a bad idea, but at least we can
do it.)  There is similar code in make_row_distinct_op(), which perhaps
ought to be fixed as well.

However, this isn't exactly the end of the story, because if you were to
dump out a view generated from a query like this, it would contain a long
chain of OR clauses, which would mean that reloading the view would put
you right back in stack overflow territory.  Really if we wanted to fix
this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
they recognized nested ANDs/ORs and flattened them on the spot.  This
might not be a bad idea, but it's starting to look like more than a quick
hack patch.

Does this seem worth pursuing, or shall we leave it alone?

            regards, tom lane

diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 81c9338..9550bd1 100644
*** a/src/backend/parser/parse_expr.c
--- b/src/backend/parser/parse_expr.c
*************** transformAExprOf(ParseState *pstate, A_E
*** 1088,1094 ****
  static Node *
  transformAExprIn(ParseState *pstate, A_Expr *a)
  {
!     Node       *result = NULL;
      Node       *lexpr;
      List       *rexprs;
      List       *rvars;
--- 1088,1095 ----
  static Node *
  transformAExprIn(ParseState *pstate, A_Expr *a)
  {
!     Node       *result;
!     List       *cmpexprs = NIL;
      Node       *lexpr;
      List       *rexprs;
      List       *rvars;
*************** transformAExprIn(ParseState *pstate, A_E
*** 1166,1171 ****
--- 1167,1173 ----
               */
              List       *aexprs;
              ArrayExpr  *newa;
+             Node       *cmp;

              aexprs = NIL;
              foreach(l, rnonvars)
*************** transformAExprIn(ParseState *pstate, A_E
*** 1185,1196 ****
              newa->multidims = false;
              newa->location = -1;

!             result = (Node *) make_scalar_array_op(pstate,
!                                                    a->name,
!                                                    useOr,
!                                                    lexpr,
!                                                    (Node *) newa,
!                                                    a->location);

              /* Consider only the Vars (if any) in the loop below */
              rexprs = rvars;
--- 1187,1202 ----
              newa->multidims = false;
              newa->location = -1;

!             cmp = (Node *) make_scalar_array_op(pstate,
!                                                 a->name,
!                                                 useOr,
!                                                 lexpr,
!                                                 (Node *) newa,
!                                                 a->location);
!
!             /* cmp certainly yields boolean, no need to check it */
!
!             cmpexprs = lappend(cmpexprs, cmp);

              /* Consider only the Vars (if any) in the loop below */
              rexprs = rvars;
*************** transformAExprIn(ParseState *pstate, A_E
*** 1198,1204 ****
      }

      /*
!      * Must do it the hard way, ie, with a boolean expression tree.
       */
      foreach(l, rexprs)
      {
--- 1204,1210 ----
      }

      /*
!      * Any remaining righthand exprs need to be compared individually.
       */
      foreach(l, rexprs)
      {
*************** transformAExprIn(ParseState *pstate, A_E
*** 1226,1239 ****
          }

          cmp = coerce_to_boolean(pstate, cmp, "IN");
!         if (result == NULL)
!             result = cmp;
!         else
!             result = (Node *) makeBoolExpr(useOr ? OR_EXPR : AND_EXPR,
!                                            list_make2(result, cmp),
!                                            a->location);
      }

      return result;
  }

--- 1232,1253 ----
          }

          cmp = coerce_to_boolean(pstate, cmp, "IN");
!
!         cmpexprs = lappend(cmpexprs, cmp);
      }

+     /*
+      * If we have more than one comparison expression, AND or OR them together
+      */
+     if (cmpexprs == NIL)
+         result = NULL;            /* can this happen? is it right if so? */
+     else if (list_length(cmpexprs) == 1)
+         result = (Node *) linitial(cmpexprs);
+     else
+         result = (Node *) makeBoolExpr(useOr ? OR_EXPR : AND_EXPR,
+                                        cmpexprs,
+                                        a->location);
+
      return result;
  }


Re: Avoiding deeply nested AND/OR trees in the parser

От
Noah Misch
Дата:
On Tue, Feb 25, 2014 at 01:15:09PM -0500, Tom Lane wrote:
> Over in
> http://www.postgresql.org/message-id/BAY176-W382A9DE827EBC8E602B7BBC5860@phx.gbl
> there's a complaint about getting "stack depth limit exceeded" from a
> query of the form
> 
> select count(*) from gui_die_summary where (x_coord, y_coord) in
> ((25,5),(41,13),(25,7),(28,3),(25,8),(34,7),(26,6),(21,10), ...);
> 
> when there's a few thousand pairs in the IN list.  The reason for this
> is that transformAExprIn() generates a tree of nested OR expressions,
> and then recursion in assign_collations_walker() runs out of stack space.

> Really if we wanted to fix
> this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
> they recognized nested ANDs/ORs and flattened them on the spot.  This
> might not be a bad idea, but it's starting to look like more than a quick
> hack patch.

Reminds me of this work:
http://www.postgresql.org/message-id/flat/CABwTF4XJKN1smMjHv_O-QzTpokqSjHBouMWVw-E8kyb2bC=_wg@mail.gmail.com
http://www.postgresql.org/message-id/flat/CAFj8pRDd9QTyoY0cbPoODR-hfj1xaMBuxWOxAZg0kyVtVaunkw@mail.gmail.com

> Does this seem worth pursuing, or shall we leave it alone?

+1 for fixing.  Extrapolating from your figure of 20s and 20 GiB for a million
coordinate pairs, we'd have tolerable performance at 20000 pairs instead of
just failing as we do today.  That's a nice win all by itself.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Avoiding deeply nested AND/OR trees in the parser

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Tue, Feb 25, 2014 at 01:15:09PM -0500, Tom Lane wrote:
>> Really if we wanted to fix
>> this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
>> they recognized nested ANDs/ORs and flattened them on the spot.  This
>> might not be a bad idea, but it's starting to look like more than a quick
>> hack patch.

> Reminds me of this work:
> http://www.postgresql.org/message-id/flat/CABwTF4XJKN1smMjHv_O-QzTpokqSjHBouMWVw-E8kyb2bC=_wg@mail.gmail.com
> http://www.postgresql.org/message-id/flat/CAFj8pRDd9QTyoY0cbPoODR-hfj1xaMBuxWOxAZg0kyVtVaunkw@mail.gmail.com

Oh, I'd forgotten about that thread.  I never particularly liked the patch
as presented: like Robert, I thought it far too complicated.  My
inclination would just be to tweak the parser enough so that a simple list
of ANDs or ORs (ie, a left-deep raw parse tree) gets flattened.

The most likely bet for making that happen in an uncomplicated way would
be to alter gram.y's processing: if we had the productions for AND/OR
notice whether their left inputs were already AND/OR clauses, they could
extend the argument lists instead of building nested clauses.  The reason
the proposed patch is so complicated is it's trying to avoid recursing
while handling a fundamentally recursive data structure, and that's just
the hard way to do it.

We do need to look at whether there are any implications for ruleutils
and other places, though.
        regards, tom lane



Re: Avoiding deeply nested AND/OR trees in the parser

От
Ashutosh Bapat
Дата:



On Thu, Feb 27, 2014 at 8:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Noah Misch <noah@leadboat.com> writes:
> On Tue, Feb 25, 2014 at 01:15:09PM -0500, Tom Lane wrote:
>> Really if we wanted to fix
>> this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
>> they recognized nested ANDs/ORs and flattened them on the spot.  This
>> might not be a bad idea, but it's starting to look like more than a quick
>> hack patch.

> Reminds me of this work:
> http://www.postgresql.org/message-id/flat/CABwTF4XJKN1smMjHv_O-QzTpokqSjHBouMWVw-E8kyb2bC=_wg@mail.gmail.com
> http://www.postgresql.org/message-id/flat/CAFj8pRDd9QTyoY0cbPoODR-hfj1xaMBuxWOxAZg0kyVtVaunkw@mail.gmail.com

Oh, I'd forgotten about that thread.  I never particularly liked the patch
as presented: like Robert, I thought it far too complicated.  My
inclination would just be to tweak the parser enough so that a simple list
of ANDs or ORs (ie, a left-deep raw parse tree) gets flattened.

The most likely bet for making that happen in an uncomplicated way would
be to alter gram.y's processing: if we had the productions for AND/OR
notice whether their left inputs were already AND/OR clauses, they could
extend the argument lists instead of building nested clauses.  The reason
the proposed patch is so complicated is it's trying to avoid recursing
while handling a fundamentally recursive data structure, and that's just
the hard way to do it.

We do need to look at whether there are any implications for ruleutils
and other places, though.

ruleutils should be fine. See code below in ruleutils.c

6615                     case AND_EXPR:
6616                         if (!PRETTY_PAREN(context))
6617                             appendStringInfoChar(buf, '(');
6618                         get_rule_expr_paren(first_arg, context,
6619                                             false, node);
6620                         while (arg)
6621                         {
6622                             appendStringInfoString(buf, " AND ");
6623                             get_rule_expr_paren((Node *) lfirst(arg), context,
6624                                                 false, node);
6625                             arg = lnext(arg);
6626                         }
6627                         if (!PRETTY_PAREN(context))
6628                             appendStringInfoChar(buf, ')');
6629                         break;

Similar code exists for OR_EXPR.

Within the planner, I have seen the quals are always implicitly ANDed lists, where all ANDs are put into a single list. May be same with OR.

As a side note, the code blocks for AND_EXPR and OR_EXPR are almost same except words "AND" and "OR". So there is some chance to get rid of some code duplication here.

                        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

Re: Avoiding deeply nested AND/OR trees in the parser

От
Bruce Momjian
Дата:
On Wed, Feb 26, 2014 at 10:21:03PM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Tue, Feb 25, 2014 at 01:15:09PM -0500, Tom Lane wrote:
> >> Really if we wanted to fix
> >> this issue we'd need to hack up transformAExprAnd/transformAExprOr so that
> >> they recognized nested ANDs/ORs and flattened them on the spot.  This
> >> might not be a bad idea, but it's starting to look like more than a quick
> >> hack patch.
> 
> > Reminds me of this work:
> > http://www.postgresql.org/message-id/flat/CABwTF4XJKN1smMjHv_O-QzTpokqSjHBouMWVw-E8kyb2bC=_wg@mail.gmail.com
> > http://www.postgresql.org/message-id/flat/CAFj8pRDd9QTyoY0cbPoODR-hfj1xaMBuxWOxAZg0kyVtVaunkw@mail.gmail.com
> 
> Oh, I'd forgotten about that thread.  I never particularly liked the patch
> as presented: like Robert, I thought it far too complicated.  My
> inclination would just be to tweak the parser enough so that a simple list
> of ANDs or ORs (ie, a left-deep raw parse tree) gets flattened.
> 
> The most likely bet for making that happen in an uncomplicated way would
> be to alter gram.y's processing: if we had the productions for AND/OR
> notice whether their left inputs were already AND/OR clauses, they could
> extend the argument lists instead of building nested clauses.  The reason
> the proposed patch is so complicated is it's trying to avoid recursing
> while handling a fundamentally recursive data structure, and that's just
> the hard way to do it.
> 
> We do need to look at whether there are any implications for ruleutils
> and other places, though.

Where are we on this?  Is it being kept for 9.5?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Avoiding deeply nested AND/OR trees in the parser

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Wed, Feb 26, 2014 at 10:21:03PM -0500, Tom Lane wrote:
>> Oh, I'd forgotten about that thread.  I never particularly liked the patch
>> as presented: like Robert, I thought it far too complicated.  My
>> inclination would just be to tweak the parser enough so that a simple list
>> of ANDs or ORs (ie, a left-deep raw parse tree) gets flattened.
>> 
>> The most likely bet for making that happen in an uncomplicated way would
>> be to alter gram.y's processing: if we had the productions for AND/OR
>> notice whether their left inputs were already AND/OR clauses, they could
>> extend the argument lists instead of building nested clauses.  The reason
>> the proposed patch is so complicated is it's trying to avoid recursing
>> while handling a fundamentally recursive data structure, and that's just
>> the hard way to do it.
>> 
>> We do need to look at whether there are any implications for ruleutils
>> and other places, though.

> Where are we on this?  Is it being kept for 9.5?

I think we rejected the patch-as-presented, and no one's bothered to
create a new one, which suggests that the problem isn't all that
important ...

I suspect the gram.y change I suggest above would be about a ten-line
patch.  What makes it less than completely trivial is the need to chase
down all the downstream implications, such as whether ruleutils would
need any work, and whether anything else is expecting parser output
to contain only binary clauses.
        regards, tom lane



Re: Avoiding deeply nested AND/OR trees in the parser

От
Bruce Momjian
Дата:
On Sat, Apr 19, 2014 at 05:50:17PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Wed, Feb 26, 2014 at 10:21:03PM -0500, Tom Lane wrote:
> >> Oh, I'd forgotten about that thread.  I never particularly liked the patch
> >> as presented: like Robert, I thought it far too complicated.  My
> >> inclination would just be to tweak the parser enough so that a simple list
> >> of ANDs or ORs (ie, a left-deep raw parse tree) gets flattened.
> >> 
> >> The most likely bet for making that happen in an uncomplicated way would
> >> be to alter gram.y's processing: if we had the productions for AND/OR
> >> notice whether their left inputs were already AND/OR clauses, they could
> >> extend the argument lists instead of building nested clauses.  The reason
> >> the proposed patch is so complicated is it's trying to avoid recursing
> >> while handling a fundamentally recursive data structure, and that's just
> >> the hard way to do it.
> >> 
> >> We do need to look at whether there are any implications for ruleutils
> >> and other places, though.
> 
> > Where are we on this?  Is it being kept for 9.5?
> 
> I think we rejected the patch-as-presented, and no one's bothered to
> create a new one, which suggests that the problem isn't all that
> important ...
> 
> I suspect the gram.y change I suggest above would be about a ten-line
> patch.  What makes it less than completely trivial is the need to chase
> down all the downstream implications, such as whether ruleutils would
> need any work, and whether anything else is expecting parser output
> to contain only binary clauses.

OK, thanks for the feedback.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +