Обсуждение: OOM on EXPLAIN with lots of nodes

Поиск
Список
Период
Сортировка

OOM on EXPLAIN with lots of nodes

От
Alexey Bashtanov
Дата:
Hello!

I found that EXPLAIN command takes very much memory to execute when huge 
unions are used.
For example the following sql
-- begin sql
create table t (a000 int, a001 int, ... a099 int);
explain select * from (    select a001 a from t    union all    select a001 a from t    union all    ... (1000 times)
...   union all    select a001 a from t
 
) _ where a = 1;
-- end sql
took more than 1GB of memory to execute.

Namely converting of the plan to a human-readable form causes excessive 
memory usage, not planning itself.

By varying the parameters and reading source code I determined that
memory usage linearly depends on (plan nodes count)*(overall columns 
count), thus it quadratically depends on number of tables unionized.

To remove this excessive memory usage I propose
to run deparse_context_for_planstate+deparse_expression in a separate 
memory context and free it after a plan node is generated.

Any reasons to treat this idea as bad?

BTW in this case explain execution is also quite long (I got tens of 
seconds). But I have no immediate ideas how to improve it.

Regards,  Alexey Bashtanov



Re: OOM on EXPLAIN with lots of nodes

От
Heikki Linnakangas
Дата:
On 01/13/2015 02:08 PM, Alexey Bashtanov wrote:
> I found that EXPLAIN command takes very much memory to execute when huge
> unions are used.
> For example the following sql
> -- begin sql
> create table t (a000 int, a001 int, ... a099 int);
> explain select * from (
>       select a001 a from t
>       union all
>       select a001 a from t
>       union all
>       ... (1000 times) ...
>       union all
>       select a001 a from t
> ) _ where a = 1;
> -- end sql
> took more than 1GB of memory to execute.
>
> Namely converting of the plan to a human-readable form causes excessive
> memory usage, not planning itself.
>
> By varying the parameters and reading source code I determined that
> memory usage linearly depends on (plan nodes count)*(overall columns
> count), thus it quadratically depends on number of tables unionized.
>
> To remove this excessive memory usage I propose
> to run deparse_context_for_planstate+deparse_expression in a separate
> memory context and free it after a plan node is generated.

Hmm, something like the attached? Seems reasonable...

- Heikki


Вложения

Re: OOM on EXPLAIN with lots of nodes

От
Alexey Bashtanov
Дата:
On 13.01.2015 16:47, Heikki Linnakangas wrote:
> Hmm, something like the attached? Seems reasonable...
>
> - Heikki
>
yes, i have tested something like this, it stopped eating memory

Just one small notice to the patch you attached: maybe it would be more 
safe to switch to oldcxt before calling 
ExplainPropertyText/ExplainPropertyList?
Otherwise are you pretty sure ExplainPropertyText and 
ExplainPropertyList are not going perform any pallocs without immediate 
pfrees in current memory context even in future?

Regards,  Alexey Bashtanov




Re: OOM on EXPLAIN with lots of nodes

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 01/13/2015 02:08 PM, Alexey Bashtanov wrote:
>> By varying the parameters and reading source code I determined that
>> memory usage linearly depends on (plan nodes count)*(overall columns
>> count), thus it quadratically depends on number of tables unionized.
>> 
>> To remove this excessive memory usage I propose
>> to run deparse_context_for_planstate+deparse_expression in a separate
>> memory context and free it after a plan node is generated.

> Hmm, something like the attached? Seems reasonable...

This looks pretty unsafe to me: it assumes, without much justification,
that there is no memory allocated during show_expression() that will be
needed later.

I suspect the O(N^2) consumption comes directly from some workspace
allocated for variable deparsing in ruleutils.c.  A safer fix would
be to do retail pfrees on that.
        regards, tom lane



Re: OOM on EXPLAIN with lots of nodes

От
Tom Lane
Дата:
I wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> Hmm, something like the attached? Seems reasonable...

> This looks pretty unsafe to me: it assumes, without much justification,
> that there is no memory allocated during show_expression() that will be
> needed later.

> I suspect the O(N^2) consumption comes directly from some workspace
> allocated for variable deparsing in ruleutils.c.  A safer fix would
> be to do retail pfrees on that.

I looked at this a bit more, and realized that we've got worse problems
than just O(N^2) space consumption: there's also O(N^2) time consumption
in ruleutils' set_simple_column_names().  The constant factor is not too
bad in this example but would be horrible with 1000 regular relations in
the rtable :-(.  Safe or not, the extra-context solution doesn't improve
the code's time consumption.

Really the issue here is that explain.c assumes that
deparse_context_for_planstate() is negligibly cheap, which was once true
but is so no longer.  What's more, most of the work is dependent only on
the rtable and so there is no good reason at all to be doing it more than
once per plan tree.

I think we can fix this by refactoring so that we construct a deparse
context once in ExplainPrintPlan(), and then just repoint it at
different planstate nodes as we work through the plan tree.

A difficulty with either your patch or my idea is that they require adding
another field to ExplainState, which is an ABI break for any third-party
code that might be declaring variables of that struct type.  That's fine
for HEAD but would be risky to back-patch.  Any thoughts about whether we
can get away with that (ie, anybody have an idea if there are third-party
extensions that call explain.c)?
        regards, tom lane



Re: OOM on EXPLAIN with lots of nodes

От
Alvaro Herrera
Дата:
Tom Lane wrote:

> A difficulty with either your patch or my idea is that they require adding
> another field to ExplainState, which is an ABI break for any third-party
> code that might be declaring variables of that struct type.  That's fine
> for HEAD but would be risky to back-patch.  Any thoughts about whether we
> can get away with that (ie, anybody have an idea if there are third-party
> extensions that call explain.c)?

codesearch.debian.net shows a couple of hits for ExplainState in
multicorn (an extension for FDW from Python data sources); I didn't look
but it seems that the FDW API is using that struct.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: OOM on EXPLAIN with lots of nodes

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> A difficulty with either your patch or my idea is that they require adding
>> another field to ExplainState, which is an ABI break for any third-party
>> code that might be declaring variables of that struct type.  That's fine
>> for HEAD but would be risky to back-patch.  Any thoughts about whether we
>> can get away with that (ie, anybody have an idea if there are third-party
>> extensions that call explain.c)?

> codesearch.debian.net shows a couple of hits for ExplainState in
> multicorn (an extension for FDW from Python data sources); I didn't look
> but it seems that the FDW API is using that struct.

It is, but FDWs are not at risk here: they merely reference ExplainStates
that were allocated by core backend code.  So as long as we add the new
field at the end it's not a problem for them.  Problematic usage would be
like what auto_explain does:
           ExplainState es;
           ExplainInitState(&es);           ...

In hindsight, that's a bad API and we should change it to something like
           ExplainState *es = NewExplainState();

so that the sizeof the struct isn't embedded in extension code.  But we
definitely can't do that in back branches.
        regards, tom lane



Re: OOM on EXPLAIN with lots of nodes

От
Heikki Linnakangas
Дата:
On 01/13/2015 07:24 PM, Tom Lane wrote:
> It is, but FDWs are not at risk here: they merely reference ExplainStates
> that were allocated by core backend code.  So as long as we add the new
> field at the end it's not a problem for them.  Problematic usage would be
> like what auto_explain does:
>
>              ExplainState es;
>
>              ExplainInitState(&es);
>              ...
>
> In hindsight, that's a bad API and we should change it to something like
>
>              ExplainState *es = NewExplainState();
>
> so that the sizeof the struct isn't embedded in extension code.  But we
> definitely can't do that in back branches.

Actually, it would make sense to do exactly that, to break any 
extensions that are doing the unsafe thing in an obvious way. The 
downside would be that an extension using the new API would then not 
work on an old server.

We could repurpose one of the existing fields in ExplainState to point 
to another struct that contains more fields. Something like this:

*** a/src/include/commands/explain.h
--- b/src/include/commands/explain.h
***************
*** 40,48 ****      List       *rtable;            /* range table */      List       *rtable_names;    /* alias names
forRTEs */      int            indent;            /* current indentation level */
 
!     List       *grouping_stack; /* format-specific grouping state */  } ExplainState;
  /* Hook for plugins to get control in ExplainOneQuery() */  typedef void (*ExplainOneQuery_hook_type) (Query *query,
                                                      IntoClause *into,
 
--- 40,55 ----      List       *rtable;            /* range table */      List       *rtable_names;    /* alias names
forRTEs */      int            indent;            /* current indentation level */
 
!     ExplainStateExtraFields *extra;    /* more fields */  } ExplainState;

+ typedef struct ExplainStateExtraFields
+ {
+     List       *grouping_stack; /* format-specific grouping state */
+     ...
+ }
+

That's pretty ugly, but it would work even if there are ExplainState 
structs embeded in extensions. As long as they don't try to look at the 
grouping_stack field; I think that's fairly safe assumption.

But do we really need to backpatch any of this?

- Heikki




Re: OOM on EXPLAIN with lots of nodes

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 01/13/2015 07:24 PM, Tom Lane wrote:
>> In hindsight, that's a bad API and we should change it to something like
>> ExplainState *es = NewExplainState();
>> so that the sizeof the struct isn't embedded in extension code.  But we
>> definitely can't do that in back branches.

> Actually, it would make sense to do exactly that, to break any 
> extensions that are doing the unsafe thing in an obvious way. The 
> downside would be that an extension using the new API would then not 
> work on an old server.

I guess that's a possibility ...

> We could repurpose one of the existing fields in ExplainState to point 
> to another struct that contains more fields. Something like this:
> ...
> That's pretty ugly, but it would work even if there are ExplainState 
> structs embeded in extensions. As long as they don't try to look at the 
> grouping_stack field; I think that's fairly safe assumption.

Yeah, I was thinking the same thing, but it's *mighty* ugly and would also
create a back-patch hazard, since presumably we'd not do that in HEAD.

> But do we really need to backpatch any of this?

Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB
peak in 9.3 and up.  That seems like a pretty nasty regression.
        regards, tom lane



Re: OOM on EXPLAIN with lots of nodes

От
Tom Lane
Дата:
I wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> But do we really need to backpatch any of this?

> Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB
> peak in 9.3 and up.  That seems like a pretty nasty regression.

I did a bit more measurement of the time and backend memory consumption
for Alexey's example EXPLAIN:

9.2: 0.9 sec, circa 200 MB
HEAD: 56 sec, circa 7300 MB
with patch below: 3.7 sec, circa 300 MB

So while this doesn't get us all the way back down to where we were before
we started trying to guarantee unique table/column identifiers in EXPLAIN
printouts, it's at least a lot closer.

Not sure whether to just commit this to HEAD and call it a day, or to
risk back-patching.

It occurred to me that we could use your idea of a secondary data
structure and minimize the code impact with a macro layer, ie
    #define grouping_stack pointer_field->groupingstack
But I'm not sure if it's worth the trouble.  9.3 has been like this
right along, and this is the first complaint.

One compromise idea would be to back-patch only as far as 9.4.
If there are extensions out there that have this ABI issue, expecting
them to rebuild for 9.4.1 might be OK.

            regards, tom lane

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8a0be5d..39ceaf2 100644
*** a/src/backend/commands/explain.c
--- b/src/backend/commands/explain.c
*************** ExplainPrintPlan(ExplainState *es, Query
*** 563,568 ****
--- 563,570 ----
      es->rtable = queryDesc->plannedstmt->rtable;
      ExplainPreScanNode(queryDesc->planstate, &rels_used);
      es->rtable_names = select_rtable_names_for_explain(es->rtable, rels_used);
+     es->deparse_cxt = deparse_context_for_plan_rtable(es->rtable,
+                                                       es->rtable_names);
      ExplainNode(queryDesc->planstate, NIL, NULL, NULL, es);
  }

*************** show_plan_tlist(PlanState *planstate, Li
*** 1678,1687 ****
          return;

      /* Set up deparsing context */
!     context = deparse_context_for_planstate((Node *) planstate,
!                                             ancestors,
!                                             es->rtable,
!                                             es->rtable_names);
      useprefix = list_length(es->rtable) > 1;

      /* Deparse each result column (we now include resjunk ones) */
--- 1680,1688 ----
          return;

      /* Set up deparsing context */
!     context = set_deparse_context_planstate(es->deparse_cxt,
!                                             (Node *) planstate,
!                                             ancestors);
      useprefix = list_length(es->rtable) > 1;

      /* Deparse each result column (we now include resjunk ones) */
*************** show_expression(Node *node, const char *
*** 1710,1719 ****
      char       *exprstr;

      /* Set up deparsing context */
!     context = deparse_context_for_planstate((Node *) planstate,
!                                             ancestors,
!                                             es->rtable,
!                                             es->rtable_names);

      /* Deparse the expression */
      exprstr = deparse_expression(node, context, useprefix, false);
--- 1711,1719 ----
      char       *exprstr;

      /* Set up deparsing context */
!     context = set_deparse_context_planstate(es->deparse_cxt,
!                                             (Node *) planstate,
!                                             ancestors);

      /* Deparse the expression */
      exprstr = deparse_expression(node, context, useprefix, false);
*************** show_sort_group_keys(PlanState *planstat
*** 1855,1864 ****
          return;

      /* Set up deparsing context */
!     context = deparse_context_for_planstate((Node *) planstate,
!                                             ancestors,
!                                             es->rtable,
!                                             es->rtable_names);
      useprefix = (list_length(es->rtable) > 1 || es->verbose);

      for (keyno = 0; keyno < nkeys; keyno++)
--- 1855,1863 ----
          return;

      /* Set up deparsing context */
!     context = set_deparse_context_planstate(es->deparse_cxt,
!                                             (Node *) planstate,
!                                             ancestors);
      useprefix = (list_length(es->rtable) > 1 || es->verbose);

      for (keyno = 0; keyno < nkeys; keyno++)
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index dd748ac..cacd72c 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*************** deparse_context_for(const char *aliasnam
*** 2520,2526 ****
  }

  /*
!  * deparse_context_for_planstate    - Build deparse context for a plan
   *
   * When deparsing an expression in a Plan tree, we might have to resolve
   * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
--- 2520,2562 ----
  }

  /*
!  * deparse_context_for_plan_rtable - Build deparse context for a plan's rtable
!  *
!  * When deparsing an expression in a Plan tree, we use the plan's rangetable
!  * to resolve names of simple Vars.  The initialization of column names for
!  * this is rather expensive if the rangetable is large, and it'll be the same
!  * for every expression in the Plan tree; so we do it just once and re-use
!  * the result of this function for each expression.  (Note that the result
!  * is not usable until set_deparse_context_planstate() is applied to it.)
!  *
!  * In addition to the plan's rangetable list, pass the per-RTE alias names
!  * assigned by a previous call to select_rtable_names_for_explain.
!  */
! List *
! deparse_context_for_plan_rtable(List *rtable, List *rtable_names)
! {
!     deparse_namespace *dpns;
!
!     dpns = (deparse_namespace *) palloc0(sizeof(deparse_namespace));
!
!     /* Initialize fields that stay the same across the whole plan tree */
!     dpns->rtable = rtable;
!     dpns->rtable_names = rtable_names;
!     dpns->ctes = NIL;
!
!     /*
!      * Set up column name aliases.  We will get rather bogus results for join
!      * RTEs, but that doesn't matter because plan trees don't contain any join
!      * alias Vars.
!      */
!     set_simple_column_names(dpns);
!
!     /* Return a one-deep namespace stack */
!     return list_make1(dpns);
! }
!
! /*
!  * set_deparse_context_planstate    - Specify Plan node containing expression
   *
   * When deparsing an expression in a Plan tree, we might have to resolve
   * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
*************** deparse_context_for(const char *aliasnam
*** 2539,2575 ****
   * most-closely-nested first.  This is needed to resolve PARAM_EXEC Params.
   * Note we assume that all the PlanStates share the same rtable.
   *
!  * The plan's rangetable list must also be passed, along with the per-RTE
!  * alias names assigned by a previous call to select_rtable_names_for_explain.
!  * (We use the rangetable to resolve simple Vars, but the plan inputs are
!  * necessary for Vars with special varnos.)
   */
  List *
! deparse_context_for_planstate(Node *planstate, List *ancestors,
!                               List *rtable, List *rtable_names)
  {
      deparse_namespace *dpns;

!     dpns = (deparse_namespace *) palloc0(sizeof(deparse_namespace));
!
!     /* Initialize fields that stay the same across the whole plan tree */
!     dpns->rtable = rtable;
!     dpns->rtable_names = rtable_names;
!     dpns->ctes = NIL;
!
!     /*
!      * Set up column name aliases.  We will get rather bogus results for join
!      * RTEs, but that doesn't matter because plan trees don't contain any join
!      * alias Vars.
!      */
!     set_simple_column_names(dpns);

      /* Set our attention on the specific plan node passed in */
      set_deparse_planstate(dpns, (PlanState *) planstate);
      dpns->ancestors = ancestors;

!     /* Return a one-deep namespace stack */
!     return list_make1(dpns);
  }

  /*
--- 2575,2602 ----
   * most-closely-nested first.  This is needed to resolve PARAM_EXEC Params.
   * Note we assume that all the PlanStates share the same rtable.
   *
!  * Once this function has been called, deparse_expression() can be called on
!  * subsidiary expression(s) of the specified PlanState node.  To deparse
!  * expressions of a different Plan node in the same Plan tree, re-call this
!  * function to identify the new parent Plan node.
!  *
!  * The result is the same List passed in; this is a notational convenience.
   */
  List *
! set_deparse_context_planstate(List *dpcontext,
!                               Node *planstate, List *ancestors)
  {
      deparse_namespace *dpns;

!     /* Should always have one-entry namespace list for Plan deparsing */
!     Assert(list_length(dpcontext) == 1);
!     dpns = (deparse_namespace *) linitial(dpcontext);

      /* Set our attention on the specific plan node passed in */
      set_deparse_planstate(dpns, (PlanState *) planstate);
      dpns->ancestors = ancestors;

!     return dpcontext;
  }

  /*
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 6e26950..598f374 100644
*** a/src/include/commands/explain.h
--- b/src/include/commands/explain.h
*************** typedef struct ExplainState
*** 41,46 ****
--- 41,47 ----
      List       *rtable_names;    /* alias names for RTEs */
      int            indent;            /* current indentation level */
      List       *grouping_stack; /* format-specific grouping state */
+     List       *deparse_cxt;    /* context list for deparsing expressions */
  } ExplainState;

  /* Hook for plugins to get control in ExplainOneQuery() */
diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h
index 188503c..86d2bea 100644
*** a/src/include/utils/ruleutils.h
--- b/src/include/utils/ruleutils.h
*************** extern char *pg_get_constraintdef_string
*** 25,32 ****
  extern char *deparse_expression(Node *expr, List *dpcontext,
                     bool forceprefix, bool showimplicit);
  extern List *deparse_context_for(const char *aliasname, Oid relid);
! extern List *deparse_context_for_planstate(Node *planstate, List *ancestors,
!                               List *rtable, List *rtable_names);
  extern List *select_rtable_names_for_explain(List *rtable,
                                  Bitmapset *rels_used);
  extern char *generate_collation_name(Oid collid);
--- 25,33 ----
  extern char *deparse_expression(Node *expr, List *dpcontext,
                     bool forceprefix, bool showimplicit);
  extern List *deparse_context_for(const char *aliasname, Oid relid);
! extern List *deparse_context_for_plan_rtable(List *rtable, List *rtable_names);
! extern List *set_deparse_context_planstate(List *dpcontext,
!                               Node *planstate, List *ancestors);
  extern List *select_rtable_names_for_explain(List *rtable,
                                  Bitmapset *rels_used);
  extern char *generate_collation_name(Oid collid);

Re: OOM on EXPLAIN with lots of nodes

От
Robert Haas
Дата:
On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>>> But do we really need to backpatch any of this?
>
>> Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB
>> peak in 9.3 and up.  That seems like a pretty nasty regression.
>
> I did a bit more measurement of the time and backend memory consumption
> for Alexey's example EXPLAIN:
>
> 9.2: 0.9 sec, circa 200 MB
> HEAD: 56 sec, circa 7300 MB
> with patch below: 3.7 sec, circa 300 MB
>
> So while this doesn't get us all the way back down to where we were before
> we started trying to guarantee unique table/column identifiers in EXPLAIN
> printouts, it's at least a lot closer.
>
> Not sure whether to just commit this to HEAD and call it a day, or to
> risk back-patching.

I think we need to back-patch something; that's a pretty nasty
regression, and I have some EDB-internal reports that might be from
the same cause.  I'm not too concerned about forcibly breaking the API
here, but I can understand why somebody might want to do that.  If we
do, I like the idea of renaming ExplainInitState() or maybe by
replacing it by a NewExplainState() function that is used instead.
But I'm not sure how necessary it is really.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: OOM on EXPLAIN with lots of nodes

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Not sure whether to just commit this to HEAD and call it a day, or to
>> risk back-patching.

> I think we need to back-patch something; that's a pretty nasty
> regression, and I have some EDB-internal reports that might be from
> the same cause.

OK.  I went ahead and made the necessary code changes to avoid changing
the struct size in released branches, as per Heikki's idea.
        regards, tom lane