Обсуждение: OOM on EXPLAIN with lots of nodes
Hello! I found that EXPLAIN command takes very much memory to execute when huge unions are used. For example the following sql -- begin sql create table t (a000 int, a001 int, ... a099 int); explain select * from ( select a001 a from t union all select a001 a from t union all ... (1000 times) ... union all select a001 a from t ) _ where a = 1; -- end sql took more than 1GB of memory to execute. Namely converting of the plan to a human-readable form causes excessive memory usage, not planning itself. By varying the parameters and reading source code I determined that memory usage linearly depends on (plan nodes count)*(overall columns count), thus it quadratically depends on number of tables unionized. To remove this excessive memory usage I propose to run deparse_context_for_planstate+deparse_expression in a separate memory context and free it after a plan node is generated. Any reasons to treat this idea as bad? BTW in this case explain execution is also quite long (I got tens of seconds). But I have no immediate ideas how to improve it. Regards, Alexey Bashtanov
On 01/13/2015 02:08 PM, Alexey Bashtanov wrote: > I found that EXPLAIN command takes very much memory to execute when huge > unions are used. > For example the following sql > -- begin sql > create table t (a000 int, a001 int, ... a099 int); > explain select * from ( > select a001 a from t > union all > select a001 a from t > union all > ... (1000 times) ... > union all > select a001 a from t > ) _ where a = 1; > -- end sql > took more than 1GB of memory to execute. > > Namely converting of the plan to a human-readable form causes excessive > memory usage, not planning itself. > > By varying the parameters and reading source code I determined that > memory usage linearly depends on (plan nodes count)*(overall columns > count), thus it quadratically depends on number of tables unionized. > > To remove this excessive memory usage I propose > to run deparse_context_for_planstate+deparse_expression in a separate > memory context and free it after a plan node is generated. Hmm, something like the attached? Seems reasonable... - Heikki
Вложения
On 13.01.2015 16:47, Heikki Linnakangas wrote: > Hmm, something like the attached? Seems reasonable... > > - Heikki > yes, i have tested something like this, it stopped eating memory Just one small notice to the patch you attached: maybe it would be more safe to switch to oldcxt before calling ExplainPropertyText/ExplainPropertyList? Otherwise are you pretty sure ExplainPropertyText and ExplainPropertyList are not going perform any pallocs without immediate pfrees in current memory context even in future? Regards, Alexey Bashtanov
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > On 01/13/2015 02:08 PM, Alexey Bashtanov wrote: >> By varying the parameters and reading source code I determined that >> memory usage linearly depends on (plan nodes count)*(overall columns >> count), thus it quadratically depends on number of tables unionized. >> >> To remove this excessive memory usage I propose >> to run deparse_context_for_planstate+deparse_expression in a separate >> memory context and free it after a plan node is generated. > Hmm, something like the attached? Seems reasonable... This looks pretty unsafe to me: it assumes, without much justification, that there is no memory allocated during show_expression() that will be needed later. I suspect the O(N^2) consumption comes directly from some workspace allocated for variable deparsing in ruleutils.c. A safer fix would be to do retail pfrees on that. regards, tom lane
I wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: >> Hmm, something like the attached? Seems reasonable... > This looks pretty unsafe to me: it assumes, without much justification, > that there is no memory allocated during show_expression() that will be > needed later. > I suspect the O(N^2) consumption comes directly from some workspace > allocated for variable deparsing in ruleutils.c. A safer fix would > be to do retail pfrees on that. I looked at this a bit more, and realized that we've got worse problems than just O(N^2) space consumption: there's also O(N^2) time consumption in ruleutils' set_simple_column_names(). The constant factor is not too bad in this example but would be horrible with 1000 regular relations in the rtable :-(. Safe or not, the extra-context solution doesn't improve the code's time consumption. Really the issue here is that explain.c assumes that deparse_context_for_planstate() is negligibly cheap, which was once true but is so no longer. What's more, most of the work is dependent only on the rtable and so there is no good reason at all to be doing it more than once per plan tree. I think we can fix this by refactoring so that we construct a deparse context once in ExplainPrintPlan(), and then just repoint it at different planstate nodes as we work through the plan tree. A difficulty with either your patch or my idea is that they require adding another field to ExplainState, which is an ABI break for any third-party code that might be declaring variables of that struct type. That's fine for HEAD but would be risky to back-patch. Any thoughts about whether we can get away with that (ie, anybody have an idea if there are third-party extensions that call explain.c)? regards, tom lane
Tom Lane wrote: > A difficulty with either your patch or my idea is that they require adding > another field to ExplainState, which is an ABI break for any third-party > code that might be declaring variables of that struct type. That's fine > for HEAD but would be risky to back-patch. Any thoughts about whether we > can get away with that (ie, anybody have an idea if there are third-party > extensions that call explain.c)? codesearch.debian.net shows a couple of hits for ExplainState in multicorn (an extension for FDW from Python data sources); I didn't look but it seems that the FDW API is using that struct. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > Tom Lane wrote: >> A difficulty with either your patch or my idea is that they require adding >> another field to ExplainState, which is an ABI break for any third-party >> code that might be declaring variables of that struct type. That's fine >> for HEAD but would be risky to back-patch. Any thoughts about whether we >> can get away with that (ie, anybody have an idea if there are third-party >> extensions that call explain.c)? > codesearch.debian.net shows a couple of hits for ExplainState in > multicorn (an extension for FDW from Python data sources); I didn't look > but it seems that the FDW API is using that struct. It is, but FDWs are not at risk here: they merely reference ExplainStates that were allocated by core backend code. So as long as we add the new field at the end it's not a problem for them. Problematic usage would be like what auto_explain does: ExplainState es; ExplainInitState(&es); ... In hindsight, that's a bad API and we should change it to something like ExplainState *es = NewExplainState(); so that the sizeof the struct isn't embedded in extension code. But we definitely can't do that in back branches. regards, tom lane
On 01/13/2015 07:24 PM, Tom Lane wrote: > It is, but FDWs are not at risk here: they merely reference ExplainStates > that were allocated by core backend code. So as long as we add the new > field at the end it's not a problem for them. Problematic usage would be > like what auto_explain does: > > ExplainState es; > > ExplainInitState(&es); > ... > > In hindsight, that's a bad API and we should change it to something like > > ExplainState *es = NewExplainState(); > > so that the sizeof the struct isn't embedded in extension code. But we > definitely can't do that in back branches. Actually, it would make sense to do exactly that, to break any extensions that are doing the unsafe thing in an obvious way. The downside would be that an extension using the new API would then not work on an old server. We could repurpose one of the existing fields in ExplainState to point to another struct that contains more fields. Something like this: *** a/src/include/commands/explain.h --- b/src/include/commands/explain.h *************** *** 40,48 **** List *rtable; /* range table */ List *rtable_names; /* alias names forRTEs */ int indent; /* current indentation level */ ! List *grouping_stack; /* format-specific grouping state */ } ExplainState; /* Hook for plugins to get control in ExplainOneQuery() */ typedef void (*ExplainOneQuery_hook_type) (Query *query, IntoClause *into, --- 40,55 ---- List *rtable; /* range table */ List *rtable_names; /* alias names forRTEs */ int indent; /* current indentation level */ ! ExplainStateExtraFields *extra; /* more fields */ } ExplainState; + typedef struct ExplainStateExtraFields + { + List *grouping_stack; /* format-specific grouping state */ + ... + } + That's pretty ugly, but it would work even if there are ExplainState structs embeded in extensions. As long as they don't try to look at the grouping_stack field; I think that's fairly safe assumption. But do we really need to backpatch any of this? - Heikki
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > On 01/13/2015 07:24 PM, Tom Lane wrote: >> In hindsight, that's a bad API and we should change it to something like >> ExplainState *es = NewExplainState(); >> so that the sizeof the struct isn't embedded in extension code. But we >> definitely can't do that in back branches. > Actually, it would make sense to do exactly that, to break any > extensions that are doing the unsafe thing in an obvious way. The > downside would be that an extension using the new API would then not > work on an old server. I guess that's a possibility ... > We could repurpose one of the existing fields in ExplainState to point > to another struct that contains more fields. Something like this: > ... > That's pretty ugly, but it would work even if there are ExplainState > structs embeded in extensions. As long as they don't try to look at the > grouping_stack field; I think that's fairly safe assumption. Yeah, I was thinking the same thing, but it's *mighty* ugly and would also create a back-patch hazard, since presumably we'd not do that in HEAD. > But do we really need to backpatch any of this? Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB peak in 9.3 and up. That seems like a pretty nasty regression. regards, tom lane
I wrote: > Heikki Linnakangas <hlinnakangas@vmware.com> writes: >> But do we really need to backpatch any of this? > Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB > peak in 9.3 and up. That seems like a pretty nasty regression. I did a bit more measurement of the time and backend memory consumption for Alexey's example EXPLAIN: 9.2: 0.9 sec, circa 200 MB HEAD: 56 sec, circa 7300 MB with patch below: 3.7 sec, circa 300 MB So while this doesn't get us all the way back down to where we were before we started trying to guarantee unique table/column identifiers in EXPLAIN printouts, it's at least a lot closer. Not sure whether to just commit this to HEAD and call it a day, or to risk back-patching. It occurred to me that we could use your idea of a secondary data structure and minimize the code impact with a macro layer, ie #define grouping_stack pointer_field->groupingstack But I'm not sure if it's worth the trouble. 9.3 has been like this right along, and this is the first complaint. One compromise idea would be to back-patch only as far as 9.4. If there are extensions out there that have this ABI issue, expecting them to rebuild for 9.4.1 might be OK. regards, tom lane diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 8a0be5d..39ceaf2 100644 *** a/src/backend/commands/explain.c --- b/src/backend/commands/explain.c *************** ExplainPrintPlan(ExplainState *es, Query *** 563,568 **** --- 563,570 ---- es->rtable = queryDesc->plannedstmt->rtable; ExplainPreScanNode(queryDesc->planstate, &rels_used); es->rtable_names = select_rtable_names_for_explain(es->rtable, rels_used); + es->deparse_cxt = deparse_context_for_plan_rtable(es->rtable, + es->rtable_names); ExplainNode(queryDesc->planstate, NIL, NULL, NULL, es); } *************** show_plan_tlist(PlanState *planstate, Li *** 1678,1687 **** return; /* Set up deparsing context */ ! context = deparse_context_for_planstate((Node *) planstate, ! ancestors, ! es->rtable, ! es->rtable_names); useprefix = list_length(es->rtable) > 1; /* Deparse each result column (we now include resjunk ones) */ --- 1680,1688 ---- return; /* Set up deparsing context */ ! context = set_deparse_context_planstate(es->deparse_cxt, ! (Node *) planstate, ! ancestors); useprefix = list_length(es->rtable) > 1; /* Deparse each result column (we now include resjunk ones) */ *************** show_expression(Node *node, const char * *** 1710,1719 **** char *exprstr; /* Set up deparsing context */ ! context = deparse_context_for_planstate((Node *) planstate, ! ancestors, ! es->rtable, ! es->rtable_names); /* Deparse the expression */ exprstr = deparse_expression(node, context, useprefix, false); --- 1711,1719 ---- char *exprstr; /* Set up deparsing context */ ! context = set_deparse_context_planstate(es->deparse_cxt, ! (Node *) planstate, ! ancestors); /* Deparse the expression */ exprstr = deparse_expression(node, context, useprefix, false); *************** show_sort_group_keys(PlanState *planstat *** 1855,1864 **** return; /* Set up deparsing context */ ! context = deparse_context_for_planstate((Node *) planstate, ! ancestors, ! es->rtable, ! es->rtable_names); useprefix = (list_length(es->rtable) > 1 || es->verbose); for (keyno = 0; keyno < nkeys; keyno++) --- 1855,1863 ---- return; /* Set up deparsing context */ ! context = set_deparse_context_planstate(es->deparse_cxt, ! (Node *) planstate, ! ancestors); useprefix = (list_length(es->rtable) > 1 || es->verbose); for (keyno = 0; keyno < nkeys; keyno++) diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index dd748ac..cacd72c 100644 *** a/src/backend/utils/adt/ruleutils.c --- b/src/backend/utils/adt/ruleutils.c *************** deparse_context_for(const char *aliasnam *** 2520,2526 **** } /* ! * deparse_context_for_planstate - Build deparse context for a plan * * When deparsing an expression in a Plan tree, we might have to resolve * OUTER_VAR, INNER_VAR, or INDEX_VAR references. To do this, the caller must --- 2520,2562 ---- } /* ! * deparse_context_for_plan_rtable - Build deparse context for a plan's rtable ! * ! * When deparsing an expression in a Plan tree, we use the plan's rangetable ! * to resolve names of simple Vars. The initialization of column names for ! * this is rather expensive if the rangetable is large, and it'll be the same ! * for every expression in the Plan tree; so we do it just once and re-use ! * the result of this function for each expression. (Note that the result ! * is not usable until set_deparse_context_planstate() is applied to it.) ! * ! * In addition to the plan's rangetable list, pass the per-RTE alias names ! * assigned by a previous call to select_rtable_names_for_explain. ! */ ! List * ! deparse_context_for_plan_rtable(List *rtable, List *rtable_names) ! { ! deparse_namespace *dpns; ! ! dpns = (deparse_namespace *) palloc0(sizeof(deparse_namespace)); ! ! /* Initialize fields that stay the same across the whole plan tree */ ! dpns->rtable = rtable; ! dpns->rtable_names = rtable_names; ! dpns->ctes = NIL; ! ! /* ! * Set up column name aliases. We will get rather bogus results for join ! * RTEs, but that doesn't matter because plan trees don't contain any join ! * alias Vars. ! */ ! set_simple_column_names(dpns); ! ! /* Return a one-deep namespace stack */ ! return list_make1(dpns); ! } ! ! /* ! * set_deparse_context_planstate - Specify Plan node containing expression * * When deparsing an expression in a Plan tree, we might have to resolve * OUTER_VAR, INNER_VAR, or INDEX_VAR references. To do this, the caller must *************** deparse_context_for(const char *aliasnam *** 2539,2575 **** * most-closely-nested first. This is needed to resolve PARAM_EXEC Params. * Note we assume that all the PlanStates share the same rtable. * ! * The plan's rangetable list must also be passed, along with the per-RTE ! * alias names assigned by a previous call to select_rtable_names_for_explain. ! * (We use the rangetable to resolve simple Vars, but the plan inputs are ! * necessary for Vars with special varnos.) */ List * ! deparse_context_for_planstate(Node *planstate, List *ancestors, ! List *rtable, List *rtable_names) { deparse_namespace *dpns; ! dpns = (deparse_namespace *) palloc0(sizeof(deparse_namespace)); ! ! /* Initialize fields that stay the same across the whole plan tree */ ! dpns->rtable = rtable; ! dpns->rtable_names = rtable_names; ! dpns->ctes = NIL; ! ! /* ! * Set up column name aliases. We will get rather bogus results for join ! * RTEs, but that doesn't matter because plan trees don't contain any join ! * alias Vars. ! */ ! set_simple_column_names(dpns); /* Set our attention on the specific plan node passed in */ set_deparse_planstate(dpns, (PlanState *) planstate); dpns->ancestors = ancestors; ! /* Return a one-deep namespace stack */ ! return list_make1(dpns); } /* --- 2575,2602 ---- * most-closely-nested first. This is needed to resolve PARAM_EXEC Params. * Note we assume that all the PlanStates share the same rtable. * ! * Once this function has been called, deparse_expression() can be called on ! * subsidiary expression(s) of the specified PlanState node. To deparse ! * expressions of a different Plan node in the same Plan tree, re-call this ! * function to identify the new parent Plan node. ! * ! * The result is the same List passed in; this is a notational convenience. */ List * ! set_deparse_context_planstate(List *dpcontext, ! Node *planstate, List *ancestors) { deparse_namespace *dpns; ! /* Should always have one-entry namespace list for Plan deparsing */ ! Assert(list_length(dpcontext) == 1); ! dpns = (deparse_namespace *) linitial(dpcontext); /* Set our attention on the specific plan node passed in */ set_deparse_planstate(dpns, (PlanState *) planstate); dpns->ancestors = ancestors; ! return dpcontext; } /* diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h index 6e26950..598f374 100644 *** a/src/include/commands/explain.h --- b/src/include/commands/explain.h *************** typedef struct ExplainState *** 41,46 **** --- 41,47 ---- List *rtable_names; /* alias names for RTEs */ int indent; /* current indentation level */ List *grouping_stack; /* format-specific grouping state */ + List *deparse_cxt; /* context list for deparsing expressions */ } ExplainState; /* Hook for plugins to get control in ExplainOneQuery() */ diff --git a/src/include/utils/ruleutils.h b/src/include/utils/ruleutils.h index 188503c..86d2bea 100644 *** a/src/include/utils/ruleutils.h --- b/src/include/utils/ruleutils.h *************** extern char *pg_get_constraintdef_string *** 25,32 **** extern char *deparse_expression(Node *expr, List *dpcontext, bool forceprefix, bool showimplicit); extern List *deparse_context_for(const char *aliasname, Oid relid); ! extern List *deparse_context_for_planstate(Node *planstate, List *ancestors, ! List *rtable, List *rtable_names); extern List *select_rtable_names_for_explain(List *rtable, Bitmapset *rels_used); extern char *generate_collation_name(Oid collid); --- 25,33 ---- extern char *deparse_expression(Node *expr, List *dpcontext, bool forceprefix, bool showimplicit); extern List *deparse_context_for(const char *aliasname, Oid relid); ! extern List *deparse_context_for_plan_rtable(List *rtable, List *rtable_names); ! extern List *set_deparse_context_planstate(List *dpcontext, ! Node *planstate, List *ancestors); extern List *select_rtable_names_for_explain(List *rtable, Bitmapset *rels_used); extern char *generate_collation_name(Oid collid);
On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Heikki Linnakangas <hlinnakangas@vmware.com> writes: >>> But do we really need to backpatch any of this? > >> Alexey's example consumes only a couple hundred MB in 9.2, vs about 7GB >> peak in 9.3 and up. That seems like a pretty nasty regression. > > I did a bit more measurement of the time and backend memory consumption > for Alexey's example EXPLAIN: > > 9.2: 0.9 sec, circa 200 MB > HEAD: 56 sec, circa 7300 MB > with patch below: 3.7 sec, circa 300 MB > > So while this doesn't get us all the way back down to where we were before > we started trying to guarantee unique table/column identifiers in EXPLAIN > printouts, it's at least a lot closer. > > Not sure whether to just commit this to HEAD and call it a day, or to > risk back-patching. I think we need to back-patch something; that's a pretty nasty regression, and I have some EDB-internal reports that might be from the same cause. I'm not too concerned about forcibly breaking the API here, but I can understand why somebody might want to do that. If we do, I like the idea of renaming ExplainInitState() or maybe by replacing it by a NewExplainState() function that is used instead. But I'm not sure how necessary it is really. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 13, 2015 at 8:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Not sure whether to just commit this to HEAD and call it a day, or to >> risk back-patching. > I think we need to back-patch something; that's a pretty nasty > regression, and I have some EDB-internal reports that might be from > the same cause. OK. I went ahead and made the necessary code changes to avoid changing the struct size in released branches, as per Heikki's idea. regards, tom lane