Обсуждение: Centralizing protective copying of utility statements

Поиск

Список

Период

Сортировка

Centralizing protective copying of utility statements

От

Tom Lane

Дата:

17 июня 2021 г., 04:39:49

I haven't pushed my quick-hack fix for bug #17053 ([1]) because
I wasn't really satisfied with band-aiding that problem in one
more place.  I took a look around to see if we could protect
against the whole class of scribble-on-a-utility-statement
issues in a more centralized way.

What I found is that there are only two places that call
ProcessUtility with a statement that might be coming from the plan
cache.  _SPI_execute_plan is always doing so, so it can just
unconditionally copy the statement.  The other one is
PortalRunUtility, which can examine the Portal to see if the
parsetree came out of cache or not.  Having added copyObject
calls there, we can get rid of the retail calls that exist
in not-quite-enough utility statement execution routines.

I think this would have been more complicated before plpgsql
started using the plancache; at least, some of the comments
removed here refer to plpgsql as being an independent hazard.
Also, I didn't risk removing any copyObject calls that are
further down than the top level of statement execution handlers.
Some of those are visibly still necessary, and others are hard
to be sure about.

Although this adds some overhead in the form of copying of
utility node trees that won't actually mutate during execution,
I think that won't be too bad because those trees tend to be
small and hence cheap to copy.  The statements that can have
a lot of substructure usually contain expression trees or the
like, which do have to be copied for safety.  Moreover, we buy
back a lot of cost by removing pointless copying when we're
not executing on a cached plan.

(BTW, in case you are wondering: this hazard only exists for
utility statements, because we long ago made the executor
not modify the Plan tree it's given.)

This is possibly too aggressive to consider for back-patching.
In the back branches, perhaps we should just use my original
localized fix.  Another conservative (but expensive) answer
for the back branches is to add the new copyObject calls
but not remove any of the old ones.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/17053-3ca3f501bbc212b4%40postgresql.org

diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 89a4f8f810..b6eacd5baa 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -438,14 +438,8 @@ BeginCopyTo(ParseState *pstate,
         /*
          * Run parse analysis and rewrite.  Note this also acquires sufficient
          * locks on the source table(s).
-         *
-         * Because the parser and planner tend to scribble on their input, we
-         * make a preliminary copy of the source querytree.  This prevents
-         * problems in the case that the COPY is in a portal or plpgsql
-         * function and is executed repeatedly.  (See also the same hack in
-         * DECLARE CURSOR and PREPARE.)  XXX FIXME someday.
          */
-        rewritten = pg_analyze_and_rewrite(copyObject(raw_query),
+        rewritten = pg_analyze_and_rewrite(raw_query,
                                            pstate->p_sourcetext, NULL, 0,
                                            NULL);

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..0982851715 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -299,14 +299,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
          * rewriter.  We do not do AcquireRewriteLocks: we assume the query
          * either came straight from the parser, or suitable locks were
          * acquired by plancache.c.
-         *
-         * Because the rewriter and planner tend to scribble on the input, we
-         * make a preliminary copy of the source querytree.  This prevents
-         * problems in the case that CTAS is in a portal or plpgsql function
-         * and is executed repeatedly.  (See also the same hack in EXPLAIN and
-         * PREPARE.)
          */
-        rewritten = QueryRewrite(copyObject(query));
+        rewritten = QueryRewrite(query);

         /* SELECT should never rewrite to more or less than one SELECT query */
         if (list_length(rewritten) != 1)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9a60865d19..7045f6ad55 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -256,14 +256,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
      * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
      * came straight from the parser, or suitable locks were acquired by
      * plancache.c.
-     *
-     * Because the rewriter and planner tend to scribble on the input, we make
-     * a preliminary copy of the source querytree.  This prevents problems in
-     * the case that the EXPLAIN is in a portal or plpgsql function and is
-     * executed repeatedly.  (See also the same hack in DECLARE CURSOR and
-     * PREPARE.)  XXX FIXME someday.
      */
-    rewritten = QueryRewrite(castNode(Query, copyObject(stmt->query)));
+    rewritten = QueryRewrite(castNode(Query, stmt->query));

     /* emit opening boilerplate */
     ExplainBeginOutput(es);
@@ -441,8 +435,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
     {
         /*
          * We have to rewrite the contained SELECT and then pass it back to
-         * ExplainOneQuery.  It's probably not really necessary to copy the
-         * contained parsetree another time, but let's be safe.
+         * ExplainOneQuery.
          */
         CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt;
         List       *rewritten;
@@ -463,7 +456,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
             return;
         }

-        rewritten = QueryRewrite(castNode(Query, copyObject(ctas->query)));
+        rewritten = QueryRewrite(castNode(Query, ctas->query));
         Assert(list_length(rewritten) == 1);
         ExplainOneQuery(linitial_node(Query, rewritten),
                         CURSOR_OPT_PARALLEL_OK, ctas->into, es,
@@ -482,7 +475,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
         DeclareCursorStmt *dcs = (DeclareCursorStmt *) utilityStmt;
         List       *rewritten;

-        rewritten = QueryRewrite(castNode(Query, copyObject(dcs->query)));
+        rewritten = QueryRewrite(castNode(Query, dcs->query));
         Assert(list_length(rewritten) == 1);
         ExplainOneQuery(linitial_node(Query, rewritten),
                         dcs->options, NULL, es,
diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 5cacc088cd..5d469309ce 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -747,12 +747,12 @@ CreatePolicy(CreatePolicyStmt *stmt)
     addNSItemToQuery(with_check_pstate, nsitem, false, true, true);

     qual = transformWhereClause(qual_pstate,
-                                copyObject(stmt->qual),
+                                stmt->qual,
                                 EXPR_KIND_POLICY,
                                 "POLICY");

     with_check_qual = transformWhereClause(with_check_pstate,
-                                           copyObject(stmt->with_check),
+                                           stmt->with_check,
                                            EXPR_KIND_POLICY,
                                            "POLICY");

@@ -922,7 +922,7 @@ AlterPolicy(AlterPolicyStmt *stmt)

         addNSItemToQuery(qual_pstate, nsitem, false, true, true);

-        qual = transformWhereClause(qual_pstate, copyObject(stmt->qual),
+        qual = transformWhereClause(qual_pstate, stmt->qual,
                                     EXPR_KIND_POLICY,
                                     "POLICY");

@@ -946,7 +946,7 @@ AlterPolicy(AlterPolicyStmt *stmt)
         addNSItemToQuery(with_check_pstate, nsitem, false, true, true);

         with_check_qual = transformWhereClause(with_check_pstate,
-                                               copyObject(stmt->with_check),
+                                               stmt->with_check,
                                                EXPR_KIND_POLICY,
                                                "POLICY");

diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index d34cc39fde..3ea30bcbc9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -76,14 +76,8 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
      * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
      * came straight from the parser, or suitable locks were acquired by
      * plancache.c.
-     *
-     * Because the rewriter and planner tend to scribble on the input, we make
-     * a preliminary copy of the source querytree.  This prevents problems in
-     * the case that the DECLARE CURSOR is in a portal or plpgsql function and
-     * is executed repeatedly.  (See also the same hack in EXPLAIN and
-     * PREPARE.)  XXX FIXME someday.
      */
-    rewritten = QueryRewrite((Query *) copyObject(query));
+    rewritten = QueryRewrite(query);

     /* SELECT should never rewrite to more or less than one query */
     if (list_length(rewritten) != 1)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 65d8ac65b1..5e03c7c5aa 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -78,12 +78,9 @@ PrepareQuery(ParseState *pstate, PrepareStmt *stmt,
     /*
      * Need to wrap the contained statement in a RawStmt node to pass it to
      * parse analysis.
-     *
-     * Because parse analysis scribbles on the raw querytree, we must make a
-     * copy to ensure we don't modify the passed-in tree.  FIXME someday.
      */
     rawstmt = makeNode(RawStmt);
-    rawstmt->stmt = (Node *) copyObject(stmt->query);
+    rawstmt->stmt = stmt->query;
     rawstmt->stmt_location = stmt_location;
     rawstmt->stmt_len = stmt_len;

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 028e8ac46b..4e23c7fce5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4408,8 +4408,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
      * Copy the original subcommand for each table.  This avoids conflicts
      * when different child tables need to make different parse
      * transformations (for example, the same column may have different column
-     * numbers in different children).  It also ensures that we don't corrupt
-     * the original parse tree, in case it is saved in plancache.
+     * numbers in different children).
      */
     cmd = copyObject(cmd);

diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index f2642dba6c..4df05a0b33 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -417,12 +417,9 @@ DefineView(ViewStmt *stmt, const char *queryString,
     /*
      * Run parse analysis to convert the raw parse tree to a Query.  Note this
      * also acquires sufficient locks on the source table(s).
-     *
-     * Since parse analysis scribbles on its input, copy the raw parse tree;
-     * this ensures we don't corrupt a prepared statement, for example.
      */
     rawstmt = makeNode(RawStmt);
-    rawstmt->stmt = (Node *) copyObject(stmt->query);
+    rawstmt->stmt = stmt->query;
     rawstmt->stmt_location = stmt_location;
     rawstmt->stmt_len = stmt_len;

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index b8bd05e894..70800a649e 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2532,6 +2532,14 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
                 ProcessUtilityContext context;
                 QueryCompletion qc;

+                /*
+                 * Copy the utility statement out of the plan cache, in case
+                 * its processing scribbles on the statement's node tree.
+                 * (This could be refined if we knew which utility statements
+                 * do that.)
+                 */
+                stmt = copyObject(stmt);
+
                 /*
                  * If the SPI context is atomic, or we were not told to allow
                  * nonatomic operations, tell ProcessUtility this is an atomic
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index c9708e38f4..81d3e7990c 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -11,10 +11,6 @@
  * Hence these functions are now called at the start of execution of their
  * respective utility commands.
  *
- * NOTE: in general we must avoid scribbling on the passed-in raw parse
- * tree, since it might be in a plan cache.  The simplest solution is
- * a quick copyObject() call before manipulating the query tree.
- *
  *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -177,12 +173,6 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
     Oid            existing_relid;
     ParseCallbackState pcbstate;

-    /*
-     * We must not scribble on the passed-in CreateStmt, so copy it.  (This is
-     * overkill, but easy.)
-     */
-    stmt = copyObject(stmt);
-
     /* Set up pstate */
     pstate = make_parsestate(NULL);
     pstate->p_sourcetext = queryString;
@@ -2824,12 +2814,6 @@ transformIndexStmt(Oid relid, IndexStmt *stmt, const char *queryString)
     if (stmt->transformed)
         return stmt;

-    /*
-     * We must not scribble on the passed-in IndexStmt, so copy it.  (This is
-     * overkill, but easy.)
-     */
-    stmt = copyObject(stmt);
-
     /* Set up pstate */
     pstate = make_parsestate(NULL);
     pstate->p_sourcetext = queryString;
@@ -2925,12 +2909,6 @@ transformStatsStmt(Oid relid, CreateStatsStmt *stmt, const char *queryString)
     if (stmt->transformed)
         return stmt;

-    /*
-     * We must not scribble on the passed-in CreateStatsStmt, so copy it.
-     * (This is overkill, but easy.)
-     */
-    stmt = copyObject(stmt);
-
     /* Set up pstate */
     pstate = make_parsestate(NULL);
     pstate->p_sourcetext = queryString;
@@ -2993,9 +2971,6 @@ transformStatsStmt(Oid relid, CreateStatsStmt *stmt, const char *queryString)
  *
  * actions and whereClause are output parameters that receive the
  * transformed results.
- *
- * Note that we must not scribble on the passed-in RuleStmt, so we do
- * copyObject() on the actions and WHERE clause.
  */
 void
 transformRuleStmt(RuleStmt *stmt, const char *queryString,
@@ -3070,7 +3045,7 @@ transformRuleStmt(RuleStmt *stmt, const char *queryString,

     /* take care of the where clause */
     *whereClause = transformWhereClause(pstate,
-                                        (Node *) copyObject(stmt->whereClause),
+                                        stmt->whereClause,
                                         EXPR_KIND_WHERE,
                                         "WHERE");
     /* we have to fix its collations too */
@@ -3142,8 +3117,7 @@ transformRuleStmt(RuleStmt *stmt, const char *queryString,
             addNSItemToQuery(sub_pstate, newnsitem, false, true, false);

             /* Transform the rule action statement */
-            top_subqry = transformStmt(sub_pstate,
-                                       (Node *) copyObject(action));
+            top_subqry = transformStmt(sub_pstate, action);

             /*
              * We cannot support utility-statement actions (eg NOTIFY) with
@@ -3325,12 +3299,6 @@ transformAlterTableStmt(Oid relid, AlterTableStmt *stmt,
     AlterTableCmd *newcmd;
     ParseNamespaceItem *nsitem;

-    /*
-     * We must not scribble on the passed-in AlterTableStmt, so copy it. (This
-     * is overkill, but easy.)
-     */
-    stmt = copyObject(stmt);
-
     /* Caller is responsible for locking the relation */
     rel = relation_open(relid, NoLock);
     tupdesc = RelationGetDescr(rel);
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index f7f08e6c05..4e02e1821c 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -1144,6 +1144,15 @@ PortalRunUtility(Portal portal, PlannedStmt *pstmt,
     else
         portal->portalSnapshot = NULL;

+    /*
+     * If the Portal's statement list belongs to a cached plan, we must not
+     * damage the original statement node tree.  Processing of the statement
+     * might scribble on the node tree, so copy the statement.  (This could be
+     * refined if we knew which utility statements do that.)
+     */
+    if (portal->cplan)
+        pstmt = copyObject(pstmt);
+
     ProcessUtility(pstmt,
                    portal->sourceText,
                    isTopLevel ? PROCESS_UTILITY_TOPLEVEL : PROCESS_UTILITY_QUERY,
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1a8fc16773..81f34dfb69 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -485,6 +485,10 @@ CheckRestrictedOperation(const char *cmdname)
  *    qc: where to store command completion status data.  May be NULL,
  *        but if not, then caller must have initialized it.
  *
+ * Some utility statement execution functions will scribble on the passed
+ * "pstmt" node tree.  It is caller's responsibility that that tree not be
+ * something that must be preserved.
+ *
  * Caller MUST supply a queryString; it is not allowed (anymore) to pass NULL.
  * If you really don't have source text, you can pass a constant string,
  * perhaps "(query not available)".

Re: Centralizing protective copying of utility statements

От

Zhihong Yu

Дата:

17 июня 2021 г., 05:23:12

On Wed, Jun 16, 2021 at 6:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

I haven't pushed my quick-hack fix for bug #17053 ([1]) because
I wasn't really satisfied with band-aiding that problem in one
more place. I took a look around to see if we could protect
against the whole class of scribble-on-a-utility-statement
issues in a more centralized way.

What I found is that there are only two places that call
ProcessUtility with a statement that might be coming from the plan
cache. _SPI_execute_plan is always doing so, so it can just
unconditionally copy the statement. The other one is
PortalRunUtility, which can examine the Portal to see if the
parsetree came out of cache or not. Having added copyObject
calls there, we can get rid of the retail calls that exist
in not-quite-enough utility statement execution routines.

I think this would have been more complicated before plpgsql
started using the plancache; at least, some of the comments
removed here refer to plpgsql as being an independent hazard.
Also, I didn't risk removing any copyObject calls that are
further down than the top level of statement execution handlers.
Some of those are visibly still necessary, and others are hard
to be sure about.

Although this adds some overhead in the form of copying of
utility node trees that won't actually mutate during execution,
I think that won't be too bad because those trees tend to be
small and hence cheap to copy. The statements that can have
a lot of substructure usually contain expression trees or the
like, which do have to be copied for safety. Moreover, we buy
back a lot of cost by removing pointless copying when we're
not executing on a cached plan.

(BTW, in case you are wondering: this hazard only exists for
utility statements, because we long ago made the executor
not modify the Plan tree it's given.)

This is possibly too aggressive to consider for back-patching.
In the back branches, perhaps we should just use my original
localized fix. Another conservative (but expensive) answer
for the back branches is to add the new copyObject calls
but not remove any of the old ones.

Thoughts?

regards, tom lane

[1] https://www.postgresql.org/message-id/flat/17053-3ca3f501bbc212b4%40postgresql.org

Hi,

For back-patching, if we wait for a while (a few weeks) after this patch gets committed to master branch (and see there is no regression),

it seems that would give us more confidence in backporting to older branches.

Cheers

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

17 июня 2021 г., 18:30:10

I wrote:
> What I found is that there are only two places that call
> ProcessUtility with a statement that might be coming from the plan
> cache.  _SPI_execute_plan is always doing so, so it can just
> unconditionally copy the statement.  The other one is
> PortalRunUtility, which can examine the Portal to see if the
> parsetree came out of cache or not.  Having added copyObject
> calls there, we can get rid of the retail calls that exist
> in not-quite-enough utility statement execution routines.

In the light of morning, I'm not too pleased with this patch either.
It's essentially creating a silent API change for ProcessUtility.
I don't know whether there are any out-of-tree ProcessUtility
callers, but if there are, this could easily break them in a way
that basic testing might not catch.

What seems like a much safer answer is to make the API change noisy.
That is, move the responsibility for actually calling copyObject
into ProcessUtility itself, and add a bool parameter saying whether
that needs to be done.  That forces all callers to consider the
issue, and gives them the tool they need to make the right thing
happen.

However, this clearly is not a back-patchable approach.  I'm
thinking there are two plausible alternatives for the back branches:

1. Narrow fix as per my original patch for #17053.  Low cost,
but no protection against other bugs of the same ilk.

2. Still move the responsibility for calling copyObject into
ProcessUtility; but lacking the bool parameter, just do it
unconditionally.  Offers solid protection at some uncertain
performance cost.

I'm not terribly certain which way to go.  Thoughts?

            regards, tom lane

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

17 июня 2021 г., 20:03:29

I wrote:
> What seems like a much safer answer is to make the API change noisy.
> That is, move the responsibility for actually calling copyObject
> into ProcessUtility itself, and add a bool parameter saying whether
> that needs to be done.  That forces all callers to consider the
> issue, and gives them the tool they need to make the right thing
> happen.

Here's a v2 that does it like that.  In this formulation, we're
basically hoisting the responsibility for doing copyObject up into
ProcessUtility from its direct children, which seems like a clearer
way of thinking about what has to change.

We could avoid the side-effects on users of ProcessUtility_hook by
doing the copy step in ProcessUtility itself rather than passing the
flag on to standard_ProcessUtility.  But that sounded like a bit of a
kluge.  Also, putting the work in standard_ProcessUtility preserves
the option to redistribute it into the individual switch arms, in case
anyone does find the extra copying overhead annoying for statement
types that don't need it.  (I don't plan to do any such thing as part
of this bug-fix patch, though.)

Barring objections, I'm going to push this into HEAD fairly soon,
since beta2 is hard upon us.  Still thinking about which way to
fix it in the back branches.

            regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 09433c8c96..07fe0e7cda 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -320,6 +320,7 @@ static void pgss_ExecutorRun(QueryDesc *queryDesc,
 static void pgss_ExecutorFinish(QueryDesc *queryDesc);
 static void pgss_ExecutorEnd(QueryDesc *queryDesc);
 static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
+                                bool readOnlyTree,
                                 ProcessUtilityContext context, ParamListInfo params,
                                 QueryEnvironment *queryEnv,
                                 DestReceiver *dest, QueryCompletion *qc);
@@ -1069,6 +1070,7 @@ pgss_ExecutorEnd(QueryDesc *queryDesc)
  */
 static void
 pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
+                    bool readOnlyTree,
                     ProcessUtilityContext context,
                     ParamListInfo params, QueryEnvironment *queryEnv,
                     DestReceiver *dest, QueryCompletion *qc)
@@ -1126,11 +1128,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
         PG_TRY();
         {
             if (prev_ProcessUtility)
-                prev_ProcessUtility(pstmt, queryString,
+                prev_ProcessUtility(pstmt, queryString, readOnlyTree,
                                     context, params, queryEnv,
                                     dest, qc);
             else
-                standard_ProcessUtility(pstmt, queryString,
+                standard_ProcessUtility(pstmt, queryString, readOnlyTree,
                                         context, params, queryEnv,
                                         dest, qc);
         }
@@ -1176,11 +1178,11 @@ pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
     else
     {
         if (prev_ProcessUtility)
-            prev_ProcessUtility(pstmt, queryString,
+            prev_ProcessUtility(pstmt, queryString, readOnlyTree,
                                 context, params, queryEnv,
                                 dest, qc);
         else
-            standard_ProcessUtility(pstmt, queryString,
+            standard_ProcessUtility(pstmt, queryString, readOnlyTree,
                                     context, params, queryEnv,
                                     dest, qc);
     }
diff --git a/contrib/sepgsql/hooks.c b/contrib/sepgsql/hooks.c
index 34de6158d6..19a3ffb7ff 100644
--- a/contrib/sepgsql/hooks.c
+++ b/contrib/sepgsql/hooks.c
@@ -313,6 +313,7 @@ sepgsql_exec_check_perms(List *rangeTabls, bool abort)
 static void
 sepgsql_utility_command(PlannedStmt *pstmt,
                         const char *queryString,
+                        bool readOnlyTree,
                         ProcessUtilityContext context,
                         ParamListInfo params,
                         QueryEnvironment *queryEnv,
@@ -378,11 +379,11 @@ sepgsql_utility_command(PlannedStmt *pstmt,
         }

         if (next_ProcessUtility_hook)
-            (*next_ProcessUtility_hook) (pstmt, queryString,
+            (*next_ProcessUtility_hook) (pstmt, queryString, readOnlyTree,
                                          context, params, queryEnv,
                                          dest, qc);
         else
-            standard_ProcessUtility(pstmt, queryString,
+            standard_ProcessUtility(pstmt, queryString, readOnlyTree,
                                     context, params, queryEnv,
                                     dest, qc);
     }
diff --git a/src/backend/commands/copyto.c b/src/backend/commands/copyto.c
index 89a4f8f810..b6eacd5baa 100644
--- a/src/backend/commands/copyto.c
+++ b/src/backend/commands/copyto.c
@@ -438,14 +438,8 @@ BeginCopyTo(ParseState *pstate,
         /*
          * Run parse analysis and rewrite.  Note this also acquires sufficient
          * locks on the source table(s).
-         *
-         * Because the parser and planner tend to scribble on their input, we
-         * make a preliminary copy of the source querytree.  This prevents
-         * problems in the case that the COPY is in a portal or plpgsql
-         * function and is executed repeatedly.  (See also the same hack in
-         * DECLARE CURSOR and PREPARE.)  XXX FIXME someday.
          */
-        rewritten = pg_analyze_and_rewrite(copyObject(raw_query),
+        rewritten = pg_analyze_and_rewrite(raw_query,
                                            pstate->p_sourcetext, NULL, 0,
                                            NULL);

diff --git a/src/backend/commands/createas.c b/src/backend/commands/createas.c
index dce882012e..0982851715 100644
--- a/src/backend/commands/createas.c
+++ b/src/backend/commands/createas.c
@@ -299,14 +299,8 @@ ExecCreateTableAs(ParseState *pstate, CreateTableAsStmt *stmt,
          * rewriter.  We do not do AcquireRewriteLocks: we assume the query
          * either came straight from the parser, or suitable locks were
          * acquired by plancache.c.
-         *
-         * Because the rewriter and planner tend to scribble on the input, we
-         * make a preliminary copy of the source querytree.  This prevents
-         * problems in the case that CTAS is in a portal or plpgsql function
-         * and is executed repeatedly.  (See also the same hack in EXPLAIN and
-         * PREPARE.)
          */
-        rewritten = QueryRewrite(copyObject(query));
+        rewritten = QueryRewrite(query);

         /* SELECT should never rewrite to more or less than one SELECT query */
         if (list_length(rewritten) != 1)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 9a60865d19..e81b990092 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -256,14 +256,8 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
      * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
      * came straight from the parser, or suitable locks were acquired by
      * plancache.c.
-     *
-     * Because the rewriter and planner tend to scribble on the input, we make
-     * a preliminary copy of the source querytree.  This prevents problems in
-     * the case that the EXPLAIN is in a portal or plpgsql function and is
-     * executed repeatedly.  (See also the same hack in DECLARE CURSOR and
-     * PREPARE.)  XXX FIXME someday.
      */
-    rewritten = QueryRewrite(castNode(Query, copyObject(stmt->query)));
+    rewritten = QueryRewrite(castNode(Query, stmt->query));

     /* emit opening boilerplate */
     ExplainBeginOutput(es);
@@ -427,7 +421,8 @@ ExplainOneQuery(Query *query, int cursorOptions,
  * "into" is NULL unless we are explaining the contents of a CreateTableAsStmt.
  *
  * This is exported because it's called back from prepare.c in the
- * EXPLAIN EXECUTE case.
+ * EXPLAIN EXECUTE case.  In that case, we'll be dealing with a statement
+ * that's in the plan cache, so we have to ensure we don't modify it.
  */
 void
 ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
@@ -441,8 +436,7 @@ ExplainOneUtility(Node *utilityStmt, IntoClause *into, ExplainState *es,
     {
         /*
          * We have to rewrite the contained SELECT and then pass it back to
-         * ExplainOneQuery.  It's probably not really necessary to copy the
-         * contained parsetree another time, but let's be safe.
+         * ExplainOneQuery.  Copy to be safe in the EXPLAIN EXECUTE case.
          */
         CreateTableAsStmt *ctas = (CreateTableAsStmt *) utilityStmt;
         List       *rewritten;
diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c
index 008505368c..41857feda9 100644
--- a/src/backend/commands/extension.c
+++ b/src/backend/commands/extension.c
@@ -786,6 +786,7 @@ execute_sql_string(const char *sql)

                 ProcessUtility(stmt,
                                sql,
+                               false,
                                PROCESS_UTILITY_QUERY,
                                NULL,
                                NULL,
diff --git a/src/backend/commands/foreigncmds.c b/src/backend/commands/foreigncmds.c
index eb7103fd3b..bc36311d38 100644
--- a/src/backend/commands/foreigncmds.c
+++ b/src/backend/commands/foreigncmds.c
@@ -1570,8 +1570,7 @@ ImportForeignSchema(ImportForeignSchemaStmt *stmt)
             pstmt->stmt_len = rs->stmt_len;

             /* Execute statement */
-            ProcessUtility(pstmt,
-                           cmd,
+            ProcessUtility(pstmt, cmd, false,
                            PROCESS_UTILITY_SUBCOMMAND, NULL, NULL,
                            None_Receiver, NULL);

diff --git a/src/backend/commands/policy.c b/src/backend/commands/policy.c
index 5cacc088cd..5d469309ce 100644
--- a/src/backend/commands/policy.c
+++ b/src/backend/commands/policy.c
@@ -747,12 +747,12 @@ CreatePolicy(CreatePolicyStmt *stmt)
     addNSItemToQuery(with_check_pstate, nsitem, false, true, true);

     qual = transformWhereClause(qual_pstate,
-                                copyObject(stmt->qual),
+                                stmt->qual,
                                 EXPR_KIND_POLICY,
                                 "POLICY");

     with_check_qual = transformWhereClause(with_check_pstate,
-                                           copyObject(stmt->with_check),
+                                           stmt->with_check,
                                            EXPR_KIND_POLICY,
                                            "POLICY");

@@ -922,7 +922,7 @@ AlterPolicy(AlterPolicyStmt *stmt)

         addNSItemToQuery(qual_pstate, nsitem, false, true, true);

-        qual = transformWhereClause(qual_pstate, copyObject(stmt->qual),
+        qual = transformWhereClause(qual_pstate, stmt->qual,
                                     EXPR_KIND_POLICY,
                                     "POLICY");

@@ -946,7 +946,7 @@ AlterPolicy(AlterPolicyStmt *stmt)
         addNSItemToQuery(with_check_pstate, nsitem, false, true, true);

         with_check_qual = transformWhereClause(with_check_pstate,
-                                               copyObject(stmt->with_check),
+                                               stmt->with_check,
                                                EXPR_KIND_POLICY,
                                                "POLICY");

diff --git a/src/backend/commands/portalcmds.c b/src/backend/commands/portalcmds.c
index d34cc39fde..3ea30bcbc9 100644
--- a/src/backend/commands/portalcmds.c
+++ b/src/backend/commands/portalcmds.c
@@ -76,14 +76,8 @@ PerformCursorOpen(ParseState *pstate, DeclareCursorStmt *cstmt, ParamListInfo pa
      * rewriter.  We do not do AcquireRewriteLocks: we assume the query either
      * came straight from the parser, or suitable locks were acquired by
      * plancache.c.
-     *
-     * Because the rewriter and planner tend to scribble on the input, we make
-     * a preliminary copy of the source querytree.  This prevents problems in
-     * the case that the DECLARE CURSOR is in a portal or plpgsql function and
-     * is executed repeatedly.  (See also the same hack in EXPLAIN and
-     * PREPARE.)  XXX FIXME someday.
      */
-    rewritten = QueryRewrite((Query *) copyObject(query));
+    rewritten = QueryRewrite(query);

     /* SELECT should never rewrite to more or less than one query */
     if (list_length(rewritten) != 1)
diff --git a/src/backend/commands/prepare.c b/src/backend/commands/prepare.c
index 65d8ac65b1..5e03c7c5aa 100644
--- a/src/backend/commands/prepare.c
+++ b/src/backend/commands/prepare.c
@@ -78,12 +78,9 @@ PrepareQuery(ParseState *pstate, PrepareStmt *stmt,
     /*
      * Need to wrap the contained statement in a RawStmt node to pass it to
      * parse analysis.
-     *
-     * Because parse analysis scribbles on the raw querytree, we must make a
-     * copy to ensure we don't modify the passed-in tree.  FIXME someday.
      */
     rawstmt = makeNode(RawStmt);
-    rawstmt->stmt = (Node *) copyObject(stmt->query);
+    rawstmt->stmt = stmt->query;
     rawstmt->stmt_location = stmt_location;
     rawstmt->stmt_len = stmt_len;

diff --git a/src/backend/commands/schemacmds.c b/src/backend/commands/schemacmds.c
index a60eb161e4..6c6ab9ee34 100644
--- a/src/backend/commands/schemacmds.c
+++ b/src/backend/commands/schemacmds.c
@@ -191,6 +191,7 @@ CreateSchemaCommand(CreateSchemaStmt *stmt, const char *queryString,
         /* do this step */
         ProcessUtility(wrapper,
                        queryString,
+                       false,
                        PROCESS_UTILITY_SUBCOMMAND,
                        NULL,
                        NULL,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 028e8ac46b..4e23c7fce5 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -4408,8 +4408,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
      * Copy the original subcommand for each table.  This avoids conflicts
      * when different child tables need to make different parse
      * transformations (for example, the same column may have different column
-     * numbers in different children).  It also ensures that we don't corrupt
-     * the original parse tree, in case it is saved in plancache.
+     * numbers in different children).
      */
     cmd = copyObject(cmd);

diff --git a/src/backend/commands/view.c b/src/backend/commands/view.c
index f2642dba6c..4df05a0b33 100644
--- a/src/backend/commands/view.c
+++ b/src/backend/commands/view.c
@@ -417,12 +417,9 @@ DefineView(ViewStmt *stmt, const char *queryString,
     /*
      * Run parse analysis to convert the raw parse tree to a Query.  Note this
      * also acquires sufficient locks on the source table(s).
-     *
-     * Since parse analysis scribbles on its input, copy the raw parse tree;
-     * this ensures we don't corrupt a prepared statement, for example.
      */
     rawstmt = makeNode(RawStmt);
-    rawstmt->stmt = (Node *) copyObject(stmt->query);
+    rawstmt->stmt = stmt->query;
     rawstmt->stmt_location = stmt_location;
     rawstmt->stmt_len = stmt_len;

diff --git a/src/backend/executor/functions.c b/src/backend/executor/functions.c
index e2ea51aafe..296e54e60a 100644
--- a/src/backend/executor/functions.c
+++ b/src/backend/executor/functions.c
@@ -886,6 +886,7 @@ postquel_getnext(execution_state *es, SQLFunctionCachePtr fcache)
     {
         ProcessUtility(es->qd->plannedstmt,
                        fcache->src,
+                       false,
                        PROCESS_UTILITY_QUERY,
                        es->qd->params,
                        es->qd->queryEnv,
diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index b8bd05e894..c0a4f58f02 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -2545,6 +2545,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI,
                 InitializeQueryCompletion(&qc);
                 ProcessUtility(stmt,
                                plansource->query_string,
+                               true,
                                context,
                                paramLI,
                                _SPI_current->queryEnv,
diff --git a/src/backend/parser/parse_utilcmd.c b/src/backend/parser/parse_utilcmd.c
index c9708e38f4..81d3e7990c 100644
--- a/src/backend/parser/parse_utilcmd.c
+++ b/src/backend/parser/parse_utilcmd.c
@@ -11,10 +11,6 @@
  * Hence these functions are now called at the start of execution of their
  * respective utility commands.
  *
- * NOTE: in general we must avoid scribbling on the passed-in raw parse
- * tree, since it might be in a plan cache.  The simplest solution is
- * a quick copyObject() call before manipulating the query tree.
- *
  *
  * Portions Copyright (c) 1996-2021, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
@@ -177,12 +173,6 @@ transformCreateStmt(CreateStmt *stmt, const char *queryString)
     Oid            existing_relid;
     ParseCallbackState pcbstate;

-    /*
-     * We must not scribble on the passed-in CreateStmt, so copy it.  (This is
-     * overkill, but easy.)
-     */
-    stmt = copyObject(stmt);
-
     /* Set up pstate */
     pstate = make_parsestate(NULL);
     pstate->p_sourcetext = queryString;
@@ -2824,12 +2814,6 @@ transformIndexStmt(Oid relid, IndexStmt *stmt, const char *queryString)
     if (stmt->transformed)
         return stmt;

-    /*
-     * We must not scribble on the passed-in IndexStmt, so copy it.  (This is
-     * overkill, but easy.)
-     */
-    stmt = copyObject(stmt);
-
     /* Set up pstate */
     pstate = make_parsestate(NULL);
     pstate->p_sourcetext = queryString;
@@ -2925,12 +2909,6 @@ transformStatsStmt(Oid relid, CreateStatsStmt *stmt, const char *queryString)
     if (stmt->transformed)
         return stmt;

-    /*
-     * We must not scribble on the passed-in CreateStatsStmt, so copy it.
-     * (This is overkill, but easy.)
-     */
-    stmt = copyObject(stmt);
-
     /* Set up pstate */
     pstate = make_parsestate(NULL);
     pstate->p_sourcetext = queryString;
@@ -2993,9 +2971,6 @@ transformStatsStmt(Oid relid, CreateStatsStmt *stmt, const char *queryString)
  *
  * actions and whereClause are output parameters that receive the
  * transformed results.
- *
- * Note that we must not scribble on the passed-in RuleStmt, so we do
- * copyObject() on the actions and WHERE clause.
  */
 void
 transformRuleStmt(RuleStmt *stmt, const char *queryString,
@@ -3070,7 +3045,7 @@ transformRuleStmt(RuleStmt *stmt, const char *queryString,

     /* take care of the where clause */
     *whereClause = transformWhereClause(pstate,
-                                        (Node *) copyObject(stmt->whereClause),
+                                        stmt->whereClause,
                                         EXPR_KIND_WHERE,
                                         "WHERE");
     /* we have to fix its collations too */
@@ -3142,8 +3117,7 @@ transformRuleStmt(RuleStmt *stmt, const char *queryString,
             addNSItemToQuery(sub_pstate, newnsitem, false, true, false);

             /* Transform the rule action statement */
-            top_subqry = transformStmt(sub_pstate,
-                                       (Node *) copyObject(action));
+            top_subqry = transformStmt(sub_pstate, action);

             /*
              * We cannot support utility-statement actions (eg NOTIFY) with
@@ -3325,12 +3299,6 @@ transformAlterTableStmt(Oid relid, AlterTableStmt *stmt,
     AlterTableCmd *newcmd;
     ParseNamespaceItem *nsitem;

-    /*
-     * We must not scribble on the passed-in AlterTableStmt, so copy it. (This
-     * is overkill, but easy.)
-     */
-    stmt = copyObject(stmt);
-
     /* Caller is responsible for locking the relation */
     rel = relation_open(relid, NoLock);
     tupdesc = RelationGetDescr(rel);
diff --git a/src/backend/tcop/pquery.c b/src/backend/tcop/pquery.c
index f7f08e6c05..920ac50baf 100644
--- a/src/backend/tcop/pquery.c
+++ b/src/backend/tcop/pquery.c
@@ -1144,8 +1144,13 @@ PortalRunUtility(Portal portal, PlannedStmt *pstmt,
     else
         portal->portalSnapshot = NULL;

+    /*
+     * Run it.  We tell ProcessUtility that the node tree is read-only if it
+     * came from a cached plan.
+     */
     ProcessUtility(pstmt,
                    portal->sourceText,
+                   (portal->cplan != NULL),
                    isTopLevel ? PROCESS_UTILITY_TOPLEVEL : PROCESS_UTILITY_QUERY,
                    portal->portalParams,
                    portal->queryEnv,
diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index 1a8fc16773..d00189fe32 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -476,6 +476,7 @@ CheckRestrictedOperation(const char *cmdname)
  *
  *    pstmt: PlannedStmt wrapper for the utility statement
  *    queryString: original source text of command
+ *    readOnlyTree: treat pstmt's node tree as read-only
  *    context: identifies source of statement (toplevel client command,
  *        non-toplevel client command, subcommand of a larger utility command)
  *    params: parameters to use during execution
@@ -501,6 +502,7 @@ CheckRestrictedOperation(const char *cmdname)
 void
 ProcessUtility(PlannedStmt *pstmt,
                const char *queryString,
+               bool readOnlyTree,
                ProcessUtilityContext context,
                ParamListInfo params,
                QueryEnvironment *queryEnv,
@@ -518,11 +520,11 @@ ProcessUtility(PlannedStmt *pstmt,
      * call standard_ProcessUtility().
      */
     if (ProcessUtility_hook)
-        (*ProcessUtility_hook) (pstmt, queryString,
+        (*ProcessUtility_hook) (pstmt, queryString, readOnlyTree,
                                 context, params, queryEnv,
                                 dest, qc);
     else
-        standard_ProcessUtility(pstmt, queryString,
+        standard_ProcessUtility(pstmt, queryString, readOnlyTree,
                                 context, params, queryEnv,
                                 dest, qc);
 }
@@ -541,13 +543,14 @@ ProcessUtility(PlannedStmt *pstmt,
 void
 standard_ProcessUtility(PlannedStmt *pstmt,
                         const char *queryString,
+                        bool readOnlyTree,
                         ProcessUtilityContext context,
                         ParamListInfo params,
                         QueryEnvironment *queryEnv,
                         DestReceiver *dest,
                         QueryCompletion *qc)
 {
-    Node       *parsetree = pstmt->utilityStmt;
+    Node       *parsetree;
     bool        isTopLevel = (context == PROCESS_UTILITY_TOPLEVEL);
     bool        isAtomicContext = (!(context == PROCESS_UTILITY_TOPLEVEL || context ==
PROCESS_UTILITY_QUERY_NONATOMIC)|| IsTransactionBlock()); 
     ParseState *pstate;
@@ -556,6 +559,18 @@ standard_ProcessUtility(PlannedStmt *pstmt,
     /* This can recurse, so check for excessive recursion */
     check_stack_depth();

+    /*
+     * If the given node tree is read-only, make a copy to ensure that parse
+     * transformations don't damage the original tree.  This could be
+     * refactored to avoid making unnecessary copies in more cases, but it's
+     * not clear that it's worth a great deal of trouble over.  Statements
+     * that are complex enough to be expensive to copy are exactly the ones
+     * we'd need to copy, so that only marginal savings seem possible.
+     */
+    if (readOnlyTree)
+        pstmt = copyObject(pstmt);
+    parsetree = pstmt->utilityStmt;
+
     /* Prohibit read/write commands in read-only states. */
     readonly_flags = ClassifyUtilityCommandAsReadOnly(parsetree);
     if (readonly_flags != COMMAND_IS_STRICTLY_READ_ONLY &&
@@ -1211,6 +1226,7 @@ ProcessUtilitySlow(ParseState *pstate,

                             ProcessUtility(wrapper,
                                            queryString,
+                                           false,
                                            PROCESS_UTILITY_SUBCOMMAND,
                                            params,
                                            NULL,
@@ -1918,6 +1934,7 @@ ProcessUtilityForAlterTable(Node *stmt, AlterTableUtilityContext *context)

     ProcessUtility(wrapper,
                    context->queryString,
+                   false,
                    PROCESS_UTILITY_SUBCOMMAND,
                    context->params,
                    context->queryEnv,
diff --git a/src/include/tcop/utility.h b/src/include/tcop/utility.h
index 841062b4b3..212e9b3280 100644
--- a/src/include/tcop/utility.h
+++ b/src/include/tcop/utility.h
@@ -69,17 +69,21 @@ typedef struct AlterTableUtilityContext

 /* Hook for plugins to get control in ProcessUtility() */
 typedef void (*ProcessUtility_hook_type) (PlannedStmt *pstmt,
-                                          const char *queryString, ProcessUtilityContext context,
+                                          const char *queryString,
+                                          bool readOnlyTree,
+                                          ProcessUtilityContext context,
                                           ParamListInfo params,
                                           QueryEnvironment *queryEnv,
                                           DestReceiver *dest, QueryCompletion *qc);
 extern PGDLLIMPORT ProcessUtility_hook_type ProcessUtility_hook;

 extern void ProcessUtility(PlannedStmt *pstmt, const char *queryString,
+                           bool readOnlyTree,
                            ProcessUtilityContext context, ParamListInfo params,
                            QueryEnvironment *queryEnv,
                            DestReceiver *dest, QueryCompletion *qc);
 extern void standard_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
+                                    bool readOnlyTree,
                                     ProcessUtilityContext context, ParamListInfo params,
                                     QueryEnvironment *queryEnv,
                                     DestReceiver *dest, QueryCompletion *qc);

Re: Centralizing protective copying of utility statements

От

Julien Rouhaud

Дата:

17 июня 2021 г., 21:00:55

On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
> 
> Here's a v2 that does it like that.  In this formulation, we're
> basically hoisting the responsibility for doing copyObject up into
> ProcessUtility from its direct children, which seems like a clearer
> way of thinking about what has to change.

I agree that forcing an API break is better.  Just a nit:

+ *    readOnlyTree: treat pstmt's node tree as read-only

Maybe it's because I'm not a native english speaker, or because it's quite
late here, but I don't find "treat as read-only" really clear.  I don't have a
concise better wording to suggest.

> Still thinking about which way to fix it in the back branches.

I'm +0.5 for a narrow fix, due to the possibility of unspotted similar problem
vs possibility of performance regression ratio.

Re: Centralizing protective copying of utility statements

От

Andres Freund

Дата:

17 июня 2021 г., 22:23:00

Hi,

On 2021-06-16 21:39:49 -0400, Tom Lane wrote:
> Although this adds some overhead in the form of copying of
> utility node trees that won't actually mutate during execution,
> I think that won't be too bad because those trees tend to be
> small and hence cheap to copy.  The statements that can have
> a lot of substructure usually contain expression trees or the
> like, which do have to be copied for safety.  Moreover, we buy
> back a lot of cost by removing pointless copying when we're
> not executing on a cached plan.

Have you evaluated the cost in some form? I don't think it a relevant
cost for most utility statements, but there's a few exceptions that *do*
worry me. In particular, in some workloads transaction statements are
very frequent.

Greetings,

Andres Freund

Re: Centralizing protective copying of utility statements

От

Andres Freund

Дата:

17 июня 2021 г., 22:25:43

Hi,

On 2021-06-17 13:03:29 -0400, Tom Lane wrote:
> Here's a v2 that does it like that.  In this formulation, we're
> basically hoisting the responsibility for doing copyObject up into
> ProcessUtility from its direct children, which seems like a clearer
> way of thinking about what has to change.
> 
> We could avoid the side-effects on users of ProcessUtility_hook by
> doing the copy step in ProcessUtility itself rather than passing the
> flag on to standard_ProcessUtility.  But that sounded like a bit of a
> kluge.  Also, putting the work in standard_ProcessUtility preserves
> the option to redistribute it into the individual switch arms, in case
> anyone does find the extra copying overhead annoying for statement
> types that don't need it.  (I don't plan to do any such thing as part
> of this bug-fix patch, though.)
> 
> Barring objections, I'm going to push this into HEAD fairly soon,
> since beta2 is hard upon us.  Still thinking about which way to
> fix it in the back branches.

Phew. Do we really want to break a quite significant number of
extensions this long after feature freeze? Since we already need to find
a backpatchable way to deal with the issue it seems like deferring the
API change to 15 might be prudent?

Greetings,

Andres Freund

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

17 июня 2021 г., 22:53:22

Andres Freund <andres@anarazel.de> writes:
> Phew. Do we really want to break a quite significant number of
> extensions this long after feature freeze? Since we already need to find
> a backpatchable way to deal with the issue it seems like deferring the
> API change to 15 might be prudent?

Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
that would be a horrid crimp on our ability to fix bugs during beta.
I've generally supposed that we don't start expecting that till RC stage.

            regards, tom lane

Re: Centralizing protective copying of utility statements

От

Andres Freund

Дата:

17 июня 2021 г., 23:17:22

Hi,

On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Phew. Do we really want to break a quite significant number of
> > extensions this long after feature freeze? Since we already need to find
> > a backpatchable way to deal with the issue it seems like deferring the
> > API change to 15 might be prudent?
> 
> Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
> that would be a horrid crimp on our ability to fix bugs during beta.

Sure, there's no promise. But I still think it's worth taking the amount
of breakage more into account than pre beta?

Greetings,

Andres Freund

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

17 июня 2021 г., 23:36:22

Andres Freund <andres@anarazel.de> writes:
> On 2021-06-16 21:39:49 -0400, Tom Lane wrote:
>> Although this adds some overhead in the form of copying of
>> utility node trees that won't actually mutate during execution,
>> I think that won't be too bad because those trees tend to be
>> small and hence cheap to copy.

> Have you evaluated the cost in some form? I don't think it a relevant
> cost for most utility statements, but there's a few exceptions that *do*
> worry me. In particular, in some workloads transaction statements are
> very frequent.

I hadn't, but since you mention it, I tried this test case:

$ cat trivial.sql 
begin;
commit;
$ pgbench -n -M prepared -f trivial.sql -T 60

I got these results on HEAD:
tps = 23853.244130 (without initial connection time)
tps = 23810.759969 (without initial connection time)
tps = 23167.608493 (without initial connection time)
tps = 23784.432746 (without initial connection time)

and adding the v2 patch:
tps = 23298.081147 (without initial connection time)
tps = 23614.466755 (without initial connection time)
tps = 23475.297853 (without initial connection time)
tps = 23530.826527 (without initial connection time)

So if you squint there might be a sub-one-percent difference
there, but it's barely distinguishable from the noise.  In
any situation where the transactions are doing actual work,
I doubt you could measure a difference.

(In any case, if someone does get excited about this, they
could rearrange things to push the copyObject calls into the
individual arms of the switch in ProcessUtility.  Personally
though I doubt it could be worth the code bloat.)

            regards, tom lane

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

17 июня 2021 г., 23:50:57

Andres Freund <andres@anarazel.de> writes:
> On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
>> Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
>> that would be a horrid crimp on our ability to fix bugs during beta.

> Sure, there's no promise. But I still think it's worth taking the amount
> of breakage more into account than pre beta?

Are there really so many people using the ProcessUtility hook?
In a quick look on codesearch.debian.net, I found

hypopg
pgaudit
pgextwlist
pglogical

which admittedly is more than none, but it's not a huge number
either.  I have to think that fixing this bug reliably is a
more important consideration.

            regards, tom lane

Re: Centralizing protective copying of utility statements

От

Andres Freund

Дата:

18 июня 2021 г., 00:08:34

Hi,

On 2021-06-17 16:50:57 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-06-17 15:53:22 -0400, Tom Lane wrote:
> >> Uh, nobody ever promised that server-internal APIs are frozen as of beta1;
> >> that would be a horrid crimp on our ability to fix bugs during beta.
> 
> > Sure, there's no promise. But I still think it's worth taking the amount
> > of breakage more into account than pre beta?
> 
> Are there really so many people using the ProcessUtility hook?
> In a quick look on codesearch.debian.net, I found
> 
> hypopg
> pgaudit
> pgextwlist
> pglogical

The do seem to be quite a few more outside of Debian. E.g.
https://github.com/search?p=2&q=ProcessUtility_hook&type=Code
shows quite a few.

Unfortunately github is annoying to search through - it doesn't seem to
have any logic to prevent duplicates across repositories :(. Which means
there's dozens of copies of postgres code included...

> which admittedly is more than none, but it's not a huge number
> either.  I have to think that fixing this bug reliably is a
> more important consideration.

Sure!

Greetings,

Andres Freund

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

18 июня 2021 г., 00:11:26

I wrote:
> (In any case, if someone does get excited about this, they
> could rearrange things to push the copyObject calls into the
> individual arms of the switch in ProcessUtility.  Personally
> though I doubt it could be worth the code bloat.)

It occurred to me to try making the copying code look like

    if (readOnlyTree)
    {
        switch (nodeTag(parsetree))
        {
            case T_TransactionStmt:
                /* stmt is immutable anyway, no need to copy */
                break;
            default:
                pstmt = copyObject(pstmt);
                parsetree = pstmt->utilityStmt;
                break;
        }
    }

This didn't move the needle at all, in fact it seems maybe a
shade slower:

tps = 23502.288878 (without initial connection time)
tps = 23643.821923 (without initial connection time)
tps = 23082.976795 (without initial connection time)
tps = 23547.527641 (without initial connection time)

So I think this confirms my gut feeling that copyObject on a
TransactionStmt is negligible.  To the extent that the prior
measurement shows a real difference, it's probably a chance effect
of changing code layout elsewhere.

            regards, tom lane

Re: Centralizing protective copying of utility statements

От

Michael Paquier

Дата:

18 июня 2021 г., 03:57:10

On Thu, Jun 17, 2021 at 02:08:34PM -0700, Andres Freund wrote:
> Unfortunately github is annoying to search through - it doesn't seem to
> have any logic to prevent duplicates across repositories :(. Which means
> there's dozens of copies of postgres code included...

I agree with the position of doing something now while in beta.  I
have not looked at the tree, but I am rather sure that we had changes
in the hooks while in beta phase in the past.

>> which admittedly is more than none, but it's not a huge number
>> either.  I have to think that fixing this bug reliably is a
>> more important consideration.
>
> Sure!

The DECLARE CURSOR case in ExplainOneUtility() does a copy of a Query.
Perhaps a comment should be added to explain why a copy is still
required?
--
Michael

Вложения

signature.asc

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

18 июня 2021 г., 05:26:00

Michael Paquier <michael@paquier.xyz> writes:
> The DECLARE CURSOR case in ExplainOneUtility() does a copy of a Query.
> Perhaps a comment should be added to explain why a copy is still
> required?

I did add a comment about that in the v2 patch --- the issue is the
call path for EXPLAIN EXECUTE.

            regards, tom lane

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

18 июня 2021 г., 17:24:20

Julien Rouhaud <rjuju123@gmail.com> writes:
> On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
> + *    readOnlyTree: treat pstmt's node tree as read-only

> Maybe it's because I'm not a native english speaker, or because it's quite
> late here, but I don't find "treat as read-only" really clear.  I don't have a
> concise better wording to suggest.

Maybe "if true, pstmt's node tree must not be modified" ?

>> Still thinking about which way to fix it in the back branches.

> I'm +0.5 for a narrow fix, due to the possibility of unspotted similar problem
> vs possibility of performance regression ratio.

After sleeping on it another day, I'm inclined to think the same.  The
key attraction of a centralized fix is that it prevents the possibility
of new bugs of the same ilk in newly-added features.  Given how long
these CREATE/ALTER DOMAIN bugs escaped detection, it's hard to have
full confidence that there are no others in the back branches --- but
they must be in really lightly-used features.

            regards, tom lane

Re: Centralizing protective copying of utility statements

От

Julien Rouhaud

Дата:

18 июня 2021 г., 18:15:45

On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Thu, Jun 17, 2021 at 01:03:29PM -0400, Tom Lane wrote:
> > + *    readOnlyTree: treat pstmt's node tree as read-only
> 
> > Maybe it's because I'm not a native english speaker, or because it's quite
> > late here, but I don't find "treat as read-only" really clear.  I don't have a
> > concise better wording to suggest.
> 
> Maybe "if true, pstmt's node tree must not be modified" ?

Thanks, I find it way better!

Re: Centralizing protective copying of utility statements

От

Tom Lane

Дата:

18 июня 2021 г., 18:24:00

Julien Rouhaud <rjuju123@gmail.com> writes:
> On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
>> Maybe "if true, pstmt's node tree must not be modified" ?

> Thanks, I find it way better!

OK, pushed that way, and with a couple other comment tweaks from
an additional re-reading.

            regards, tom lane

Re: Centralizing protective copying of utility statements

От

Julien Rouhaud

Дата:

19 июня 2021 г., 07:49:07

On Fri, Jun 18, 2021 at 11:24:00AM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Fri, Jun 18, 2021 at 10:24:20AM -0400, Tom Lane wrote:
> >> Maybe "if true, pstmt's node tree must not be modified" ?
> 
> > Thanks, I find it way better!
> 
> OK, pushed that way, and with a couple other comment tweaks from
> an additional re-reading.

Thanks!  For the record I already pushed the required compatibility changes for
hypopg extension.

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Centralizing protective copying of utility statements

Вложения