Fixing target-column indirection in INSERT with multiple VALUES

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Fixing target-column indirection in INSERT with multiple VALUES
Дата
Msg-id 9578.1469645245@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Fixing target-column indirection in INSERT with multiple VALUES  (Kevin Grittner <kgrittn@gmail.com>)
Список pgsql-hackers
I looked into the problem reported in bug #14265,
https://www.postgresql.org/message-id/20160727005725.7438.26021@wrigleys.postgresql.org

The core of the problem is a design decision that seems pretty bad in
hindsight: if we have array-subscripting or field-selection decoration
in the target column list of an INSERT with multiple VALUES rows, then
we apply transformAssignedExpr() to each element of each VALUES row
independently.  So for example in
    INSERT INTO foo (col[1]) VALUES (1), (2), (3)
there are going to be three identical ArrayRef nodes (with different
input expressions) inside the VALUES RTE, and then just a Var referencing
the VALUES RTE in the top-level targetlist.  While that works, it's not
so good when we have
    INSERT INTO foo (col[1], col[2]) VALUES (1, 2), (3, 4)
because the rewriter needs to merge the assignments to "col" by nesting
the ArrayRefs together, but it fails to find the ArrayRefs in the
top-level targetlist, leading to the complained-of error.

Other reasons not to like this design are the space taken by the redundant
copies of the ArrayRef nodes, and the ugly logic needed in ruleutils.c
to deal with this situation, which is unlike either the INSERT-with-
single-VALUES-row or the INSERT-SELECT cases.

So attached is a proposed patch that fixes this bug by moving the ArrayRef
and/or FieldStore nodes associated with the column target list into the
top-level targetlist.  I still have it calling transformAssignedExpr
for each VALUES expression, which means that the parser still generates
(and then throws away) ArrayRef/FieldStore nodes for each VALUES row when
there is indirection decoration in the target list.  It would probably be
possible to avoid that, but it would take very invasive refactoring of
transformAssignmentIndirection and allied routines, and I'm dubious that
the case comes up often enough to be worth complicating that code even
more.

Another issue is that this would require an initdb because it changes the
representation of stored rules for cases like this, which means we could
only fix it in HEAD and there wouldn't be a back-patch.  Given that it's
been like this forever and we've not had complaints before, I think that's
acceptable.

Conceivably we could fix this without initdb-forcing changes in the
parser's output, if we taught the rewriter to apply rewriteTargetListIU
to each row of the VALUES RTE rather than to the top-level targetlist.
But that would be a complex mess for multiple reasons --- for instance,
it would change the output column set for the RTE.  So I'm not at all
excited about pursuing that path.

In short, I propose applying the attached to HEAD but not attempting
to fix the issue in the back branches.

            regards, tom lane

diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 29c8c4e..eac86cc 100644
*** a/src/backend/parser/analyze.c
--- b/src/backend/parser/analyze.c
*************** post_parse_analyze_hook_type post_parse_
*** 51,57 ****
  static Query *transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt);
  static Query *transformInsertStmt(ParseState *pstate, InsertStmt *stmt);
  static List *transformInsertRow(ParseState *pstate, List *exprlist,
!                    List *stmtcols, List *icolumns, List *attrnos);
  static OnConflictExpr *transformOnConflictClause(ParseState *pstate,
                            OnConflictClause *onConflictClause);
  static int    count_rowexpr_columns(ParseState *pstate, Node *expr);
--- 51,58 ----
  static Query *transformDeleteStmt(ParseState *pstate, DeleteStmt *stmt);
  static Query *transformInsertStmt(ParseState *pstate, InsertStmt *stmt);
  static List *transformInsertRow(ParseState *pstate, List *exprlist,
!                    List *stmtcols, List *icolumns, List *attrnos,
!                    bool strip_indirection);
  static OnConflictExpr *transformOnConflictClause(ParseState *pstate,
                            OnConflictClause *onConflictClause);
  static int    count_rowexpr_columns(ParseState *pstate, Node *expr);
*************** transformInsertStmt(ParseState *pstate,
*** 619,625 ****
          /* Prepare row for assignment to target table */
          exprList = transformInsertRow(pstate, exprList,
                                        stmt->cols,
!                                       icolumns, attrnos);
      }
      else if (list_length(selectStmt->valuesLists) > 1)
      {
--- 620,627 ----
          /* Prepare row for assignment to target table */
          exprList = transformInsertRow(pstate, exprList,
                                        stmt->cols,
!                                       icolumns, attrnos,
!                                       false);
      }
      else if (list_length(selectStmt->valuesLists) > 1)
      {
*************** transformInsertStmt(ParseState *pstate,
*** 663,672 ****
                                              exprLocation((Node *) sublist))));
              }

!             /* Prepare row for assignment to target table */
              sublist = transformInsertRow(pstate, sublist,
                                           stmt->cols,
!                                          icolumns, attrnos);

              /*
               * We must assign collations now because assign_query_collations
--- 665,684 ----
                                              exprLocation((Node *) sublist))));
              }

!             /*
!              * Prepare row for assignment to target table.  We process any
!              * indirection on the target column specs normally but then strip
!              * off the resulting field/array assignment nodes, since we don't
!              * want the parsed statement to contain copies of those in each
!              * VALUES row.  (It's annoying to have to transform the
!              * indirection specs over and over like this, but avoiding it
!              * would take some really messy refactoring of
!              * transformAssignmentIndirection.)
!              */
              sublist = transformInsertRow(pstate, sublist,
                                           stmt->cols,
!                                          icolumns, attrnos,
!                                          true);

              /*
               * We must assign collations now because assign_query_collations
*************** transformInsertStmt(ParseState *pstate,
*** 717,722 ****
--- 729,742 ----
           * Generate list of Vars referencing the RTE
           */
          expandRTE(rte, rtr->rtindex, 0, -1, false, NULL, &exprList);
+
+         /*
+          * Re-apply any indirection on the target column specs to the Vars
+          */
+         exprList = transformInsertRow(pstate, exprList,
+                                       stmt->cols,
+                                       icolumns, attrnos,
+                                       false);
      }
      else
      {
*************** transformInsertStmt(ParseState *pstate,
*** 739,745 ****
          /* Prepare row for assignment to target table */
          exprList = transformInsertRow(pstate, exprList,
                                        stmt->cols,
!                                       icolumns, attrnos);
      }

      /*
--- 759,766 ----
          /* Prepare row for assignment to target table */
          exprList = transformInsertRow(pstate, exprList,
                                        stmt->cols,
!                                       icolumns, attrnos,
!                                       false);
      }

      /*
*************** transformInsertStmt(ParseState *pstate,
*** 808,819 ****
  /*
   * Prepare an INSERT row for assignment to the target table.
   *
!  * The row might be either a VALUES row, or variables referencing a
!  * sub-SELECT output.
   */
  static List *
  transformInsertRow(ParseState *pstate, List *exprlist,
!                    List *stmtcols, List *icolumns, List *attrnos)
  {
      List       *result;
      ListCell   *lc;
--- 829,845 ----
  /*
   * Prepare an INSERT row for assignment to the target table.
   *
!  * exprlist: transformed expressions for source values; these might come from
!  * a VALUES row, or be Vars referencing a sub-SELECT or VALUES RTE output.
!  * stmtcols: original target-columns spec for INSERT (we just test for NIL)
!  * icolumns: effective target-columns spec (list of ResTarget)
!  * attrnos: integer column numbers (must be same length as icolumns)
!  * strip_indirection: if true, remove any field/array assignment nodes
   */
  static List *
  transformInsertRow(ParseState *pstate, List *exprlist,
!                    List *stmtcols, List *icolumns, List *attrnos,
!                    bool strip_indirection)
  {
      List       *result;
      ListCell   *lc;
*************** transformInsertRow(ParseState *pstate, L
*** 879,884 ****
--- 905,933 ----
                                       col->indirection,
                                       col->location);

+         if (strip_indirection)
+         {
+             while (expr)
+             {
+                 if (IsA(expr, FieldStore))
+                 {
+                     FieldStore *fstore = (FieldStore *) expr;
+
+                     expr = (Expr *) linitial(fstore->newvals);
+                 }
+                 else if (IsA(expr, ArrayRef))
+                 {
+                     ArrayRef   *aref = (ArrayRef *) expr;
+
+                     if (aref->refassgnexpr == NULL)
+                         break;
+                     expr = aref->refassgnexpr;
+                 }
+                 else
+                     break;
+             }
+         }
+
          result = lappend(result, expr);

          icols = lnext(icols);
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 2915e21..e59bd20 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*************** static void get_tablesample_def(TableSam
*** 438,445 ****
                      deparse_context *context);
  static void get_opclass_name(Oid opclass, Oid actual_datatype,
                   StringInfo buf);
! static Node *processIndirection(Node *node, deparse_context *context,
!                    bool printit);
  static void printSubscripts(ArrayRef *aref, deparse_context *context);
  static char *get_relation_name(Oid relid);
  static char *generate_relation_name(Oid relid, List *namespaces);
--- 438,444 ----
                      deparse_context *context);
  static void get_opclass_name(Oid opclass, Oid actual_datatype,
                   StringInfo buf);
! static Node *processIndirection(Node *node, deparse_context *context);
  static void printSubscripts(ArrayRef *aref, deparse_context *context);
  static char *get_relation_name(Oid relid);
  static char *generate_relation_name(Oid relid, List *namespaces);
*************** get_values_def(List *values_lists, depar
*** 4551,4561 ****
                  appendStringInfoChar(buf, ',');

              /*
!              * Strip any top-level nodes representing indirection assignments,
!              * then print the result.  Whole-row Vars need special treatment.
               */
!             get_rule_expr_toplevel(processIndirection(col, context, false),
!                                    context, false);
          }
          appendStringInfoChar(buf, ')');
      }
--- 4550,4558 ----
                  appendStringInfoChar(buf, ',');

              /*
!              * Print the value.  Whole-row Vars need special treatment.
               */
!             get_rule_expr_toplevel(col, context, false);
          }
          appendStringInfoChar(buf, ')');
      }
*************** get_insert_query_def(Query *query, depar
*** 5512,5518 ****
      RangeTblEntry *values_rte = NULL;
      RangeTblEntry *rte;
      char       *sep;
-     ListCell   *values_cell;
      ListCell   *l;
      List       *strippedexprs;

--- 5509,5514 ----
*************** get_insert_query_def(Query *query, depar
*** 5563,5579 ****
                           quote_identifier(rte->alias->aliasname));

      /*
!      * Add the insert-column-names list.  To handle indirection properly, we
!      * need to look for indirection nodes in the top targetlist (if it's
!      * INSERT ... SELECT or INSERT ... single VALUES), or in the first
!      * expression list of the VALUES RTE (if it's INSERT ... multi VALUES). We
!      * assume that all the expression lists will have similar indirection in
!      * the latter case.
       */
-     if (values_rte)
-         values_cell = list_head((List *) linitial(values_rte->values_lists));
-     else
-         values_cell = NULL;
      strippedexprs = NIL;
      sep = "";
      if (query->targetList)
--- 5559,5567 ----
                           quote_identifier(rte->alias->aliasname));

      /*
!      * Add the insert-column-names list.  Any indirection decoration needed on
!      * the column names can be inferred from the top targetlist.
       */
      strippedexprs = NIL;
      sep = "";
      if (query->targetList)
*************** get_insert_query_def(Query *query, depar
*** 5599,5618 ****
          /*
           * Print any indirection needed (subfields or subscripts), and strip
           * off the top-level nodes representing the indirection assignments.
           */
!         if (values_cell)
!         {
!             /* we discard the stripped expression in this case */
!             processIndirection((Node *) lfirst(values_cell), context, true);
!             values_cell = lnext(values_cell);
!         }
!         else
!         {
!             /* we keep a list of the stripped expressions in this case */
!             strippedexprs = lappend(strippedexprs,
!                                     processIndirection((Node *) tle->expr,
!                                                        context, true));
!         }
      }
      if (query->targetList)
          appendStringInfoString(buf, ") ");
--- 5587,5600 ----
          /*
           * Print any indirection needed (subfields or subscripts), and strip
           * off the top-level nodes representing the indirection assignments.
+          * Add the stripped expressions to strippedexprs.  (If it's a
+          * single-VALUES statement, the stripped expressions are the VALUES to
+          * print below.  Otherwise they're just Vars and not really
+          * interesting.)
           */
!         strippedexprs = lappend(strippedexprs,
!                                 processIndirection((Node *) tle->expr,
!                                                    context));
      }
      if (query->targetList)
          appendStringInfoString(buf, ") ");
*************** get_update_query_targetlist_def(Query *q
*** 5891,5897 ****
           * Print any indirection needed (subfields or subscripts), and strip
           * off the top-level nodes representing the indirection assignments.
           */
!         expr = processIndirection((Node *) tle->expr, context, true);

          /*
           * If we're in a multiassignment, skip printing anything more, unless
--- 5873,5879 ----
           * Print any indirection needed (subfields or subscripts), and strip
           * off the top-level nodes representing the indirection assignments.
           */
!         expr = processIndirection((Node *) tle->expr, context);

          /*
           * If we're in a multiassignment, skip printing anything more, unless
*************** get_rule_expr(Node *node, deparse_contex
*** 7296,7302 ****
                       * subscripting in immediate descendants.  It returns the
                       * RHS expr that is actually being "assigned".
                       */
!                     refassgnexpr = processIndirection(node, context, true);
                      appendStringInfoString(buf, " := ");
                      get_rule_expr(refassgnexpr, context, showimplicit);
                  }
--- 7278,7284 ----
                       * subscripting in immediate descendants.  It returns the
                       * RHS expr that is actually being "assigned".
                       */
!                     refassgnexpr = processIndirection(node, context);
                      appendStringInfoString(buf, " := ");
                      get_rule_expr(refassgnexpr, context, showimplicit);
                  }
*************** get_opclass_name(Oid opclass, Oid actual
*** 9535,9546 ****
   * processIndirection - take care of array and subfield assignment
   *
   * We strip any top-level FieldStore or assignment ArrayRef nodes that
!  * appear in the input, and return the subexpression that's to be assigned.
!  * If printit is true, we also print out the appropriate decoration for the
!  * base column name (that the caller just printed).
   */
  static Node *
! processIndirection(Node *node, deparse_context *context, bool printit)
  {
      StringInfo    buf = context->buf;

--- 9517,9528 ----
   * processIndirection - take care of array and subfield assignment
   *
   * We strip any top-level FieldStore or assignment ArrayRef nodes that
!  * appear in the input, printing them as decoration for the base column
!  * name (which we assume the caller just printed).  Return the subexpression
!  * that's to be assigned.
   */
  static Node *
! processIndirection(Node *node, deparse_context *context)
  {
      StringInfo    buf = context->buf;

*************** processIndirection(Node *node, deparse_c
*** 9568,9575 ****
              Assert(list_length(fstore->fieldnums) == 1);
              fieldname = get_relid_attribute_name(typrelid,
                                              linitial_int(fstore->fieldnums));
!             if (printit)
!                 appendStringInfo(buf, ".%s", quote_identifier(fieldname));

              /*
               * We ignore arg since it should be an uninteresting reference to
--- 9550,9556 ----
              Assert(list_length(fstore->fieldnums) == 1);
              fieldname = get_relid_attribute_name(typrelid,
                                              linitial_int(fstore->fieldnums));
!             appendStringInfo(buf, ".%s", quote_identifier(fieldname));

              /*
               * We ignore arg since it should be an uninteresting reference to
*************** processIndirection(Node *node, deparse_c
*** 9583,9590 ****

              if (aref->refassgnexpr == NULL)
                  break;
!             if (printit)
!                 printSubscripts(aref, context);

              /*
               * We ignore refexpr since it should be an uninteresting reference
--- 9564,9570 ----

              if (aref->refassgnexpr == NULL)
                  break;
!             printSubscripts(aref, context);

              /*
               * We ignore refexpr since it should be an uninteresting reference
diff --git a/src/test/regress/expected/insert.out b/src/test/regress/expected/insert.out
index 96c7f9e..70107b5 100644
*** a/src/test/regress/expected/insert.out
--- b/src/test/regress/expected/insert.out
*************** select col1, col2, char_length(col3) fro
*** 81,83 ****
--- 81,162 ----
  (8 rows)

  drop table inserttest;
+ --
+ -- check indirection (field/array assignment), cf bug #14265
+ --
+ -- these tests are aware that transformInsertStmt has 3 separate code paths
+ --
+ create type insert_test_type as (if1 int, if2 text[]);
+ create table inserttest (f1 int, f2 int[],
+                          f3 insert_test_type, f4 insert_test_type[]);
+ insert into inserttest (f2[1], f2[2]) values (1,2);
+ insert into inserttest (f2[1], f2[2]) values (3,4), (5,6);
+ insert into inserttest (f2[1], f2[2]) select 7,8;
+ insert into inserttest (f2[1], f2[2]) values (1,default);  -- not supported
+ ERROR:  cannot set an array element to DEFAULT
+ LINE 1: insert into inserttest (f2[1], f2[2]) values (1,default);
+                                        ^
+ insert into inserttest (f3.if1, f3.if2) values (1,array['foo']);
+ insert into inserttest (f3.if1, f3.if2) values (1,'{foo}'), (2,'{bar}');
+ insert into inserttest (f3.if1, f3.if2) select 3, '{baz,quux}';
+ insert into inserttest (f3.if1, f3.if2) values (1,default);  -- not supported
+ ERROR:  cannot set a subfield to DEFAULT
+ LINE 1: insert into inserttest (f3.if1, f3.if2) values (1,default);
+                                         ^
+ insert into inserttest (f3.if2[1], f3.if2[2]) values ('foo', 'bar');
+ insert into inserttest (f3.if2[1], f3.if2[2]) values ('foo', 'bar'), ('baz', 'quux');
+ insert into inserttest (f3.if2[1], f3.if2[2]) select 'bear', 'beer';
+ insert into inserttest (f4[1].if2[1], f4[1].if2[2]) values ('foo', 'bar');
+ insert into inserttest (f4[1].if2[1], f4[1].if2[2]) values ('foo', 'bar'), ('baz', 'quux');
+ insert into inserttest (f4[1].if2[1], f4[1].if2[2]) select 'bear', 'beer';
+ select * from inserttest;
+  f1 |  f2   |        f3        |           f4
+ ----+-------+------------------+------------------------
+     | {1,2} |                  |
+     | {3,4} |                  |
+     | {5,6} |                  |
+     | {7,8} |                  |
+     |       | (1,{foo})        |
+     |       | (1,{foo})        |
+     |       | (2,{bar})        |
+     |       | (3,"{baz,quux}") |
+     |       | (,"{foo,bar}")   |
+     |       | (,"{foo,bar}")   |
+     |       | (,"{baz,quux}")  |
+     |       | (,"{bear,beer}") |
+     |       |                  | {"(,\"{foo,bar}\")"}
+     |       |                  | {"(,\"{foo,bar}\")"}
+     |       |                  | {"(,\"{baz,quux}\")"}
+     |       |                  | {"(,\"{bear,beer}\")"}
+ (16 rows)
+
+ -- also check reverse-listing
+ create table inserttest2 (f1 bigint, f2 text);
+ create rule irule1 as on insert to inserttest2 do also
+   insert into inserttest (f3.if2[1], f3.if2[2])
+   values (new.f1,new.f2);
+ create rule irule2 as on insert to inserttest2 do also
+   insert into inserttest (f4[1].if1, f4[1].if2[2])
+   values (1,'fool'),(new.f1,new.f2);
+ create rule irule3 as on insert to inserttest2 do also
+   insert into inserttest (f4[1].if1, f4[1].if2[2])
+   select new.f1, new.f2;
+ \d+ inserttest2
+                      Table "public.inserttest2"
+  Column |  Type  | Modifiers | Storage  | Stats target | Description
+ --------+--------+-----------+----------+--------------+-------------
+  f1     | bigint |           | plain    |              |
+  f2     | text   |           | extended |              |
+ Rules:
+     irule1 AS
+     ON INSERT TO inserttest2 DO  INSERT INTO inserttest (f3.if2[1], f3.if2[2])
+   VALUES (new.f1, new.f2)
+     irule2 AS
+     ON INSERT TO inserttest2 DO  INSERT INTO inserttest (f4[1].if1, f4[1].if2[2]) VALUES (1,'fool'::text),
(new.f1,new.f2)
+     irule3 AS
+     ON INSERT TO inserttest2 DO  INSERT INTO inserttest (f4[1].if1, f4[1].if2[2])  SELECT new.f1,
+             new.f2
+
+ drop table inserttest2;
+ drop table inserttest;
+ drop type insert_test_type;
diff --git a/src/test/regress/sql/insert.sql b/src/test/regress/sql/insert.sql
index a0ae850..7924d5d 100644
*** a/src/test/regress/sql/insert.sql
--- b/src/test/regress/sql/insert.sql
*************** insert into inserttest values(30, 50, re
*** 36,38 ****
--- 36,86 ----
  select col1, col2, char_length(col3) from inserttest;

  drop table inserttest;
+
+ --
+ -- check indirection (field/array assignment), cf bug #14265
+ --
+ -- these tests are aware that transformInsertStmt has 3 separate code paths
+ --
+
+ create type insert_test_type as (if1 int, if2 text[]);
+
+ create table inserttest (f1 int, f2 int[],
+                          f3 insert_test_type, f4 insert_test_type[]);
+
+ insert into inserttest (f2[1], f2[2]) values (1,2);
+ insert into inserttest (f2[1], f2[2]) values (3,4), (5,6);
+ insert into inserttest (f2[1], f2[2]) select 7,8;
+ insert into inserttest (f2[1], f2[2]) values (1,default);  -- not supported
+
+ insert into inserttest (f3.if1, f3.if2) values (1,array['foo']);
+ insert into inserttest (f3.if1, f3.if2) values (1,'{foo}'), (2,'{bar}');
+ insert into inserttest (f3.if1, f3.if2) select 3, '{baz,quux}';
+ insert into inserttest (f3.if1, f3.if2) values (1,default);  -- not supported
+
+ insert into inserttest (f3.if2[1], f3.if2[2]) values ('foo', 'bar');
+ insert into inserttest (f3.if2[1], f3.if2[2]) values ('foo', 'bar'), ('baz', 'quux');
+ insert into inserttest (f3.if2[1], f3.if2[2]) select 'bear', 'beer';
+
+ insert into inserttest (f4[1].if2[1], f4[1].if2[2]) values ('foo', 'bar');
+ insert into inserttest (f4[1].if2[1], f4[1].if2[2]) values ('foo', 'bar'), ('baz', 'quux');
+ insert into inserttest (f4[1].if2[1], f4[1].if2[2]) select 'bear', 'beer';
+
+ select * from inserttest;
+
+ -- also check reverse-listing
+ create table inserttest2 (f1 bigint, f2 text);
+ create rule irule1 as on insert to inserttest2 do also
+   insert into inserttest (f3.if2[1], f3.if2[2])
+   values (new.f1,new.f2);
+ create rule irule2 as on insert to inserttest2 do also
+   insert into inserttest (f4[1].if1, f4[1].if2[2])
+   values (1,'fool'),(new.f1,new.f2);
+ create rule irule3 as on insert to inserttest2 do also
+   insert into inserttest (f4[1].if1, f4[1].if2[2])
+   select new.f1, new.f2;
+ \d+ inserttest2
+
+ drop table inserttest2;
+ drop table inserttest;
+ drop type insert_test_type;

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Macro customizable hashtable / bitmapscan & aggregation perf
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Rethinking TupleTableSlot deforming