Обсуждение: Re: BUG #15237: I got "ERROR: source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression"

Поиск
Список
Период
Сортировка
[Pruning the CC list and moving this to -hackers]

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> It's telling you what to do: use a ROW() expression, ie
 Tom> update fvt_obj_operate_update_table_033 set (c_int) = row(20)
 Tom> where c_int = 20;

 >> Yeah, but (a) this used to work, and has worked since at least as
 >> far back as 9.0, and (b) the spec requires it to work.

 Tom> As far as (a) goes, it was an intentional compatibility breakage,

So here I have a problem: the fact that it was a compatibility breakage
seems not to have been discussed *or even so much as mentioned* on
-hackers at any time.

The original patch, a self-described "lazy-sunday project", made
absolutely no mention of any compatibility issue, simply discussed the
new options it was allowing, and got no feedback or review. Had it
mentioned or even hinted that existing queries might be broken, I would
hope that there would have been more discussion at the time.

It then apparently went unnoticed until after the release of pg 10, at
which point it got retroactively documented (in the release notes and
nowhere else), in response to a brief discussion of a user complaint
that happened on -general and not -hackers (or even -bugs).

 Tom> cf commits 86182b189 and 906bfcad7,

I encourage everyone to refer to these and decide whether this is
sufficient grounds or sufficient discussion to introduce a breaking
change, even a minor one.

 Tom> and as for (b), the SQL committee is just nuts here.

We all know that the spec is a dog's breakfast, yes.

But breaking backwards compatibility AND the spec, even if it's in order
to support other parts of the spec, needs to be something that's
actually discussed and the tradeoff assessed rather than being snuck
through not only without discussion but also without even asking "is
there any sane way we can avoid breaking this".

 Tom> The expectation in 906bfcad7 was that we'd move towards the
 Tom> *other* thing the spec demands, namely that the RHS can be any
 Tom> a_expr yielding a suitable row value.

The spec doesn't have anything that corresponds to our a_expr,
obviously.

 Tom> We can't do that if it's unclear what is or isn't a ROW()
 Tom> expression, because then the intended semantics would be
 Tom> undecidable.

As far as I can tell, the spec distinguishes a lot of these cases only
by means of the type of the value.

For example:

<contextually typed row value expression> can be a <row value
special case> which is a <nonparenthesized value expression primary>
which must have a row type. (Note that for example (select row(1,2))
is a nonparenthesized value expression primary even though it has parens
around it.)

Or <contextually typed row value constructor> can be a <common
value expression> which can be one of several <$type value expression>
productions all of which contain <nonparenthesized value
expression primary> as one of the possible expansions. So for example
(select 1) ends up being treated as row((select 1)).

 Tom> And I don't think we want a situation in which adding "extra"
 Tom> parens changes a legal command into an illegal one.

To some extent I agree, but the spec does rely on this in places:

 Tom> For example, suppose f(...) returns a single-column tuple result.
 Tom> This should be legal, if x matches the type of the single column:

 Tom>     update ... set (x) = f(...)

 Tom> but if we try to do what you seem to have in mind, this would not
 Tom> be:

 Tom>     update ... set (x) = (f(...))

The spec indeed says that this is not valid, since it would wrap an
additional ROW() around the result, and would try and assign the row
value to x.

 Tom> That's sufficiently brain-dead that I don't think we want to go
 Tom> there.

Maybe so, but not without discussion.

-- 
Andrew (irc:RhodiumToad)


On Wed, Jun 13, 2018 at 12:48 AM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> The original patch, a self-described "lazy-sunday project", made
> absolutely no mention of any compatibility issue, simply discussed the
> new options it was allowing, and got no feedback or review. Had it
> mentioned or even hinted that existing queries might be broken, I would
> hope that there would have been more discussion at the time.

I think that is a fair criticism.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


On Wed, Jun 13, 2018 at 05:48:50AM +0100, Andrew Gierth wrote:
> It then apparently went unnoticed until after the release of pg 10, at
> which point it got retroactively documented (in the release notes and
> nowhere else), in response to a brief discussion of a user complaint
> that happened on -general and not -hackers (or even -bugs).

Actually I noticed and pointed out during beta;
https://www.postgresql.org/message-id/flat/20170719174507.GA19616%40telsasoft.com#20170719174507.GA19616@telsasoft.com

Justin


>>>>> "Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:

 >> It then apparently went unnoticed until after the release of pg 10,
 >> at which point it got retroactively documented (in the release notes
 >> and nowhere else), in response to a brief discussion of a user
 >> complaint that happened on -general and not -hackers (or even
 >> -bugs).

 Justin> Actually I noticed and pointed out during beta;
 Justin>
https://www.postgresql.org/message-id/flat/20170719174507.GA19616%40telsasoft.com#20170719174507.GA19616@telsasoft.com

Interesting, I missed that when searching. So that tells us:

1. Breaking SET (col) = (val) wasn't actually intended or noticed in the
   original patch;

2. The break was then defended based on an incorrect reading of the spec
   (as I pointed out already on -bugs, the syntax _is_ legal in the spec
   with only one column, for just the same reasons that VALUES (1) isn't
   required to be VALUES row(1); the spec adds the ROW( ) construct
   itself in one of the syntax rules, in a way that's easy to overlook)

-- 
Andrew (irc:RhodiumToad)


Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> As far as (a) goes, it was an intentional compatibility breakage,

> So here I have a problem: the fact that it was a compatibility breakage
> seems not to have been discussed *or even so much as mentioned* on
> -hackers at any time.

I will freely admit that at the time I did the original patch, I did not
think that the change for the single-column case was of interest to
anyone.  Who'd write an extra four parentheses that they didn't need?
But it *was* discussed later, including on -hackers:

https://www.postgresql.org/message-id/flat/20170719174507.GA19616%40telsasoft.com#20170719174507.GA19616@telsasoft.com
https://www.postgresql.org/message-id/flat/CAMjNa7cDLzPcs0xnRpkvqmJ6Vb6G3EH8CYGp9ZBjXdpFfTz6dg%40mail.gmail.com

>  Tom> For example, suppose f(...) returns a single-column tuple result.
>  Tom> This should be legal, if x matches the type of the single column:
>  Tom>     update ... set (x) = f(...)
>  Tom> but if we try to do what you seem to have in mind, this would not
>  Tom> be:
>  Tom>     update ... set (x) = (f(...))

> The spec indeed says that this is not valid, since it would wrap an
> additional ROW() around the result, and would try and assign the row
> value to x.

I think that arguing that the spec requires that is fairly shaky, and
implementing it would be even shakier; for example, we'd have to start
worrying about whether ruleutils.c's habit of throwing in extra parens
when in doubt could ever result in unexpected change to the meaning.

I'd also point out that with the rule that the extra parens implicitly
mean "ROW()", it's fairly unclear what should happen with

    update ... set (x) = ((select ...))

If somebody were to hold a gun to my head and say "you must make this
work", I'd probably do something like the attached, which abuses the
operator_precedence_warning infrastructure to detect whether there were
extra parens.  One would have to argue about exactly what it should do
with parens around ROW() or (SELECT ...), but I chose to ignore them.
(I think that in the SELECT case, the grammar would actually absorb
the extra parens elsewhere, so that we couldn't do anything differently
anyway without further kluges.)

But I repeat that this is a bad idea and we'd regret it in the long run;
treating extra parens as meaning ROW() in some cases is just a
fundamentally unsound idea from both syntactic and semantic standpoints.
Given the extremely small number of complaints since v10 came out,
I think we should just stay with what we have.

            regards, tom lane

PS: I noticed while messing with this that there's a separate problem:
right now, turning on operator_precedence_warning causes unexpected
failures with extra parens around the RHS.  The code added below to
make transformMultiAssignRef look through AEXPR_PAREN nodes before
deciding anything is needed anyway to fix that.

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 90dfac2..64bd254 100644
*** a/src/backend/parser/gram.y
--- b/src/backend/parser/gram.y
*************** static Node *makeRecursiveViewSelect(cha
*** 470,476 ****
  %type <node>    def_arg columnElem where_clause where_or_current_clause
                  a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
                  columnref in_expr having_clause func_table xmltable array_expr
!                 ExclusionWhereClause operator_def_arg
  %type <list>    rowsfrom_item rowsfrom_list opt_col_def_list
  %type <boolean> opt_ordinality
  %type <list>    ExclusionConstraintList ExclusionConstraintElem
--- 470,476 ----
  %type <node>    def_arg columnElem where_clause where_or_current_clause
                  a_expr b_expr c_expr AexprConst indirection_el opt_slice_bound
                  columnref in_expr having_clause func_table xmltable array_expr
!                 ct_row_expr ExclusionWhereClause operator_def_arg
  %type <list>    rowsfrom_item rowsfrom_list opt_col_def_list
  %type <boolean> opt_ordinality
  %type <list>    ExclusionConstraintList ExclusionConstraintElem
*************** set_clause:
*** 11070,11076 ****
                      $1->val = (Node *) $3;
                      $$ = list_make1($1);
                  }
!             | '(' set_target_list ')' '=' a_expr
                  {
                      int ncolumns = list_length($2);
                      int i = 1;
--- 11070,11076 ----
                      $1->val = (Node *) $3;
                      $$ = list_make1($1);
                  }
!             | '(' set_target_list ')' '=' ct_row_expr
                  {
                      int ncolumns = list_length($2);
                      int i = 1;
*************** c_expr:        columnref                                { $$ = $1; }
*** 13451,13457 ****
                          n->indirection = check_indirection($4, yyscanner);
                          $$ = (Node *)n;
                      }
!                     else if (operator_precedence_warning)
                      {
                          /*
                           * If precedence warnings are enabled, insert
--- 13451,13458 ----
                          n->indirection = check_indirection($4, yyscanner);
                          $$ = (Node *)n;
                      }
!                     else if (operator_precedence_warning ||
!                              pg_yyget_extra(yyscanner)->in_ctre)
                      {
                          /*
                           * If precedence warnings are enabled, insert
*************** c_expr:        columnref                                { $$ = $1; }
*** 13469,13474 ****
--- 13470,13479 ----
                           * suppress warnings, it's probably safe enough; and
                           * we'd just as soon not waste cycles on dummy parse
                           * nodes if we don't have to.
+                          *
+                          * We also insert AEXPR_PAREN nodes if we're inside a
+                          * ct_row_expr, so that parse analysis can distinguish
+                          * "(x)" from "x" later.
                           */
                          $$ = (Node *) makeA_Expr(AEXPR_PAREN, NIL, $2, NULL,
                                                   exprLocation($2));
*************** opt_asymmetric: ASYMMETRIC
*** 14599,14604 ****
--- 14604,14623 ----
              | /*EMPTY*/
          ;

+ /*
+  * Deal with the spec's poorly-thought-through "contextually typed row value
+  * expression" syntax.  For that, we need to detect whether the expression has
+  * outer parentheses.  We can't define it as "a_expr | (a_expr)" without
+  * causing reduce/reduce conflicts, so instead force AEXPR_PAREN nodes to be
+  * inserted for parens throughout the subexpression, and we'll sort it out
+  * during parse analysis.
+  */
+ ct_row_expr:
+             { pg_yyget_extra(yyscanner)->in_ctre++; }
+             a_expr
+             { pg_yyget_extra(yyscanner)->in_ctre--; $$ = $2; }
+         ;
+

  /*****************************************************************************
   *
*************** void
*** 16328,16331 ****
--- 16347,16351 ----
  parser_init(base_yy_extra_type *yyext)
  {
      yyext->parsetree = NIL;        /* in case grammar forgets to set it */
+     yyext->in_ctre = 0;            /* not handling ct_row_expr */
  }
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 385e54a..0d09d7b 100644
*** a/src/backend/parser/parse_expr.c
--- b/src/backend/parser/parse_expr.c
*************** transformMultiAssignRef(ParseState *psta
*** 1499,1515 ****
      /* We only need to transform the source if this is the first column */
      if (maref->colno == 1)
      {
          /*
           * For now, we only allow EXPR SubLinks and RowExprs as the source of
           * an UPDATE multiassignment.  This is sufficient to cover interesting
           * cases; at worst, someone would have to write (SELECT * FROM expr)
           * to expand a composite-returning expression of another form.
           */
!         if (IsA(maref->source, SubLink) &&
!             ((SubLink *) maref->source)->subLinkType == EXPR_SUBLINK)
          {
              /* Relabel it as a MULTIEXPR_SUBLINK */
!             sublink = (SubLink *) maref->source;
              sublink->subLinkType = MULTIEXPR_SUBLINK;
              /* And transform it */
              sublink = (SubLink *) transformExprRecurse(pstate,
--- 1499,1548 ----
      /* We only need to transform the source if this is the first column */
      if (maref->colno == 1)
      {
+         Node       *source = maref->source;
+         bool        have_parens = false;
+
+         /* Drill through any AEXPR_PAREN nodes */
+         while (source && IsA(source, A_Expr) &&
+                ((A_Expr *) source)->kind == AEXPR_PAREN)
+         {
+             have_parens = true;
+             source = ((A_Expr *) source)->lexpr;
+         }
+
+         /*
+          * If we have a single-column assignment, and the RHS had parens but
+          * isn't a SubLink or RowExpr, pretend it's a single-column RowExpr.
+          * This is quite broken, as it will cause odd inconsistencies whenever
+          * we get around to allowing other syntactic cases for the RHS, but
+          * some people insisted.
+          */
+         if (have_parens && maref->ncolumns == 1 &&
+             !(IsA(source, SubLink) &&
+               ((SubLink *) source)->subLinkType == EXPR_SUBLINK) &&
+             !IsA(source, RowExpr))
+         {
+             RowExpr    *r = makeNode(RowExpr);
+
+             r->args = list_make1(source);
+             r->row_typeid = InvalidOid;
+             r->colnames = NIL;
+             r->row_format = COERCE_IMPLICIT_CAST;    /* abuse */
+             r->location = -1;
+             source = (Node *) r;
+         }
+
          /*
           * For now, we only allow EXPR SubLinks and RowExprs as the source of
           * an UPDATE multiassignment.  This is sufficient to cover interesting
           * cases; at worst, someone would have to write (SELECT * FROM expr)
           * to expand a composite-returning expression of another form.
           */
!         if (IsA(source, SubLink) &&
!             ((SubLink *) source)->subLinkType == EXPR_SUBLINK)
          {
              /* Relabel it as a MULTIEXPR_SUBLINK */
!             sublink = (SubLink *) source;
              sublink->subLinkType = MULTIEXPR_SUBLINK;
              /* And transform it */
              sublink = (SubLink *) transformExprRecurse(pstate,
*************** transformMultiAssignRef(ParseState *psta
*** 1542,1552 ****
               */
              sublink->subLinkId = list_length(pstate->p_multiassign_exprs);
          }
!         else if (IsA(maref->source, RowExpr))
          {
              /* Transform the RowExpr, allowing SetToDefault items */
              rexpr = (RowExpr *) transformRowExpr(pstate,
!                                                  (RowExpr *) maref->source,
                                                   true);

              /* Check it returns required number of columns */
--- 1575,1585 ----
               */
              sublink->subLinkId = list_length(pstate->p_multiassign_exprs);
          }
!         else if (IsA(source, RowExpr))
          {
              /* Transform the RowExpr, allowing SetToDefault items */
              rexpr = (RowExpr *) transformRowExpr(pstate,
!                                                  (RowExpr *) source,
                                                   true);

              /* Check it returns required number of columns */
*************** transformMultiAssignRef(ParseState *psta
*** 1568,1574 ****
              ereport(ERROR,
                      (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                       errmsg("source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression"),
!                      parser_errposition(pstate, exprLocation(maref->source))));
      }
      else
      {
--- 1601,1607 ----
              ereport(ERROR,
                      (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                       errmsg("source for a multiple-column UPDATE item must be a sub-SELECT or ROW() expression"),
!                      parser_errposition(pstate, exprLocation(source))));
      }
      else
      {
diff --git a/src/include/parser/gramparse.h b/src/include/parser/gramparse.h
index 42e7ede..6e51081 100644
*** a/src/include/parser/gramparse.h
--- b/src/include/parser/gramparse.h
*************** typedef struct base_yy_extra_type
*** 53,58 ****
--- 53,59 ----
       * State variables that belong to the grammar.
       */
      List       *parsetree;        /* final parse result is delivered here */
+     int            in_ctre;        /* handling contextually-typed row expr? */
  } base_yy_extra_type;

  /*

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> For example, suppose f(...) returns a single-column tuple result.
 Tom> This should be legal, if x matches the type of the single column:
 Tom> update ... set (x) = f(...)
 Tom> but if we try to do what you seem to have in mind, this would not
 Tom> be:
 Tom> update ... set (x) = (f(...))

 >> The spec indeed says that this is not valid, since it would wrap an
 >> additional ROW() around the result, and would try and assign the row
 >> value to x.

 Tom> I think that arguing that the spec requires that is fairly shaky,

I'll be more precise: the spec does not define any syntax that would
match UPDATE ... SET (x) = (f(...))  where f() is a function returning a
row type of degree 1 (or any other degree for that matter).

Specifically, the expansions of <contextually typed row value constructor>
don't allow a row-valued function to appear alone that way, while the
alternative, <row value special case>, requires a <nonparenthesized
value expression primary>.

The spec accordingly doesn't require that the construct be rejected, it
could be accepted as a (nonstandard) extension.

 Tom> I'd also point out that with the rule that the extra parens
 Tom> implicitly mean "ROW()", it's fairly unclear what should happen
 Tom> with

 Tom>     update ... set (x) = ((select ...))

The spec doesn't say that parens (even extra parens) implicitly mean
ROW(), what it does say is that some branches of the syntax implicitly
add a ROW() and some do not.

Specifically, if the thing to the right of the = is a <common value
expression>, <boolean value expression>, or NULL/DEFAULT without parens,
then ROW() is added. ((select scalarcol from ...)) is a <common value
expression>, ((select ROW(scalarcol) from ...)) is, however, not.

 Tom> But I repeat that this is a bad idea and we'd regret it in the
 Tom> long run; treating extra parens as meaning ROW() in some cases is
 Tom> just a fundamentally unsound idea from both syntactic and semantic
 Tom> standpoints.

But that's not actually the requirement. Parens are significant to the
spec, but that doesn't mean they have to be significant to us in the
same way as long as we end up accepting all the spec's cases (or at
least the useful ones).

Here is a real problem case (where "rowcol" is a column with a row type
and "rowval" has that same type):

UPDATE ... SET (rowcol) = (rowval)

There literally isn't any way to write the above in the spec except as

UPDATE ... SET (rowcol) = ROW(rowval)
or
UPDATE ... SET (rowcol) = (select ROW(rowval) from ...)

If we don't try and handle SET (rowcol) = (rowval), then I think we can
make all the spec's cases work by this rule: if the thing on the RHS of
the assignment is not a row type, then treat it as a row type of degree
1. That works without needing special rules for parens, and while it
accepts a lot of possibilities that the spec rejects, I don't see any
problems there.

 Tom> Given the extremely small number of complaints since v10 came out,
 Tom> I think we should just stay with what we have.

I tried looking at what combinations were supported by other major
databases:

UPDATE ... SET (cols...) = (exprs...)

Supported by DB2 and SQLite, both of which allow 1 or more expressions.
Not supported by MySQL, MSSQL, Oracle.

UPDATE ... SET (cols...) = (select exprs...)

Supported by DB2, Oracle, SQLite, for 1 or more columns. Not supported
by MySQL, MSSQL. (Note that this syntax is apparently not valid in the
spec except for the case of a single non-rowtype column!)

UPDATE ... SET (cols...) = (DEFAULT,...)

Supported by DB2, for 1 or more columns. Not supported by MySQL, MSSQL,
Oracle, SQLite.

UPDATE ... SET (cols...) = ROW(...)
UPDATE ... SET (cols...) = nonparenthesized_row_expression
UPDATE ... SET (cols...) = (select ROW(...) from ...)

Not supported by anything that I could find. (HSQLDB claims to support
the full standard syntax, but I don't believe it yet.)

-- 
Andrew (irc:RhodiumToad)