Обсуждение: Adding OLD/NEW support to RETURNING

Поиск
Список
Период
Сортировка

Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
I have been playing around with the idea of adding support for OLD/NEW
to RETURNING, partly motivated by the discussion on the MERGE
RETURNING thread [1], but also because I think it would be a very
useful addition for other commands (UPDATE in particular).

This was discussed a long time ago [2], but that previous discussion
didn't lead to a workable patch, and so I have taken a different
approach here.

My first thought was that this would only really make sense for UPDATE
and MERGE, since OLD/NEW are pretty pointless for INSERT/DELETE
respectively. However...

1. For an INSERT with an ON CONFLICT ... DO UPDATE clause, returning
OLD might be very useful, since it provides a way to see which rows
conflicted, and return the old conflicting values.

2. If a DELETE is turned into an UPDATE by a rule (e.g., to mark rows
as deleted, rather than actually deleting them), then returning NEW
can also be useful. (I admit, this is a somewhat obscure use case, but
it's still possible.)

3. In a MERGE, we need to be able to handle all 3 command types anyway.

4. It really isn't any extra effort to support INSERT and DELETE.

So in the attached very rough patch (no docs, minimal testing) I have
just allowed OLD/NEW in RETURNING for all command types (except, I
haven't done MERGE here - I think that's best kept as a separate
patch). If there is no OLD/NEW row in a particular context, it just
returns NULLs. The regression tests contain examples of  1 & 2 above.


Based on Robert Haas' suggestion in [2], the patch works by adding a
new "varreturningtype" field to Var nodes. This field is set during
parse analysis of the returning clause, which adds new namespace
aliases for OLD and NEW, if tables with those names/aliases are not
already present. So the resulting Var nodes have the same
varno/varattno as they would normally have had, but a different
varreturningtype.

For the most part, the rewriter and parser are then untouched, except
for a couple of places necessary to ensure that the new field makes it
through correctly. In particular, none of this affects the shape of
the final plan produced. All of the work to support the new Var
returning type is done in the executor.

This turns out to be relatively straightforward, except for
cross-partition updates, which was a little trickier since the tuple
format of the old row isn't necessarily compatible with the new row,
which is in a different partition table and so might have a different
column order.

One thing that I've explicitly disallowed is returning OLD/NEW for
updates to foreign tables. It's possible that could be added in a
later patch, but I have no plans to support that right now.


One difficult question is what names to use for the new aliases. I
think OLD and NEW are the most obvious and natural choices. However,
there is a problem - if they are used in a trigger function, they will
conflict. In PL/pgSQL, this leads to an error like the following:

ERROR:  column reference "new.f1" is ambiguous
LINE 3:     RETURNING new.f1, new.f4
                      ^
DETAIL:  It could refer to either a PL/pgSQL variable or a table column.

That's the same error that you'd get if a different alias name had
been chosen, and it happened to conflict with a user-defined PL/pgSQL
variable, except that in that case, the user could just change their
variable name to fix the problem, which is not possible with the
automatically-added OLD/NEW trigger variables. As a way round that, I
added a way to optionally change the alias used in the RETURNING list,
using the following syntax:

  RETURNING [ WITH ( { OLD | NEW } AS output_alias [, ...] ) ]
    * | output_expression [ [ AS ] output_name ] [, ...]

for example:

  RETURNING WITH (OLD AS o) o.id, o.val, ...

I'm not sure how good a solution that is, but the syntax doesn't look
too bad to me (somewhat reminiscent of a WITH-query), and it's only
necessary in cases where there is a name conflict.

The simpler solution would be to just pick different alias names to
start with. The previous thread seemed to settle on BEFORE/AFTER, but
I don't find those names particularly intuitive or appealing. Over on
[1], PREVIOUS/CURRENT was suggested, which I prefer, but they still
don't seem as natural as OLD/NEW.

So, as is often the case, naming things turns out to be the hardest
problem, which is why I quite like the idea of letting the user pick
their own name, if they need to. In most contexts, OLD and NEW will
work, so they won't need to.

Thoughts?

Regards,
Dean

[1] https://www.postgresql.org/message-id/flat/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/51822C0F.5030807%40gmail.com

Вложения

Re: Adding OLD/NEW support to RETURNING

От
jian he
Дата:
On Mon, Dec 4, 2023 at 8:15 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> I have been playing around with the idea of adding support for OLD/NEW
> to RETURNING, partly motivated by the discussion on the MERGE
> RETURNING thread [1], but also because I think it would be a very
> useful addition for other commands (UPDATE in particular).
>
> This was discussed a long time ago [2], but that previous discussion
> didn't lead to a workable patch, and so I have taken a different
> approach here.
>
> Thoughts?
>


  /* get the tuple from the relation being scanned */
- scratch.opcode = EEOP_ASSIGN_SCAN_VAR;
+ switch (variable->varreturningtype)
+ {
+ case VAR_RETURNING_OLD:
+ scratch.opcode = EEOP_ASSIGN_OLD_VAR;
+ break;
+ case VAR_RETURNING_NEW:
+ scratch.opcode = EEOP_ASSIGN_NEW_VAR;
+ break;
+ default:
+ scratch.opcode = EEOP_ASSIGN_SCAN_VAR;
+ break;
+ }
I have roughly an idea of what this code is doing. but do you need to
refactor the above comment?


/* for EEOP_INNER/OUTER/SCAN_FETCHSOME */
in src/backend/executor/execExpr.c, do you need to update the comment?

create temp table foo (f1 int, f2 int);
insert into foo values (1,2), (3,4);
INSERT INTO foo select 11, 22  RETURNING WITH (old AS new, new AS old)
new.*, old.*;
--this works. which is fine.

create or replace function stricttest1() returns void as $$
declare x record;
begin
    insert into foo values(5,6) returning new.* into x;
    raise notice 'x.f1 = % x.f2 %', x.f1, x.f2;
end$$ language plpgsql;
select * from stricttest1();
--this works.

create or replace function stricttest2() returns void as $$
declare x record; y record;
begin
    INSERT INTO foo select 11, 22  RETURNING WITH (old AS o, new AS n)
o into x, n into y;
    raise notice 'x.f1: % x.f2 % y.f1 % y.f2 %', x.f1,x.f2, y.f1, y.f2;
end$$ language plpgsql;
--this does not work.
--because
https://www.postgresql.org/message-id/flat/CAFj8pRB76FE2MVxJYPc1RvXmsf2upoTgoPCC9GsvSAssCM2APQ%40mail.gmail.com

create or replace function stricttest3() returns void as $$
declare x record; y record;
begin
    INSERT INTO foo select 11, 22  RETURNING WITH (old AS o, new AS n) o.*,n.*
    into x;
    raise notice 'x.f1 % x.f2 %, % %', x.f1, x.f2, x.f1,x.f2;
end$$ language plpgsql;
select * from stricttest3();
--this is not what we want. because old and new share the same column name
--so here you cannot get the "new" content.

create or replace function stricttest4() returns void as $$
declare x record; y record;
begin
    INSERT INTO foo select 11, 22
    RETURNING WITH (old AS o, new AS n)
    o.f1 as of1,o.f2 as of2,n.f1 as nf1, n.f2 as nf2
    into x;
    raise notice 'x.0f1 % x.of2 % nf1 % nf2 %', x.of1, x.of2, x.nf1, x.nf2;
end$$ language plpgsql;
--kind of verbose, but works, which is fine.

create or replace function stricttest5() returns void as $$
declare x record; y record;
      a foo%ROWTYPE; b foo%ROWTYPE;
begin
  INSERT INTO foo select 11, 22
  RETURNING WITH (old AS o, new AS n) o into a, n into b;
end$$ language plpgsql;
-- expect this to work.



Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Sat, 16 Dec 2023 at 13:04, jian he <jian.universality@gmail.com> wrote:
>
>   /* get the tuple from the relation being scanned */
> I have roughly an idea of what this code is doing. but do you need to
> refactor the above comment?
>
> /* for EEOP_INNER/OUTER/SCAN_FETCHSOME */
> in src/backend/executor/execExpr.c, do you need to update the comment?
>

Thanks for looking at this.

Attached is a new version with some updated comments. In addition, I
fixed a couple of issues:

In raw_expression_tree_walker(), I had missed one of the new node types.

When "old" or "new" are specified by themselves in the RETURNING list
to return the whole old/new row, the parser was generating a RowExpr
node, which appeared to work OK, but failed if there were any dropped
columns in the relation. I have changed this to generate a wholerow
Var instead, which deals with that issue, and seems better for
efficiency and consistency with existing code.

In addition, I have added code during executor startup to record
whether or not the RETURNING list actually has any references to
OLD/NEW values. This allows the building of old/new tuple slots to be
skipped when they're not actually needed, reducing per-row overheads.

I still haven't written any docs yet.


> create or replace function stricttest2() returns void as $$
> declare x record; y record;
> begin
>     INSERT INTO foo select 11, 22  RETURNING WITH (old AS o, new AS n)
> o into x, n into y;
>     raise notice 'x.f1: % x.f2 % y.f1 % y.f2 %', x.f1,x.f2, y.f1, y.f2;
> end$$ language plpgsql;
> --this does not work.
> --because
https://www.postgresql.org/message-id/flat/CAFj8pRB76FE2MVxJYPc1RvXmsf2upoTgoPCC9GsvSAssCM2APQ%40mail.gmail.com
>
> create or replace function stricttest5() returns void as $$
> declare x record; y record;
>       a foo%ROWTYPE; b foo%ROWTYPE;
> begin
>   INSERT INTO foo select 11, 22
>   RETURNING WITH (old AS o, new AS n) o into a, n into b;
> end$$ language plpgsql;
> -- expect this to work.

Yeah, but note that multiple INTO clauses aren't allowed. An
alternative is to create a custom type to hold the old and new
records, e.g.:

CREATE TYPE foo_delta AS (old foo, new foo);

then you can just do "RETURNING old, new INTO delta" where delta is a
variable of type foo_delta, and you can extract individual fields
using expressions like "(delta.old).f1".

Regards,
Dean

Вложения

Re: Adding OLD/NEW support to RETURNING

От
Tomas Vondra
Дата:
On 12/4/23 13:14, Dean Rasheed wrote:
> I have been playing around with the idea of adding support for OLD/NEW
> to RETURNING, partly motivated by the discussion on the MERGE
> RETURNING thread [1], but also because I think it would be a very
> useful addition for other commands (UPDATE in particular).
> 

Sounds reasonable ...

> This was discussed a long time ago [2], but that previous discussion
> didn't lead to a workable patch, and so I have taken a different
> approach here.
> 

Presumably the 2013 thread went nowhere because of some implementation
problems, not simply because the author lost interest and disappeared?
Would it be helpful for this new patch to briefly summarize what the
main issues were and how this new approach deals with that? (It's hard
to say if reading the old thread is necessary/helpful for understanding
this new patch, and time is a scarce resource.)

> My first thought was that this would only really make sense for UPDATE
> and MERGE, since OLD/NEW are pretty pointless for INSERT/DELETE
> respectively. However...
> 
> 1. For an INSERT with an ON CONFLICT ... DO UPDATE clause, returning
> OLD might be very useful, since it provides a way to see which rows
> conflicted, and return the old conflicting values.
> 
> 2. If a DELETE is turned into an UPDATE by a rule (e.g., to mark rows
> as deleted, rather than actually deleting them), then returning NEW
> can also be useful. (I admit, this is a somewhat obscure use case, but
> it's still possible.)
> 
> 3. In a MERGE, we need to be able to handle all 3 command types anyway.
> 
> 4. It really isn't any extra effort to support INSERT and DELETE.
> 
> So in the attached very rough patch (no docs, minimal testing) I have
> just allowed OLD/NEW in RETURNING for all command types (except, I
> haven't done MERGE here - I think that's best kept as a separate
> patch). If there is no OLD/NEW row in a particular context, it just
> returns NULLs. The regression tests contain examples of  1 & 2 above.
> 
> 
> Based on Robert Haas' suggestion in [2], the patch works by adding a
> new "varreturningtype" field to Var nodes. This field is set during
> parse analysis of the returning clause, which adds new namespace
> aliases for OLD and NEW, if tables with those names/aliases are not
> already present. So the resulting Var nodes have the same
> varno/varattno as they would normally have had, but a different
> varreturningtype.
> 

No opinion on whether varreturningtype is the right approach - it sounds
like it's working better than the 2013 patch, but I won't pretend my
knowledge of this code is sufficient to make judgments beyond that.

> For the most part, the rewriter and parser are then untouched, except
> for a couple of places necessary to ensure that the new field makes it
> through correctly. In particular, none of this affects the shape of
> the final plan produced. All of the work to support the new Var
> returning type is done in the executor.
> 
> This turns out to be relatively straightforward, except for
> cross-partition updates, which was a little trickier since the tuple
> format of the old row isn't necessarily compatible with the new row,
> which is in a different partition table and so might have a different
> column order.
> 

So we just "remap" the attributes, right?

> One thing that I've explicitly disallowed is returning OLD/NEW for
> updates to foreign tables. It's possible that could be added in a
> later patch, but I have no plans to support that right now.
> 

Sounds like an acceptable restriction, as long as it's documented.

What are the challenges for supporting OLD/NEW for foreign tables? I
guess we'd need to ask the FDW handler to tell us if it can support
OLD/NEW for this table (and only allow it for postgres_fdw with
sufficiently new server version), and then deparse the SQL.

I'm asking because this seems like a nice first patch idea, but if I
don't see some major obstacle that I don't see ...

> 
> One difficult question is what names to use for the new aliases. I
> think OLD and NEW are the most obvious and natural choices. However,
> there is a problem - if they are used in a trigger function, they will
> conflict. In PL/pgSQL, this leads to an error like the following:
> 
> ERROR:  column reference "new.f1" is ambiguous
> LINE 3:     RETURNING new.f1, new.f4
>                       ^
> DETAIL:  It could refer to either a PL/pgSQL variable or a table column.
> 
> That's the same error that you'd get if a different alias name had
> been chosen, and it happened to conflict with a user-defined PL/pgSQL
> variable, except that in that case, the user could just change their
> variable name to fix the problem, which is not possible with the
> automatically-added OLD/NEW trigger variables. As a way round that, I
> added a way to optionally change the alias used in the RETURNING list,
> using the following syntax:
> 
>   RETURNING [ WITH ( { OLD | NEW } AS output_alias [, ...] ) ]
>     * | output_expression [ [ AS ] output_name ] [, ...]
> 
> for example:
> 
>   RETURNING WITH (OLD AS o) o.id, o.val, ...
> 
> I'm not sure how good a solution that is, but the syntax doesn't look
> too bad to me (somewhat reminiscent of a WITH-query), and it's only
> necessary in cases where there is a name conflict.
> 
> The simpler solution would be to just pick different alias names to
> start with. The previous thread seemed to settle on BEFORE/AFTER, but
> I don't find those names particularly intuitive or appealing. Over on
> [1], PREVIOUS/CURRENT was suggested, which I prefer, but they still
> don't seem as natural as OLD/NEW.
> 
> So, as is often the case, naming things turns out to be the hardest
> problem, which is why I quite like the idea of letting the user pick
> their own name, if they need to. In most contexts, OLD and NEW will
> work, so they won't need to.
> 

I think OLD/NEW with a way to define a custom alias when needed seems
acceptable. Or at least I can't think of a clearly better solution. Yes,
using some other name might not have this problem, but I guess we'd have
to pick an existing keyword or add one. And Tom didn't seem thrilled
with reserving a keyword in 2013 ...

Plus I think there's value in consistency, and OLD/NEW seems way more
natural that BEFORE/AFTER.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Sat, 24 Feb 2024 at 17:52, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Presumably the 2013 thread went nowhere because of some implementation
> problems, not simply because the author lost interest and disappeared?
> Would it be helpful for this new patch to briefly summarize what the
> main issues were and how this new approach deals with that? (It's hard
> to say if reading the old thread is necessary/helpful for understanding
> this new patch, and time is a scarce resource.)

Thanks for looking!

The 2013 patch got fairly far down a particular implementation path
(adding a new kind of RTE called RTE_ALIAS) before Robert reviewed it
[1]. He pointed out various specific issues, as well as questioning
the overall approach, and suggesting a different approach that would
have involved significant rewriting (this is essentially the approach
that I have taken, adding a new field to Var nodes).

[1] https://www.postgresql.org/message-id/CA%2BTgmoY5EXE-YKMV7CsdSFj-noyZz%3D2z45sgyJX5Y84rO3RnWQ%40mail.gmail.com

The thread kind-of petered out shortly after that, with the conclusion
that the patch needed a pretty significant redesign and rewrite.


> No opinion on whether varreturningtype is the right approach - it sounds
> like it's working better than the 2013 patch, but I won't pretend my
> knowledge of this code is sufficient to make judgments beyond that.
>
> > For the most part, the rewriter and parser are then untouched, except
> > for a couple of places necessary to ensure that the new field makes it
> > through correctly. In particular, none of this affects the shape of
> > the final plan produced. All of the work to support the new Var
> > returning type is done in the executor.

(Of course, I meant the rewriter and the *planner* are largely untouched.)

I think this is one of the main advantages of this approach. The 2013
design, adding a new RTE kind, required changes all over the place,
including lots of hacking in the planner.


> > This turns out to be relatively straightforward, except for
> > cross-partition updates, which was a little trickier since the tuple
> > format of the old row isn't necessarily compatible with the new row,
> > which is in a different partition table and so might have a different
> > column order.
>
> So we just "remap" the attributes, right?

Right. That's what the majority of the new code in ExecDelete() and
ExecInsert() is for. It's not that complicated, but it did require a
bit of care.


> What are the challenges for supporting OLD/NEW for foreign tables?

I didn't really look at that in any detail, but I don't think it
should be too hard. It's not something I want to tackle now though,
because the patch is big enough already.


> I think OLD/NEW with a way to define a custom alias when needed seems
> acceptable. Or at least I can't think of a clearly better solution. Yes,
> using some other name might not have this problem, but I guess we'd have
> to pick an existing keyword or add one. And Tom didn't seem thrilled
> with reserving a keyword in 2013 ...
>
> Plus I think there's value in consistency, and OLD/NEW seems way more
> natural that BEFORE/AFTER.

Yes, I think OLD/NEW are much nicer too.

Attached is a new patch, now with docs (no other code changes).

Regards,
Dean

Вложения

Re: Adding OLD/NEW support to RETURNING

От
jian he
Дата:
On Sat, Mar 9, 2024 at 3:53 AM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
>
> Attached is a new patch, now with docs (no other code changes).
>

Hi,
some issues I found, while playing around with
support-returning-old-new-v2.patch

doc/src/sgml/ref/update.sgml:
    [ RETURNING [ WITH ( { OLD | NEW } AS <replaceable
class="parameter">output_alias</replaceable> [, ...] ) ]
                * | <replaceable
class="parameter">output_expression</replaceable> [ [ AS ]
<replaceable class="parameter">output_name</replaceable> ] [, ...] ]
</synopsis>

There is no parameter explanation for `*`.
so, I think the synopsis may not cover cases like:
`
update foo set f3 = 443 RETURNING new.*;
`
I saw the explanation at output_alias, though.

-----------------------------------------------------------------------------
insert into foo select 1, 2 RETURNING old.*, new.f2, old.f1();
ERROR:  function old.f1() does not exist
LINE 1: ...sert into foo select 1, 2 RETURNING old.*, new.f2, old.f1();
                                                              ^
HINT:  No function matches the given name and argument types. You
might need to add explicit type casts.

I guess that's ok, slightly different context evaluation. if you say
"old.f1", old refers to the virtual table "old",
but "old.f1()", the "old" , reevaluate to the schema "old".
you need privilege to schema "old", you also need execution privilege
to function "old.f1()" to execute the above query.
so seems no security issue after all.
-----------------------------------------------------------------------------
I found a fancy expression:
`
CREATE  TABLE foo (f1 serial, f2 text, f3 int default 42);
insert into foo select 1, 2 union select 11, 22 RETURNING old.*,
new.f2, (select sum(new.f1) over());
`
is this ok?

also the following works on PG16, not sure it's a bug.
`
 insert into foo select 1, 2 union select 11, 22 RETURNING  (select count(*));
`
but not these
`
insert into foo select 1, 2 union select 11, 22 RETURNING  (select
count(old.*));
insert into foo select 1, 2 union select 11, 22 RETURNING  (select sum(f1));
`
-----------------------------------------------------------------------------
I found another interesting case, while trying to add some tests on
for new code in createplan.c.
in postgres_fdw.sql, right after line `MERGE ought to fail cleanly`

--this will work
insert into itrtest select 1, 'foo' returning new.*,old.*;
--these two will fail
insert into remp1 select 1, 'foo' returning new.*;
insert into remp1 select 1, 'foo' returning old.*;

itrtest is the partitioned non-foreign table.
remp1 is the partition of itrtest, foreign table.

------------------------------------------------------------------------------------------
I did find a segment fault bug.
insert into foo select 1, 2 RETURNING (select sum(old.f1) over());

This information is set in a subplan node.
/* update the ExprState's flags if Var refers to OLD/NEW */
if (variable->varreturningtype == VAR_RETURNING_OLD)
state->flags |= EEO_FLAG_HAS_OLD;
else if (variable->varreturningtype == VAR_RETURNING_NEW)
state->flags |= EEO_FLAG_HAS_NEW;

but in ExecInsert:
`
else if (resultRelInfo->ri_projectReturning->pi_state.flags & EEO_FLAG_HAS_OLD)
{
oldSlot = ExecGetReturningSlot(estate, resultRelInfo);
ExecStoreAllNullTuple(oldSlot);
oldSlot->tts_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
}
`
it didn't use subplan node state->flags information. so the ExecInsert
above code, never called, and should be executed.
however
`
insert into foo select 1, 2 RETURNING (select sum(new.f1)over());`
works

Similarly this
 `
delete from foo RETURNING (select sum(new.f1) over());
`
also causes segmentation fault.
------------------------------------------------------------------------------------------
diff --git a/src/include/executor/tuptable.h b/src/include/executor/tuptable.h
new file mode 100644
index 6133dbc..c9d3661
--- a/src/include/executor/tuptable.h
+++ b/src/include/executor/tuptable.h
@@ -411,12 +411,21 @@ slot_getsysattr(TupleTableSlot *slot, in
 {
  Assert(attnum < 0); /* caller error */

+ /*
+ * If the tid is not valid, there is no physical row, and all system
+ * attributes are deemed to be NULL, except for the tableoid.
+ */
  if (attnum == TableOidAttributeNumber)
  {
  *isnull = false;
  return ObjectIdGetDatum(slot->tts_tableOid);
  }
- else if (attnum == SelfItemPointerAttributeNumber)
+ if (!ItemPointerIsValid(&slot->tts_tid))
+ {
+ *isnull = true;
+ return PointerGetDatum(NULL);
+ }
+ if (attnum == SelfItemPointerAttributeNumber)
  {
  *isnull = false;
  return PointerGetDatum(&slot->tts_tid);

These changes is slot_getsysattr is somehow independ of this feature?



Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Sun, 10 Mar 2024 at 23:41, jian he <jian.universality@gmail.com> wrote:
>
> Hi,
> some issues I found, while playing around with
> support-returning-old-new-v2.patch
>

Thanks for testing. This is very useful.


> doc/src/sgml/ref/update.sgml:
>
> There is no parameter explanation for `*`.
> so, I think the synopsis may not cover cases like:
> `
> update foo set f3 = 443 RETURNING new.*;
> `
> I saw the explanation at output_alias, though.

"*" is documented under output_alias and output_expression. I'm not
sure that it makes sense to have a separate top-level parameter
section for it, because "*" is also something that can appear after
table_name, meaning something completely different, so it might get
confusing. Perhaps the explanation under output_expression can be
expanded a bit. I'll think about it some more.


> insert into foo select 1, 2 RETURNING old.*, new.f2, old.f1();
> ERROR:  function old.f1() does not exist
> LINE 1: ...sert into foo select 1, 2 RETURNING old.*, new.f2, old.f1();
>                                                               ^
> HINT:  No function matches the given name and argument types. You
> might need to add explicit type casts.

Yes, that's consistent with current behaviour. You can also write
foo.f1() or something_else.f1(). Anything of that form, with
parentheses, is interpreted as schema_name.function_name(), not as a
column reference.


> I found a fancy expression:
> `
> CREATE  TABLE foo (f1 serial, f2 text, f3 int default 42);
> insert into foo select 1, 2 union select 11, 22 RETURNING old.*,
> new.f2, (select sum(new.f1) over());
> `
> is this ok?

Yes, I guess it's OK, though not really useful in practice.

"new.f1" is 1 for the first row and 11 for the second. When you write
"(select sum(new.f1) over())", with no FROM clause, you're implicitly
evaluating over a table with 1 row in the subquery, so it just returns
new.f1.

This is the same as the standalone query

SELECT sum(11) OVER();
 sum
-----
  11
(1 row)

So it's likely that any window function can be used in a FROM-less
subquery inside a RETURNING expression. I can't think of any practical
use for it though. In any case, this isn't something new to this
patch.


> also the following works on PG16, not sure it's a bug.
> `
>  insert into foo select 1, 2 union select 11, 22 RETURNING  (select count(*));
> `

This is OK, because that subquery is an uncorrelated aggregate query
that doesn't reference the outer query. In this case, it's not very
interesting, because it lacks a FROM clause, so it just returns 1. But
you could also write "(SELECT count(*) FROM some_other_table WHERE
...)", and it would work because the aggregate function is evaluated
over the rows of the table in the subquery. That's more useful if the
subquery is made into a correlated subquery by referring to columns
from the outer query. The rules for that are documented here:


https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES:~:text=When%20an%20aggregate%20expression%20appears%20in%20a%20subquery


> but not these
> `
> insert into foo select 1, 2 union select 11, 22 RETURNING  (select
> count(old.*));
> insert into foo select 1, 2 union select 11, 22 RETURNING  (select sum(f1));
> `

In these cases, since the aggregate's arguments are all outer-level
variables, it is associated with the outer query, so it is rejected on
the grounds that aggregate functions aren't allowed in RETURNING.

It is allowed if that subquery has a FROM clause, since the aggregated
arguments are then treated as constants over the rows in the subquery,
so arguably the same could be made to happen without a FROM clause,
but there really is no practical use case for allowing that. Again,
this isn't something new to this patch.


> I found another interesting case, while trying to add some tests on
> for new code in createplan.c.
> in postgres_fdw.sql, right after line `MERGE ought to fail cleanly`
>
> --this will work
> insert into itrtest select 1, 'foo' returning new.*,old.*;
> --these two will fail
> insert into remp1 select 1, 'foo' returning new.*;
> insert into remp1 select 1, 'foo' returning old.*;
>
> itrtest is the partitioned non-foreign table.
> remp1 is the partition of itrtest, foreign table.

Hmm, I was a little surprised that that first example worked, but I
can see why now.

I was content to just say that RETURNING old/new wasn't supported for
foreign tables in this first version, but looking at it more closely,
the only tricky part is direct-modify updates. So if we just disable
direct-modify when there are OLD/NEW variables in the the RETURNING
list, then it "just works".

So I've done that, and added a few additional tests to
postgres_fdw.sql, and removed the doc notes about foreign tables not
being supported. I really thought that there would be more to it than
that, but it seems to work fine.


> I did find a segment fault bug.
> insert into foo select 1, 2 RETURNING (select sum(old.f1) over());
>
> This information is set in a subplan node.
> /* update the ExprState's flags if Var refers to OLD/NEW */
> if (variable->varreturningtype == VAR_RETURNING_OLD)
> state->flags |= EEO_FLAG_HAS_OLD;
> else if (variable->varreturningtype == VAR_RETURNING_NEW)
> state->flags |= EEO_FLAG_HAS_NEW;
>
> but in ExecInsert it didn't use subplan node state->flags information

Ah, good catch!

When recursively initialising a SubPlan, if any of its expressions is
found to contain OLD/NEW Vars, it needs to update the flags on the
parent ExprState. Fixed in the new version.


> @@ -411,12 +411,21 @@ slot_getsysattr(TupleTableSlot *slot, in
>  {
>   Assert(attnum < 0); /* caller error */
>
> + /*
> + * If the tid is not valid, there is no physical row, and all system
> + * attributes are deemed to be NULL, except for the tableoid.
> + */
>   if (attnum == TableOidAttributeNumber)
>   {
>   *isnull = false;
>   return ObjectIdGetDatum(slot->tts_tableOid);
>   }
> - else if (attnum == SelfItemPointerAttributeNumber)
> + if (!ItemPointerIsValid(&slot->tts_tid))
> + {
> + *isnull = true;
> + return PointerGetDatum(NULL);
> + }
> + if (attnum == SelfItemPointerAttributeNumber)
>   {
>   *isnull = false;
>   return PointerGetDatum(&slot->tts_tid);
>
> These changes is slot_getsysattr is somehow independ of this feature?

This is necessary because under some circumstances, when returning
old/new, the corresponding table slot may contain all NULLs and an
invalid ctid. For example, the old slot in an INSERT which didn't do
an ON CONFLICT UPDATE. So we need to guard against that, in case the
user tries to return old.ctid, for example. It's useful to always
return a non-NULL tableoid though, because that's a property of the
table, rather than the row.

Thanks for testing.

Attached is an updated patch, fixing the seg-fault and now with
support for foreign tables.

Regards,
Dean

Вложения

Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Mon, 11 Mar 2024 at 14:03, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Attached is an updated patch, fixing the seg-fault and now with
> support for foreign tables.
>

Updated version attached tidying up a couple of things and fixing another bug:

1). Tidied up the code in createplan.c that was testing for old/new
Vars in the returning list, by adding a separate function --
contain_vars_returning_old_or_new() -- making it more reusable and
efficient.

2). Updated the deparsing code for EXPLAIN so that old/new Vars are
always prefixed with the alias, so that it's possible to tell them
apart in the EXPLAIN output.

3). Updated rewriteRuleAction() to preserve the old/new alias names in
the rewritten query. I think this was only relevant to the EXPLAIN
output.

4). Fixed a bug in assign_param_for_var() -- this needs to compare the
varreturningtype of the Vars, otherwise 2 different Vars could get
assigned the same Param. As the comment said, this needs to compare
everything that _equalVar() compares, except for the specific fields
listed. Otherwise a subquery like (select old.a = new.a) in the
returning list would only generate one Param for the two up-level
Vars, and produce the wrong result.

5). Removed the ParseState fields p_returning_old and p_returning_new
that weren't being used anymore.

Regards,
Dean

Вложения

Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Tue, 12 Mar 2024 at 18:21, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Updated version attached tidying up a couple of things and fixing another bug:
>

Rebased version attached, on top of c649fa24a4 (MERGE ... RETURNING support).

This just extends the previous version to work with MERGE, adding a
few extra tests, which is all fairly straightforward.

Regards,
Dean

Вложения

Re: Adding OLD/NEW support to RETURNING

От
jian he
Дата:
On Mon, Mar 18, 2024 at 6:48 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Tue, 12 Mar 2024 at 18:21, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> >
> > Updated version attached tidying up a couple of things and fixing another bug:
> >
>
> Rebased version attached, on top of c649fa24a4 (MERGE ... RETURNING support).
>


hi, some minor issues I found out.

+/*
+ * ReplaceReturningVarsFromTargetList -
+ * replace RETURNING list Vars with items from a targetlist
+ *
+ * This is equivalent to calling ReplaceVarsFromTargetList() with a
+ * nomatch_option of REPLACEVARS_REPORT_ERROR, but with the added effect of
+ * copying varreturningtype onto any Vars referring to new_result_relation,
+ * allowing RETURNING OLD/NEW to work in the rewritten query.
+ */
+
+typedef struct
+{
+ ReplaceVarsFromTargetList_context rv_con;
+ int new_result_relation;
+} ReplaceReturningVarsFromTargetList_context;
+
+static Node *
+ReplaceReturningVarsFromTargetList_callback(Var *var,
+ replace_rte_variables_context *context)
+{
+ ReplaceReturningVarsFromTargetList_context *rcon =
(ReplaceReturningVarsFromTargetList_context *) context->callback_arg;
+ Node   *newnode;
+
+ newnode = ReplaceVarsFromTargetList_callback(var, context);
+
+ if (var->varreturningtype != VAR_RETURNING_DEFAULT)
+ SetVarReturningType((Node *) newnode, rcon->new_result_relation,
+ var->varlevelsup, var->varreturningtype);
+
+ return newnode;
+}
+
+Node *
+ReplaceReturningVarsFromTargetList(Node *node,
+   int target_varno, int sublevels_up,
+   RangeTblEntry *target_rte,
+   List *targetlist,
+   int new_result_relation,
+   bool *outer_hasSubLinks)
+{
+ ReplaceReturningVarsFromTargetList_context context;
+
+ context.rv_con.target_rte = target_rte;
+ context.rv_con.targetlist = targetlist;
+ context.rv_con.nomatch_option = REPLACEVARS_REPORT_ERROR;
+ context.rv_con.nomatch_varno = 0;
+ context.new_result_relation = new_result_relation;
+
+ return replace_rte_variables(node, target_varno, sublevels_up,
+ ReplaceReturningVarsFromTargetList_callback,
+ (void *) &context,
+ outer_hasSubLinks);
+}

the ReplaceReturningVarsFromTargetList related comment
should be placed right above the function ReplaceReturningVarsFromTargetList,
not above ReplaceReturningVarsFromTargetList_context?

struct ReplaceReturningVarsFromTargetList_context adds some comments
about new_result_relation would be great.


/* INDEX_VAR is handled by default case */
this comment appears in execExpr.c and execExprInterp.c.
need to move to default case's switch default case?


/*
 * set_deparse_context_plan - Specify Plan node containing expression
 *
 * When deparsing an expression in a Plan tree, we might have to resolve
 * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
 * provide the parent Plan node.
...
*/
does the comment in set_deparse_context_plan need to be updated?

+ * buildNSItemForReturning -
+ * add a ParseNamespaceItem for the OLD or NEW alias in RETURNING.
+ */
+static void
+addNSItemForReturning(ParseState *pstate, const char *aliasname,
+  VarReturningType returning_type)
comment "buildNSItemForReturning" should be "addNSItemForReturning"?


  * results.  If include_dropped is true then empty strings and NULL constants
  * (not Vars!) are returned for dropped columns.
  *
- * rtindex, sublevels_up, and location are the varno, varlevelsup, and location
- * values to use in the created Vars.  Ordinarily rtindex should match the
- * actual position of the RTE in its rangetable.
+ * rtindex, sublevels_up, returning_type, and location are the varno,
+ * varlevelsup, varreturningtype, and location values to use in the created
+ * Vars.  Ordinarily rtindex should match the actual position of the RTE in
+ * its rangetable.
we already updated the comment in expandRTE.
but it seems we only do RTE_RELATION, some part of RTE_FUNCTION.
do we need
`
varnode->varreturningtype = returning_type;
`
for other `rte->rtekind` when there is a makeVar?

(I don't understand this part, in the case where rte->rtekind is
RTE_SUBQUERY, if I add  `varnode->varreturningtype = returning_type;`
the tests still pass.



Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Mon, 18 Mar 2024 at 10:48, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> Rebased version attached, on top of c649fa24a4 (MERGE ... RETURNING support).
>

I have been doing more testing of this and I realised that there was a
problem -- the previous patch worked fine when updating a regular
table, so that old/new.colname is just a Var, but when updating an
auto-updatable view, "colname" could end up being replaced by an
arbitrary expression. In the cases I had tested before, that appeared
to work OK, but actually it wasn't right in all cases where the result
should have been NULL, due to the old/new row being absent (e.g., the
old row in an INSERT).

After thinking about that for a while, the best solution seemed to be
to add a new executable node, which I've called ReturningExpr. This
evaluates the old/new expression if the old/new row exists, but skips
it and returns NULL if the old/new row doesn't exist. The simplest
example is a query like this, which now returns what I would expect:

DROP TABLE IF EXISTS tt CASCADE;
CREATE TABLE tt (a int PRIMARY KEY, b text);
INSERT INTO tt VALUES (1, 'R1');
CREATE VIEW tv AS SELECT a, b, 'Const' c FROM tt;

INSERT INTO tv VALUES (1, 'Row 1'), (2, 'Row 2')
  ON CONFLICT (a) DO UPDATE SET b = excluded.b
  RETURNING old.*, new.*;

 a | b  |   c   | a |   b   |   c
---+----+-------+---+-------+-------
 1 | R1 | Const | 1 | Row 1 | Const
   |    |       | 2 | Row 2 | Const
(2 rows)

(Previously that was returning old.c = 'Const' in both rows, because
the Const node has no old/new qualification.)

In EXPLAIN, I opted to display this as "old/new.(expression)", to make
it clear that the expression is being evaluated in the context of the
old/new row, even if it doesn't directly refer to old/new values from
the table. So, for example, the plan for the above query looks like
this:

                                   QUERY PLAN
--------------------------------------------------------------------------------
 Insert on public.tt
   Output: old.a, old.b, old.('Const'::text), new.a, new.b, new.('Const'::text)
   Conflict Resolution: UPDATE
   Conflict Arbiter Indexes: tt_pkey
   ->  Values Scan on "*VALUES*"
         Output: "*VALUES*".column1, "*VALUES*".column2

(It can't output "old.c" or "new.c" because all knowledge of the view
column "c" is gone by the time it has been through the rewriter, and
in any case, the details of the expression being evaluated are likely
to be useful in general.)

Things get more complicated when subqueries are involved. For example,
given this view definition:

CREATE VIEW tv AS SELECT a, b, (SELECT concat('b=',b)) c FROM tt;

the INSERT above produces this:

 a | b  |  c   | a |   b   |    c
---+----+------+---+-------+---------
 1 | R1 | b=R1 | 1 | Row 1 | b=Row 1
   |    |      | 2 | Row 2 | b=Row 2
(2 rows)

which is as expected. This uses the following query plan:

                                 QUERY PLAN
----------------------------------------------------------------------------
 Insert on public.tt
   Output: old.a, old.b, old.((SubPlan 1)), new.a, new.b, new.((SubPlan 2))
   Conflict Resolution: UPDATE
   Conflict Arbiter Indexes: tt_pkey
   ->  Values Scan on "*VALUES*"
         Output: "*VALUES*".column1, "*VALUES*".column2
   SubPlan 1
     ->  Result
           Output: concat('b=', old.b)
   SubPlan 2
     ->  Result
           Output: concat('b=', new.b)

In this case "b" in the view subquery becomes "old.b" in SubPlan 1 and
"new.b" in SubPlan 2 (each with varlevelsup = 1, and therefore
evaluated as input params to the subplans). The concat() result would
normally always be non-NULL, but it (or rather the SubLink subquery
containing it) is wrapped in a ReturningExpr. As a result, SubPlan 1
is skipped in the second row, for which old does not exist, and ends
up only being executed once in that query, whereas SubPlan 2 is
executed twice.

Things get even more fiddly when the old/new expression itself appears
in a subquery. For example, given the following query:

INSERT INTO tv VALUES (1, 'Row 1'), (2, 'Row 2')
  ON CONFLICT (a) DO UPDATE SET b = excluded.b
  RETURNING old.a, old.b, (SELECT old.c), new.*;

the result is the same, but the query plan is now

                              QUERY PLAN
----------------------------------------------------------------------
 Insert on public.tt
   Output: old.a, old.b, (SubPlan 2), new.a, new.b, new.((SubPlan 3))
   Conflict Resolution: UPDATE
   Conflict Arbiter Indexes: tt_pkey
   ->  Values Scan on "*VALUES*"
         Output: "*VALUES*".column1, "*VALUES*".column2
   SubPlan 1
     ->  Result
           Output: concat('b=', old.b)
   SubPlan 2
     ->  Result
           Output: (old.((SubPlan 1)))
   SubPlan 3
     ->  Result
           Output: concat('b=', new.b)

The ReturningExpr nodes belong to the query level containing the
RETURNING list (hence they have a "levelsup" field, like Var,
PlaceHolderVar, etc.). So in this example, one of the ReturningExpr
nodes is in SubPlan 2, with "levelsup" = 1, wrapping SubPlan 1, i.e.,
it only executes SubPlan 1 if the old row exists.

Although that all sounds quite complicated, all the individual pieces
are quite simple.

Attached is an updated patch in which I have also tidied up a few
other things, but I haven't read your latest review comments yet. I'll
respond to those separately.

Regards,
Dean

Вложения

Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Mon, 25 Mar 2024 at 00:00, jian he <jian.universality@gmail.com> wrote:
>
> hi, some minor issues I found out.
>
> the ReplaceReturningVarsFromTargetList related comment
> should be placed right above the function ReplaceReturningVarsFromTargetList,
> not above ReplaceReturningVarsFromTargetList_context?

Hmm, well there are a mix of possible styles for this kind of
function. Sometimes the outer function comes first, immediately after
the function comment, and then the callback function comes after that.
That has the advantage that all documentation comments related to the
top-level input arguments are next to the function that takes them.
Also, this ordering means that you naturally read it in the order in
which it is initially executed.

The other style, putting the callback function first has the advantage
that you can more immediately see what the function does, since it's
usually the callback that contains the interesting logic.

rewriteManip.c has examples of both styles, but in this case, since
ReplaceReturningVarsFromTargetList() is similar to
ReplaceVarsFromTargetList(), I opted to copy its style.

> struct ReplaceReturningVarsFromTargetList_context adds some comments
> about new_result_relation would be great.

I substantially rewrote that function in the v6 patch. As part of
that, I renamed "new_result_relation" to "new_target_varno", which
more closely matches the existing "target_varno" argument, and I added
comments about what it's for to the top-level function comment block.

> /* INDEX_VAR is handled by default case */
> this comment appears in execExpr.c and execExprInterp.c.
> need to move to default case's switch default case?

No, I think it's fine as it is. Its current placement is where you
might otherwise expect to find a "case INDEX_VAR:" block of code, and
it's explaining why there isn't one there, and where to look instead.

Moving it into the switch default case would lose that effect, and I
think it would reduce the code's readability.

> /*
>  * set_deparse_context_plan - Specify Plan node containing expression
>  *
>  * When deparsing an expression in a Plan tree, we might have to resolve
>  * OUTER_VAR, INNER_VAR, or INDEX_VAR references.  To do this, the caller must
>  * provide the parent Plan node.
> ...
> */
> does the comment in set_deparse_context_plan need to be updated?

In the v6 patch, I moved the code change from
set_deparse_context_plan() down into set_deparse_plan(), because I
thought that would catch more cases, but thinking about it some more,
that wasn't necessary, since it won't change when moving up and down
the ancestor tree. So in v7, I've moved it back and updated the
comment.

> + * buildNSItemForReturning -
> + * add a ParseNamespaceItem for the OLD or NEW alias in RETURNING.
> + */
> +static void
> +addNSItemForReturning(ParseState *pstate, const char *aliasname,
> +  VarReturningType returning_type)
> comment "buildNSItemForReturning" should be "addNSItemForReturning"?

Yes, well spotted. Fixed in v7.

> [in expandRTE()]
>
> - * rtindex, sublevels_up, and location are the varno, varlevelsup, and location
> - * values to use in the created Vars.  Ordinarily rtindex should match the
> - * actual position of the RTE in its rangetable.
> + * rtindex, sublevels_up, returning_type, and location are the varno,
> + * varlevelsup, varreturningtype, and location values to use in the created
> + * Vars.  Ordinarily rtindex should match the actual position of the RTE in
> + * its rangetable.
> we already updated the comment in expandRTE.
> but it seems we only do RTE_RELATION, some part of RTE_FUNCTION.
> do we need
> `
> varnode->varreturningtype = returning_type;
> `
> for other `rte->rtekind` when there is a makeVar?
>
> (I don't understand this part, in the case where rte->rtekind is
> RTE_SUBQUERY, if I add  `varnode->varreturningtype = returning_type;`
> the tests still pass.

In the v6 patch, I already added code to ensure that it's set in all
cases, though I don't think it's strictly necessary. returning_type
can only have a non-default value for the target RTE, which can't
currently be any of those other RTE kinds, but nonetheless it seemed
better from a consistency point-of-view, and to make it more
future-proof.

v7 patch attached, with those updates.

Regards,
Dean

Вложения

Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Mon, 25 Mar 2024 at 14:04, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> v7 patch attached, with those updates.
>

Rebased version attached, forced by 87985cc925.

The changes made in that commit didn't entirely make sense to me, but
the ExecDelete() change, copying data between slots, broke this patch,
because it wasn't setting the slot's tableoid. That copying seemed to
be unnecessary anyway, so I got rid of it, and it works fine. While at
it, I also removed the extra "oldslot" argument added to ExecDelete(),
which didn't seem necessary, and wasn't documented clearly. Those
changes could perhaps be extracted and applied separately.

Regards,
Dean

Вложения

Re: Adding OLD/NEW support to RETURNING

От
Jeff Davis
Дата:
On Tue, 2024-03-26 at 18:49 +0000, Dean Rasheed wrote:
> On Mon, 25 Mar 2024 at 14:04, Dean Rasheed <dean.a.rasheed@gmail.com>
> wrote:
> >
> > v7 patch attached, with those updates.
> >
>
> Rebased version attached, forced by 87985cc925.

This isn't a complete review, but I spent a while looking at this, and
it looks like it's in good shape.

I like the syntax, and I think the solution for renaming the alias
("RETURNING WITH (new as n, old as o)") is a good one.

The implementation touches quite a few areas. How did you identify all
of the potential problem areas? It seems the primary sources of
complexity came from rules, partitioning, and updatable views, is that
right?

Regards,
    Jeff Davis




Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
On Wed, 27 Mar 2024 at 07:47, Jeff Davis <pgsql@j-davis.com> wrote:
>
> This isn't a complete review, but I spent a while looking at this, and
> it looks like it's in good shape.

Thanks for looking.

> I like the syntax, and I think the solution for renaming the alias
> ("RETURNING WITH (new as n, old as o)") is a good one.

Thanks, that's good to know. Settling on a good syntax can be
difficult, so it's good to know that people are generally supportive
of this.

> The implementation touches quite a few areas. How did you identify all
> of the potential problem areas?

Hmm, well that's one of the hardest parts, and it's really difficult
to be sure that I have.

Initially, when I was just adding a new field to Var, I just tried to
look at all the existing code that made Vars, or copied other
non-default fields like varnullingrels around. I still managed to miss
the necessary change in assign_param_for_var() on my first attempt,
but fortunately that was an easy fix.

More worrying was the fact that I managed to completely overlook the
fact that I needed to worry about non-updatable columns in
auto-updatable views until v6, which added the ReturningExpr node.
Once I realised that I needed that, and that it needed to be tied to a
particular query level, and so needed a "levelsup" field, I just
looked at GroupingFunc to identify the places in code that needed to
be updated to do the right thing for a query-level-aware node.

What I'm most worried about now is that there are other areas of
functionality like that, that I'm overlooking, and which will interact
with this feature in non-trivial ways.

> It seems the primary sources of
> complexity came from rules, partitioning, and updatable views, is that
> right?

Foreign tables looked like it would be tricky at first, but then
turned out to be trivial, after disallowing direct-modify when
returning old/new.

Rules are a whole area that I wish I didn't have to worry about (I
wish we had deprecated them a long time ago). In practice though, I
haven't done much beyond what seemed like the most obvious (and
simplest) thing.

Nonetheless, there are some interesting interactions that probably
need more careful examination. For example, the fact that the
RETURNING clause in a RULE already has its own "special table names"
OLD and NEW, which are actually references to different RTEs, unlike
the OLD and NEW that this patch introduces, which are references to
the result relation. This leads to a number of different cases:

Case 1
======

In the simplest case, the rule can simply contain "RETURNING *". This
leads to what I think is the most obvious and intuitive behaviour:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
  DO INSTEAD UPDATE t1 SET val1 = NEW.val2
  RETURNING *;

UPDATE t2 SET val2 = 'New value 2'
  RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
            t2.val2 AS t2_val2, val2;

  old_val2   |  new_val2   |   t2_val2   |    val2
-------------+-------------+-------------+-------------
 Old value 1 | New value 2 | New value 2 | New value 2
(1 row)

So someone using the table with the rule can access old and new values
in the obvious way, and they will get new values by default for an
UPDATE.

The query plan for this is pretty-much what you'd expect:

                      QUERY PLAN
-------------------------------------------------------
 Update on public.t1
   Output: old.val1, new.val1, t1.val1, t1.val1
   ->  Nested Loop
         Output: 'New value 2'::text, t1.ctid, t2.ctid
         ->  Seq Scan on public.t1
               Output: t1.ctid
         ->  Materialize
               Output: t2.ctid
               ->  Seq Scan on public.t2
                     Output: t2.ctid

Case 2
======

If the rule contains "RETURNING OLD.*", it means that the RETURNING
list of the rewritten query contains Vars that no longer refer to the
result relation, but instead refer to the old data in t2. This leads
the the following behaviour:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
  DO INSTEAD UPDATE t1 SET val1 = NEW.val2
  RETURNING OLD.*;

UPDATE t2 SET val2 = 'New value 2'
  RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
            t2.val2 AS t2_val2, val2;

  old_val2   |  new_val2   |   t2_val2   |    val2
-------------+-------------+-------------+-------------
 Old value 2 | Old value 2 | Old value 2 | Old value 2
(1 row)

The reason this happens is that the Vars in the returning list don't
refer to the result relation, and so setting varreturningtype on them
has no effect, and is simply ignored. This can be seen by looking at
the query plan:

                           QUERY PLAN
----------------------------------------------------------------
 Update on public.t1
   Output: old.(t2.val2), new.(t2.val2), t2.val2, t2.val2
   ->  Nested Loop
         Output: 'New value 2'::text, t1.ctid, t2.ctid, t2.val2
         ->  Seq Scan on public.t1
               Output: t1.ctid
         ->  Materialize
               Output: t2.ctid, t2.val2
               ->  Seq Scan on public.t2
                     Output: t2.ctid, t2.val2

So all the final output values come from t2, not the result relation t1.

Case 3
======

Similarly, if the rule contains "RETURNING NEW.*", the effect is
similar, because again, the Vars in the RETURNING list don't refer to
the result relation in the rewritten query:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
  DO INSTEAD UPDATE t1 SET val1 = NEW.val2
  RETURNING NEW.*;

UPDATE t2 SET val2 = 'New value 2'
  RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
            t2.val2 AS t2_val2, val2;

  old_val2   |  new_val2   |   t2_val2   |    val2
-------------+-------------+-------------+-------------
 New value 2 | New value 2 | New value 2 | New value 2
(1 row)

This time, the query plan shows that the result values are coming from
the new source values:

                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Update on public.t1
   Output: old.('New value 2'::text), new.('New value 2'::text), 'New
value 2'::text, 'New value 2'::text
   ->  Nested Loop
         Output: 'New value 2'::text, t1.ctid, t2.ctid
         ->  Seq Scan on public.t1
               Output: t1.ctid
         ->  Materialize
               Output: t2.ctid
               ->  Seq Scan on public.t2
                     Output: t2.ctid

Case 4
======

It's also possible to use the new returning old/new syntax in the
rule, by defining custom aliases. This has a subtly different meaning,
because it indicates that Vars in the rewritten query should refer to
the result relation, with varreturningtype set accordingly. So, for
example, returning old in the rule using this technique leads to the
following behaviour:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
  DO INSTEAD UPDATE t1 SET val1 = NEW.val2
  RETURNING WITH (OLD AS o) o.*;

UPDATE t2 SET val2 = 'New value 2'
  RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
            t2.val2 AS t2_val2, val2;

  old_val2   |  new_val2   |   t2_val2   |    val2
-------------+-------------+-------------+-------------
 Old value 1 | New value 2 | Old value 1 | Old value 1
(1 row)

The query plan for this indicates that all returned values now come
from the result relation, but the default is to return old values
rather than new values, and it now allows that default to be
overridden:

                      QUERY PLAN
-------------------------------------------------------
 Update on public.t1
   Output: old.val1, new.val1, old.val1, old.val1
   ->  Nested Loop
         Output: 'New value 2'::text, t1.ctid, t2.ctid
         ->  Seq Scan on public.t1
               Output: t1.ctid
         ->  Materialize
               Output: t2.ctid
               ->  Seq Scan on public.t2
                     Output: t2.ctid

Case 5
======

Similarly, the rule can use the new syntax to return new values:

DROP TABLE IF EXISTS t1, t2 CASCADE;
CREATE TABLE t1 (val1 text);
INSERT INTO t1 VALUES ('Old value 1');
CREATE TABLE t2 (val2 text);
INSERT INTO t2 VALUES ('Old value 2');

CREATE RULE r2 AS ON UPDATE TO t2
  DO INSTEAD UPDATE t1 SET val1 = NEW.val2
  RETURNING WITH (NEW AS n) n.*;

UPDATE t2 SET val2 = 'New value 2'
  RETURNING old.val2 AS old_val2, new.val2 AS new_val2,
            t2.val2 AS t2_val2, val2;

  old_val2   |  new_val2   |   t2_val2   |    val2
-------------+-------------+-------------+-------------
 Old value 1 | New value 2 | New value 2 | New value 2
(1 row)

which is the same result as case 1, but with a slightly different query plan:

                      QUERY PLAN
-------------------------------------------------------
 Update on public.t1
   Output: old.val1, new.val1, new.val1, new.val1
   ->  Nested Loop
         Output: 'New value 2'::text, t1.ctid, t2.ctid
         ->  Seq Scan on public.t1
               Output: t1.ctid
         ->  Materialize
               Output: t2.ctid
               ->  Seq Scan on public.t2
                     Output: t2.ctid

This explicitly sets the defaults for "t2.val2" and "val2"
unqualified, whereas in case 1 they were the implicit defaults for an
UPDATE command.

I think that all that is probably reasonable, but it definitely needs
documenting, which I haven't attempted yet.

Overall, I'm pretty hesitant to try to commit this to v17. Aside from
the fact that there's a lot of new code that hasn't had much in the
way of review or discussion, I also feel that I probably haven't fully
considered all areas where additional complexity might arise. It
doesn't seem like that long ago that this was just a prototype, and
it's certainly not that long ago that I had to add a substantial
amount of new code to deal with the auto-updatable view case that I
had completely overlooked.

So on reflection, rather than trying to rush to get this into v17, I
think it would be better to leave it to v18.

Regards,
Dean



Re: Adding OLD/NEW support to RETURNING

От
Jeff Davis
Дата:
On Wed, 2024-03-27 at 13:19 +0000, Dean Rasheed wrote:

> What I'm most worried about now is that there are other areas of
> functionality like that, that I'm overlooking, and which will
> interact
> with this feature in non-trivial ways.

Agreed. I'm not sure exactly how we'd find those other areas (if they
exist) aside from just having more eyes on the code.

>
> So on reflection, rather than trying to rush to get this into v17, I
> think it would be better to leave it to v18.

Thank you for letting me know. That allows some time for others to have
a look.

Regards,
    Jeff Davis




Re: Adding OLD/NEW support to RETURNING

От
Dean Rasheed
Дата:
Rebased version attached, on top of 0294df2f1f (MERGE .. WHEN NOT
MATCHED BY SOURCE), with a few additional tests. No code changes, just
keeping it up to date.

Regards,
Dean

Вложения