Обсуждение: problem with RETURNING and update row movement

Поиск
Список
Период
Сортировка

problem with RETURNING and update row movement

От
Amit Langote
Дата:
Hi,

While working on [1], I came across a bug.

Reproduction steps:

create table foo (a int, b int) partition by list (a);
create table foo1 (c int, b int, a int);
alter table foo1 drop c;
alter table foo attach partition foo1 for values in (1);
create table foo2 partition of foo for values in (2);
create table foo3 partition of foo for values in (3);
create or replace function trigfunc () returns trigger language
plpgsql as $$ begin new.b := 2; return new; end; $$;
create trigger trig before insert on foo2 for each row execute
function trigfunc();
insert into foo values (1, 1), (2, 2), (3, 3);
update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x returning *;
ERROR:  attribute 5 of type record has wrong type
DETAIL:  Table has type record, but query expects integer.

The error occurs when projecting the RETURNING list.  The problem is
that the projection being used when the error occurs belongs to result
relation foo2 which is the destination partition of a row movement
operation, but it's trying to access a column in the tuple produced by
the plan belonging to foo1, the source partition of the row movement.
 foo2's RETURNING projection can only work correctly when it's being
fed tuples from the plan belonging to foo2.

Note that the targetlists of the plans belonging to different result
relations can be different depending on the respective relation's
tuple descriptors, so are not interchangeable.  Also, each result
relation's RETURNING list is made to refer to its own plan's output.
Without row movement, there is only one result relation to consider,
so there's no confusion regarding which RETURNING list to compute.
With row movement however, while there is only one plan tuple, there
are two result relations to consider each with its own RETURNING list.
I think we should be computing the *source* relation's RETURNING list,
because only that one of the two can consume the plan tuple correctly.
Attached is a patch that fixes things to be that way.

By the way, the problem exists since PG 11 when UPDATE row movement
feature went in.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CA%2BHiwqH-2sq-3Zq-CtuWjfRSyrGPXJBf1nCKKvTHuGVyfQ1OYA%40mail.gmail.com

Вложения

Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
Hi Amit-san,

On Thu, Jun 11, 2020 at 6:10 PM Amit Langote <amitlangote09@gmail.com> wrote:
> Reproduction steps:
>
> create table foo (a int, b int) partition by list (a);
> create table foo1 (c int, b int, a int);
> alter table foo1 drop c;
> alter table foo attach partition foo1 for values in (1);
> create table foo2 partition of foo for values in (2);
> create table foo3 partition of foo for values in (3);
> create or replace function trigfunc () returns trigger language
> plpgsql as $$ begin new.b := 2; return new; end; $$;
> create trigger trig before insert on foo2 for each row execute
> function trigfunc();
> insert into foo values (1, 1), (2, 2), (3, 3);
> update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x returning *;
> ERROR:  attribute 5 of type record has wrong type
> DETAIL:  Table has type record, but query expects integer.

Reproduced.  Could you add the patch to the next commitfest so that it
doesn't get lost?

Best regards,
Etsuro Fujita



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Sun, Jun 14, 2020 at 4:23 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> Hi Amit-san,
>
> On Thu, Jun 11, 2020 at 6:10 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > Reproduction steps:
> >
> > create table foo (a int, b int) partition by list (a);
> > create table foo1 (c int, b int, a int);
> > alter table foo1 drop c;
> > alter table foo attach partition foo1 for values in (1);
> > create table foo2 partition of foo for values in (2);
> > create table foo3 partition of foo for values in (3);
> > create or replace function trigfunc () returns trigger language
> > plpgsql as $$ begin new.b := 2; return new; end; $$;
> > create trigger trig before insert on foo2 for each row execute
> > function trigfunc();
> > insert into foo values (1, 1), (2, 2), (3, 3);
> > update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x returning *;
> > ERROR:  attribute 5 of type record has wrong type
> > DETAIL:  Table has type record, but query expects integer.
>
> Reproduced.  Could you add the patch to the next commitfest so that it
> doesn't get lost?

Done.  Thank you for taking a look.

https://commitfest.postgresql.org/28/2597/

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



RE: problem with RETURNING and update row movement

От
"osumi.takamichi@fujitsu.com"
Дата:
Amit san


Hello. I've tested your patch.

This patch can be applied comfortably and make check-world has produced no failure.

I didn't do performance test
because this patch doesn't have an effect on it in my opinion.

I reproduced the bug you described before,
which can be prevented by your patch currently.

After applying your patch, I've gotten a correct output without error
using the test case in the 1st mail of this thread.

Just small comment about your patch.
I felt the test you added in update.sql could be simpler or shorter in other form.
Excuse me if I say something silly.
It's because I supposed you can check the bug is prevented without definitions of both a function and its trigger for
thiscase. Neither of them is essentially connected with the row movement between source partition and destination
partitionand can be replaced by simpler expression ?
 

Best,
    Takamichi Osumi

Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
Hi Takamichi-san,

On Tue, Jul 14, 2020 at 8:26 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> Amit san
>
>
> Hello. I've tested your patch.

Thanks for that.

> Just small comment about your patch.
> I felt the test you added in update.sql could be simpler or shorter in other form.
> Excuse me if I say something silly.
> It's because I supposed you can check the bug is prevented without definitions of both a function and its trigger for
thiscase. Neither of them is essentially connected with the row movement between source partition and destination
partitionand can be replaced by simpler expression ? 

Well, it's true that the function and the trigger have nothing to do
with the main bug, but it's often good to be sure that the bug-fix
isn't breaking cases where they are present and have visible effect.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Amit Kapila
Дата:
On Thu, Jun 11, 2020 at 2:40 PM Amit Langote <amitlangote09@gmail.com> wrote:
>
> Hi,
>
> While working on [1], I came across a bug.
>
> Reproduction steps:
>
> create table foo (a int, b int) partition by list (a);
> create table foo1 (c int, b int, a int);
> alter table foo1 drop c;
> alter table foo attach partition foo1 for values in (1);
> create table foo2 partition of foo for values in (2);
> create table foo3 partition of foo for values in (3);
> create or replace function trigfunc () returns trigger language
> plpgsql as $$ begin new.b := 2; return new; end; $$;
> create trigger trig before insert on foo2 for each row execute
> function trigfunc();
> insert into foo values (1, 1), (2, 2), (3, 3);
> update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x returning *;
> ERROR:  attribute 5 of type record has wrong type
> DETAIL:  Table has type record, but query expects integer.
>
> The error occurs when projecting the RETURNING list.  The problem is
> that the projection being used when the error occurs belongs to result
> relation foo2 which is the destination partition of a row movement
> operation, but it's trying to access a column in the tuple produced by
> the plan belonging to foo1, the source partition of the row movement.
>  foo2's RETURNING projection can only work correctly when it's being
> fed tuples from the plan belonging to foo2.
>

IIUC, here the problem is related to below part of code:
ExecInsert(..)
{
/* Process RETURNING if present */
if (resultRelInfo->ri_projectReturning)
result = ExecProcessReturning(resultRelInfo, slot, planSlot);
..
}

The reason is that planSlot is for foo1 and slot is for foo2 and when
it tries to access tuple during ExecProcessReturning(), it results in
an error.  Is my understanding correct? If so, then can't we ensure
someway that planSlot also belongs to foo2 instead of skipping return
processing in Insert and then later do more work to perform in Update.

Like Takamichi-san, I also think here we don't need trigger/function
in the test case.  If one reads the comment you have added in the test
case, it is not evident why is trigger or function required.  If you
really think it is important to cover the trigger case then either
have a separate test or at least add some comments on how trigger
helps here or what you want to test it.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
Hi Amit,

Thanks for taking a look at this.

On Mon, Jul 20, 2020 at 8:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> IIUC, here the problem is related to below part of code:
> ExecInsert(..)
> {
> /* Process RETURNING if present */
> if (resultRelInfo->ri_projectReturning)
> result = ExecProcessReturning(resultRelInfo, slot, planSlot);
> ..
> }
>
> The reason is that planSlot is for foo1 and slot is for foo2 and when
> it tries to access tuple during ExecProcessReturning(), it results in
> an error.  Is my understanding correct?

Yes.  Specifically, the problem exists if there are any non-target
relation attributes in RETURNING which are computed by referring to
planSlot, the plan's output tuple, which may be shaped differently
among result relations due to their tuple descriptors being different.

> If so, then can't we ensure
> someway that planSlot also belongs to foo2 instead of skipping return
> processing in Insert and then later do more work to perform in Update.

I did consider that option but failed to see a way to make it work.

I am not sure if there is a way to make a copy of the plan's output
tuple (planSlot) that is compatible with the destination partition.
Simple conversion using execute_attr_map_slot() is okay when we know
the source and the target slots contain relation tuples, but plan's
output tuples will also have other junk attributes.  Also, not all
destination partitions have an associated plan and hence a slot to
hold plan tuples.

> Like Takamichi-san, I also think here we don't need trigger/function
> in the test case.  If one reads the comment you have added in the test
> case, it is not evident why is trigger or function required.  If you
> really think it is important to cover the trigger case then either
> have a separate test or at least add some comments on how trigger
> helps here or what you want to test it.

That's fair.  I have updated the test comment.

To expand on that here, because now we'll be computing RETURNING using
the source partition's projection and the tuple in the source
partition's format, I wanted to make sure that any changes made by the
destination partition's triggers are reflected in the output.

PFA v2.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

Вложения

Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
On Wed, Jul 22, 2020 at 3:16 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Jul 20, 2020 at 8:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > IIUC, here the problem is related to below part of code:
> > ExecInsert(..)
> > {
> > /* Process RETURNING if present */
> > if (resultRelInfo->ri_projectReturning)
> > result = ExecProcessReturning(resultRelInfo, slot, planSlot);
> > ..
> > }
> >
> > The reason is that planSlot is for foo1 and slot is for foo2 and when
> > it tries to access tuple during ExecProcessReturning(), it results in
> > an error.  Is my understanding correct?
>
> Yes.  Specifically, the problem exists if there are any non-target
> relation attributes in RETURNING which are computed by referring to
> planSlot, the plan's output tuple, which may be shaped differently
> among result relations due to their tuple descriptors being different.
>
> > If so, then can't we ensure
> > someway that planSlot also belongs to foo2 instead of skipping return
> > processing in Insert and then later do more work to perform in Update.
>
> I did consider that option but failed to see a way to make it work.
>
> I am not sure if there is a way to make a copy of the plan's output
> tuple (planSlot) that is compatible with the destination partition.
> Simple conversion using execute_attr_map_slot() is okay when we know
> the source and the target slots contain relation tuples, but plan's
> output tuples will also have other junk attributes.  Also, not all
> destination partitions have an associated plan and hence a slot to
> hold plan tuples.

Yeah, I think it might be possible to create planSlot to pass to
ExecInsert() so that we can process RETURNING within that function,
but even if so, that would be cumbersome not only because partitions
can have different rowtypes but because they can have different junk
columns as well, because e.g., subplan foreign partitions may have
different row ID columns as junk columns.  The proposed patch is
simple, so I would vote for it.  (Note: in case of a foreign
partition, we call ExecForeignInsert() with the source partition’s
planSlot in ExecInsert(), which is not correct, but that would sbe OK
because it seems unlikely that the FDW would look at the planSlot for
INSERT.)

One thing I noticed is that the patch changes the existing behavior.
Here is an example:

create table range_parted (a text, b bigint) partition by range (a, b);
create table part_a_1_a_10 partition of range_parted for values from
('a', 1) to ('a', 10);
create table part_b_1_b_10 partition of range_parted for values from
('b', 1) to ('b', 10);
create function trigfunc() returns trigger as $$ begin return null;
end; $$ language plpgsql;
create trigger trig before insert on part_b_1_b_10 for each row
execute function trigfunc();
insert into range_parted values ('a', 1);

In HEAD:

postgres=# update range_parted r set a = 'b' from (values ('a', 1))
s(x, y) where s.x = r.a and s.y = r.b returning tableoid::regclass,
r.*;
 tableoid | a | b
----------+---+---
(0 rows)

UPDATE 0

But with the patch:

postgres=# update range_parted r set a = 'b' from (values ('a', 1))
s(x, y) where s.x = r.a and s.y = r.b returning tableoid::regclass,
r.*;
   tableoid    | a | b
---------------+---+---
 part_a_1_a_10 | b | 1
(1 row)

UPDATE 1

This produces RETURNING, though INSERT on the destination partition
was skipped by the trigger.

Another thing is that the patch assumes that the tuple slot to pass to
ExecInsert() would store the inserted tuple when doing that function,
but that’s not always true, because in case of a foreign partition,
the FDW may return a slot other than the passed-in slot when called
from ExecForeignInsert(), in which case the passed-in slot would not
store the inserted tuple anymore.

To fix these, I modified the patch so that we 1) add to ExecInsert()
an output parameter slot to store the inserted tuple, and 2) compute
RETURNING based on the parameter.  I also added a regression test
case.  Attached is an updated version of the patch.

Sorry for the long delay.

Best regards,
Etsuro Fujita

Вложения

Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
Yet another thing I noticed is that the patch incorrectly produces
values for the tableoid columns specified in the RETURNING list, like
this:

+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10),
('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING
tableoid::regclass, *;
+    tableoid    | a | b  |  c  | d |       e       | x | y
+----------------+---+----+-----+---+---------------+---+----
+ part_a_1_a_10  | c |  1 |   1 | 1 | in trigfunc() | a |  1
+ part_a_10_a_20 | c | 10 | 200 | 1 | in trigfunc() | a | 10
+ part_c_1_100   | c | 12 |  96 | 1 | in trigfunc() | b | 12
+(3 rows)

The source partitions are shown as tableoid, but the destination
partition (ie, part_c_1_c_20) should be shown.  To fix this, I
modified the patch further so that 1) we override tts_tableOid of the
original slot with the OID of the destination partition before calling
ExecProcessReturning() if needed, and 2) in ExecProcessReturning(), we
only initialize ecxt_scantuple's tts_tableOid when needed, which would
save cycles a bit for non-foreign-table-direct-modification cases.

Attached is a new version of the patch.

Best regards,
Etsuro Fujita

Вложения

Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
Fujita-san,

Thanks for your time on this.

On Sun, Aug 2, 2020 at 5:57 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> Yet another thing I noticed is that the patch incorrectly produces
> values for the tableoid columns specified in the RETURNING list, like
> this:

Yeah, I noticed that too.

> +UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10),
> ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b RETURNING
> tableoid::regclass, *;
> +    tableoid    | a | b  |  c  | d |       e       | x | y
> +----------------+---+----+-----+---+---------------+---+----
> + part_a_1_a_10  | c |  1 |   1 | 1 | in trigfunc() | a |  1
> + part_a_10_a_20 | c | 10 | 200 | 1 | in trigfunc() | a | 10
> + part_c_1_100   | c | 12 |  96 | 1 | in trigfunc() | b | 12
> +(3 rows)
>
> The source partitions are shown as tableoid, but the destination
> partition (ie, part_c_1_c_20) should be shown.  To fix this, I
> modified the patch further so that 1) we override tts_tableOid of the
> original slot with the OID of the destination partition before calling
> ExecProcessReturning() if needed, and 2) in ExecProcessReturning(), we
> only initialize ecxt_scantuple's tts_tableOid when needed, which would
> save cycles a bit for non-foreign-table-direct-modification cases.
>
> Attached is a new version of the patch.

Thanks for the updated patch.  I reviewed your changes in v3 too and
they looked fine to me.

However, I noticed that having system columns like ctid, xmin, etc. in
the RETURNING list is now broken and maybe irrepairably due to the
approach we are taking in the patch.  Let me show an example:

drop table foo;
create table foo (a int, b int) partition by list (a);
create table foo1 (c int, b int, a int);
alter table foo1 drop c;
alter table foo attach partition foo1 for values in (1);
create table foo2 partition of foo for values in (2);
create table foo3 partition of foo for values in (3);
create or replace function trigfunc () returns trigger language
plpgsql as $$ begin new.b := 2; return new; end; $$;
create trigger trig before insert on foo2 for each row execute
function trigfunc();
insert into foo values (1, 1), (2, 2), (3, 3);
update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x
returning tableoid::regclass, ctid, xmin, xmax, *;
 tableoid |      ctid      | xmin |    xmax    | a | b | x
----------+----------------+------+------------+---+---+---
 foo2     | (4294967295,0) |  128 | 4294967295 | 2 | 2 | 1
 foo2     | (0,3)          |  782 |          0 | 2 | 2 | 2
 foo2     | (0,4)          |  782 |          0 | 2 | 2 | 3
(3 rows)

During foo1's update, it appears that we are losing the system
information in the physical tuple initialized during ExecInsert on
foo2 during its conversion back to foo1's reltype using the new code.
 I haven't been able to figure out how to preserve the system
information in HeapTuple contained in the destination slot across the
conversion.  Note we want to use the latter to project the RETURNING
list.

By the way, you'll need two adjustments to even get this example
working, one of which is a reported problem that system columns in
RETURNING list during an operation on partitioned table stopped
working in PG 12 [1] for which I've proposed a workaround (attached).
Another is that we forgot in our patch to "materialize" the virtual
tuple after conversion, which means slot_getsysattr() can't find it to
access system columns like xmin, etc.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/141051591267657%40mail.yandex.ru

Вложения

Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Mon, Aug 3, 2020 at 2:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
> However, I noticed that having system columns like ctid, xmin, etc. in
> the RETURNING list is now broken and maybe irrepairably due to the
> approach we are taking in the patch.  Let me show an example:
>
> drop table foo;
> create table foo (a int, b int) partition by list (a);
> create table foo1 (c int, b int, a int);
> alter table foo1 drop c;
> alter table foo attach partition foo1 for values in (1);
> create table foo2 partition of foo for values in (2);
> create table foo3 partition of foo for values in (3);
> create or replace function trigfunc () returns trigger language
> plpgsql as $$ begin new.b := 2; return new; end; $$;
> create trigger trig before insert on foo2 for each row execute
> function trigfunc();
> insert into foo values (1, 1), (2, 2), (3, 3);
> update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x
> returning tableoid::regclass, ctid, xmin, xmax, *;
>  tableoid |      ctid      | xmin |    xmax    | a | b | x
> ----------+----------------+------+------------+---+---+---
>  foo2     | (4294967295,0) |  128 | 4294967295 | 2 | 2 | 1
>  foo2     | (0,3)          |  782 |          0 | 2 | 2 | 2
>  foo2     | (0,4)          |  782 |          0 | 2 | 2 | 3
> (3 rows)
>
> During foo1's update, it appears that we are losing the system
> information in the physical tuple initialized during ExecInsert on
> foo2 during its conversion back to foo1's reltype using the new code.
>  I haven't been able to figure out how to preserve the system
> information in HeapTuple contained in the destination slot across the
> conversion.  Note we want to use the latter to project the RETURNING
> list.
>
> By the way, you'll need two adjustments to even get this example
> working, one of which is a reported problem that system columns in
> RETURNING list during an operation on partitioned table stopped
> working in PG 12 [1] for which I've proposed a workaround (attached).
> Another is that we forgot in our patch to "materialize" the virtual
> tuple after conversion, which means slot_getsysattr() can't find it to
> access system columns like xmin, etc.

The only solution I could think of for this so far is this:

+                   if (map)
+                   {
+                       orig_slot = execute_attr_map_slot(map,
+                                                         res_slot,
+                                                         orig_slot);
+
+                       /*
+                        * A HACK to install system information into the just
+                        * converted tuple so that RETURNING computes any
+                        * system columns correctly. This would be the same
+                        * information that would be present in the HeapTuple
+                        * version of the tuple in res_slot.
+                        */
+                       tuple = ExecFetchSlotHeapTuple(orig_slot, true,
+                                                      &should_free);
+                       tuple->t_data->t_infomask &= ~(HEAP_XACT_MASK);
+                       tuple->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
+                       tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
+                       HeapTupleHeaderSetXmin(tuple->t_data,
+                                              GetCurrentTransactionId());
+                       HeapTupleHeaderSetCmin(tuple->t_data,
+                                              estate->es_output_cid);
+                       HeapTupleHeaderSetXmax(tuple->t_data, 0); /*
for cleanliness */
+                   }
+                   /*
+                    * Override tts_tableOid with the OID of the destination
+                    * partition.
+                    */
+                   orig_slot->tts_tableOid =
RelationGetRelid(destRel->ri_RelationDesc);
+                   /* Also the TID. */
+                   orig_slot->tts_tid = res_slot->tts_tid;

..but it might be too ugly :(.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
Amit-san,

On Mon, Aug 3, 2020 at 2:55 PM Amit Langote <amitlangote09@gmail.com> wrote:
> However, I noticed that having system columns like ctid, xmin, etc. in
> the RETURNING list is now broken and maybe irrepairably due to the
> approach we are taking in the patch.  Let me show an example:
>
> drop table foo;
> create table foo (a int, b int) partition by list (a);
> create table foo1 (c int, b int, a int);
> alter table foo1 drop c;
> alter table foo attach partition foo1 for values in (1);
> create table foo2 partition of foo for values in (2);
> create table foo3 partition of foo for values in (3);
> create or replace function trigfunc () returns trigger language
> plpgsql as $$ begin new.b := 2; return new; end; $$;
> create trigger trig before insert on foo2 for each row execute
> function trigfunc();
> insert into foo values (1, 1), (2, 2), (3, 3);
> update foo set a = 2 from (values (1), (2), (3)) s(x) where a = s.x
> returning tableoid::regclass, ctid, xmin, xmax, *;
>  tableoid |      ctid      | xmin |    xmax    | a | b | x
> ----------+----------------+------+------------+---+---+---
>  foo2     | (4294967295,0) |  128 | 4294967295 | 2 | 2 | 1
>  foo2     | (0,3)          |  782 |          0 | 2 | 2 | 2
>  foo2     | (0,4)          |  782 |          0 | 2 | 2 | 3
> (3 rows)
>
> During foo1's update, it appears that we are losing the system
> information in the physical tuple initialized during ExecInsert on
> foo2 during its conversion back to foo1's reltype using the new code.
>  I haven't been able to figure out how to preserve the system
> information in HeapTuple contained in the destination slot across the
> conversion.  Note we want to use the latter to project the RETURNING
> list.

Thanks for pointing that out!

> By the way, you'll need two adjustments to even get this example
> working, one of which is a reported problem that system columns in
> RETURNING list during an operation on partitioned table stopped
> working in PG 12 [1] for which I've proposed a workaround (attached).
> Another is that we forgot in our patch to "materialize" the virtual
> tuple after conversion, which means slot_getsysattr() can't find it to
> access system columns like xmin, etc.

Ok, I’ll look at those closely.

Best regards,
Etsuro Fujita



Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
On Mon, Aug 3, 2020 at 4:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Aug 3, 2020 at 2:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > By the way, you'll need two adjustments to even get this example
> > working, one of which is a reported problem that system columns in
> > RETURNING list during an operation on partitioned table stopped
> > working in PG 12 [1] for which I've proposed a workaround (attached).
> > Another is that we forgot in our patch to "materialize" the virtual
> > tuple after conversion, which means slot_getsysattr() can't find it to
> > access system columns like xmin, etc.
>
> The only solution I could think of for this so far is this:
>
> +                   if (map)
> +                   {
> +                       orig_slot = execute_attr_map_slot(map,
> +                                                         res_slot,
> +                                                         orig_slot);
> +
> +                       /*
> +                        * A HACK to install system information into the just
> +                        * converted tuple so that RETURNING computes any
> +                        * system columns correctly. This would be the same
> +                        * information that would be present in the HeapTuple
> +                        * version of the tuple in res_slot.
> +                        */
> +                       tuple = ExecFetchSlotHeapTuple(orig_slot, true,
> +                                                      &should_free);
> +                       tuple->t_data->t_infomask &= ~(HEAP_XACT_MASK);
> +                       tuple->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
> +                       tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
> +                       HeapTupleHeaderSetXmin(tuple->t_data,
> +                                              GetCurrentTransactionId());
> +                       HeapTupleHeaderSetCmin(tuple->t_data,
> +                                              estate->es_output_cid);
> +                       HeapTupleHeaderSetXmax(tuple->t_data, 0); /*
> for cleanliness */
> +                   }
> +                   /*
> +                    * Override tts_tableOid with the OID of the destination
> +                    * partition.
> +                    */
> +                   orig_slot->tts_tableOid =
> RelationGetRelid(destRel->ri_RelationDesc);
> +                   /* Also the TID. */
> +                   orig_slot->tts_tid = res_slot->tts_tid;
>
> ..but it might be too ugly :(.

Yeah, I think that would be a bit ugly, and actually, is not correct
in case of postgres_fdw foreign table, in which case Xmin and Cmin are
also set to 0 [2].  I think we should probably first address the
tableam issue that you pointed out, but I don't think I'm the right
person to do so.

Best regards,
Etsuro Fujita

[2] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=da7d44b627ba839de32c9409aca659f60324de76



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Fri, Aug 7, 2020 at 10:45 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Mon, Aug 3, 2020 at 4:39 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Aug 3, 2020 at 2:54 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > By the way, you'll need two adjustments to even get this example
> > > working, one of which is a reported problem that system columns in
> > > RETURNING list during an operation on partitioned table stopped
> > > working in PG 12 [1] for which I've proposed a workaround (attached).
> > > Another is that we forgot in our patch to "materialize" the virtual
> > > tuple after conversion, which means slot_getsysattr() can't find it to
> > > access system columns like xmin, etc.
> >
> > The only solution I could think of for this so far is this:
> >
> > +                   if (map)
> > +                   {
> > +                       orig_slot = execute_attr_map_slot(map,
> > +                                                         res_slot,
> > +                                                         orig_slot);
> > +
> > +                       /*
> > +                        * A HACK to install system information into the just
> > +                        * converted tuple so that RETURNING computes any
> > +                        * system columns correctly. This would be the same
> > +                        * information that would be present in the HeapTuple
> > +                        * version of the tuple in res_slot.
> > +                        */
> > +                       tuple = ExecFetchSlotHeapTuple(orig_slot, true,
> > +                                                      &should_free);
> > +                       tuple->t_data->t_infomask &= ~(HEAP_XACT_MASK);
> > +                       tuple->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
> > +                       tuple->t_data->t_infomask |= HEAP_XMAX_INVALID;
> > +                       HeapTupleHeaderSetXmin(tuple->t_data,
> > +                                              GetCurrentTransactionId());
> > +                       HeapTupleHeaderSetCmin(tuple->t_data,
> > +                                              estate->es_output_cid);
> > +                       HeapTupleHeaderSetXmax(tuple->t_data, 0); /*
> > for cleanliness */
> > +                   }
> > +                   /*
> > +                    * Override tts_tableOid with the OID of the destination
> > +                    * partition.
> > +                    */
> > +                   orig_slot->tts_tableOid =
> > RelationGetRelid(destRel->ri_RelationDesc);
> > +                   /* Also the TID. */
> > +                   orig_slot->tts_tid = res_slot->tts_tid;
> >
> > ..but it might be too ugly :(.
>
> Yeah, I think that would be a bit ugly, and actually, is not correct
> in case of postgres_fdw foreign table, in which case Xmin and Cmin are
> also set to 0 [2].

Yeah, need to think a bit harder.

>  I think we should probably first address the
> tableam issue that you pointed out, but I don't think I'm the right
> person to do so.

Okay, I will try to revive that thread.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Alvaro Herrera
Дата:
I noticed that this bugfix has stalled, probably because the other
bugfix has also stalled.

It seems that cleanly returning system columns from table AM is not
going to be a simple task -- whatever fix we get for that is likely not
going to make it all the way to PG 12.  So I suggest that we should fix
this bug in 11-13 without depending on that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
Hi Alvaro,

On Sat, Sep 12, 2020 at 5:42 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> I noticed that this bugfix has stalled, probably because the other
> bugfix has also stalled.
>
> It seems that cleanly returning system columns from table AM is not
> going to be a simple task -- whatever fix we get for that is likely not
> going to make it all the way to PG 12.  So I suggest that we should fix
> this bug in 11-13 without depending on that.

Although I would be reversing course on what I said upthread, I tend
to agree with that, because the core idea behind the fix for this
issue does not seem likely to be invalidated by any conclusion
regarding the other issue.  That is, as far as the issue here is
concerned, we can't avoid falling back to using the source partition's
RETURNING projection whose scan tuple is provided using the source
partition's tuple slot.

However, given that we have to pass the *new* tuple to the projection,
not the old one, we will need a "clean" way to transfer its (the new
tuple's) system columns into the source partition's tuple slot.  The
sketch I gave upthread of what that could look like was not liked by
Fujita-san much.



--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
On Mon, Sep 14, 2020 at 3:53 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sat, Sep 12, 2020 at 5:42 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > I noticed that this bugfix has stalled, probably because the other
> > bugfix has also stalled.
> >
> > It seems that cleanly returning system columns from table AM is not
> > going to be a simple task -- whatever fix we get for that is likely not
> > going to make it all the way to PG 12.  So I suggest that we should fix
> > this bug in 11-13 without depending on that.
>
> Although I would be reversing course on what I said upthread, I tend
> to agree with that, because the core idea behind the fix for this
> issue does not seem likely to be invalidated by any conclusion
> regarding the other issue.  That is, as far as the issue here is
> concerned, we can't avoid falling back to using the source partition's
> RETURNING projection whose scan tuple is provided using the source
> partition's tuple slot.

I agree on that point.

> However, given that we have to pass the *new* tuple to the projection,
> not the old one, we will need a "clean" way to transfer its (the new
> tuple's) system columns into the source partition's tuple slot.  The
> sketch I gave upthread of what that could look like was not liked by
> Fujita-san much.

IIUC, I think two issues are discussed in the thread [1]: (a) there is
currently no way to define the set of meaningful system columns for a
partitioned table that contains pluggable storages other than standard
heap, and (b) even in the case where the set is defined as before,
like partitioned tables that don’t contain any pluggable storages,
system column values are not obtained from a tuple inserted into a
partitioned table in cases as reported in [1], because virtual slots
are assigned for partitioned tables [2][3].  (I think the latter is
the original issue in the thread, though.)

I think we could fix this update-tuple-routing-vs-RETURNING issue
without the fix for (a).  But to transfer system column values in a
cleaner way, I think we need to fix (b) first so that we can always
obtain/copy them from the new tuple moved to another partition by
INSERT following DELETE.

(I think we could address this issue for v11 independently of [1], off course.)

Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/141051591267657%40mail.yandex.ru
[2] https://www.postgresql.org/message-id/CA%2BHiwqHrsNa4e0MfpSzv7xOM94TvX%3DR0MskYxYWfy0kjL0DAdQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/20200811180231.fcssvhelqpnydnx7%40alap3.anarazel.de



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Mon, Sep 14, 2020 at 5:56 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Mon, Sep 14, 2020 at 3:53 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Sat, Sep 12, 2020 at 5:42 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > > I noticed that this bugfix has stalled, probably because the other
> > > bugfix has also stalled.
> > >
> > > It seems that cleanly returning system columns from table AM is not
> > > going to be a simple task -- whatever fix we get for that is likely not
> > > going to make it all the way to PG 12.  So I suggest that we should fix
> > > this bug in 11-13 without depending on that.
> >
> > Although I would be reversing course on what I said upthread, I tend
> > to agree with that, because the core idea behind the fix for this
> > issue does not seem likely to be invalidated by any conclusion
> > regarding the other issue.  That is, as far as the issue here is
> > concerned, we can't avoid falling back to using the source partition's
> > RETURNING projection whose scan tuple is provided using the source
> > partition's tuple slot.
>
> I agree on that point.
>
> > However, given that we have to pass the *new* tuple to the projection,
> > not the old one, we will need a "clean" way to transfer its (the new
> > tuple's) system columns into the source partition's tuple slot.  The
> > sketch I gave upthread of what that could look like was not liked by
> > Fujita-san much.
>
> IIUC, I think two issues are discussed in the thread [1]: (a) there is
> currently no way to define the set of meaningful system columns for a
> partitioned table that contains pluggable storages other than standard
> heap, and (b) even in the case where the set is defined as before,
> like partitioned tables that don’t contain any pluggable storages,
> system column values are not obtained from a tuple inserted into a
> partitioned table in cases as reported in [1], because virtual slots
> are assigned for partitioned tables [2][3].  (I think the latter is
> the original issue in the thread, though.)

Right, (b) can be solved by using a leaf partition's tuple slot as
proposed.  Although (a) needs a long term fix.

> I think we could fix this update-tuple-routing-vs-RETURNING issue
> without the fix for (a).  But to transfer system column values in a
> cleaner way, I think we need to fix (b) first so that we can always
> obtain/copy them from the new tuple moved to another partition by
> INSERT following DELETE.

Yes, I think you are right.  Although, I am still not sure how to
"copy" system columns from one partition's slot to another's, that is,
without assuming what they are.

> (I think we could address this issue for v11 independently of [1], off course.)

Yeah.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
On Mon, Sep 14, 2020 at 10:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Sep 14, 2020 at 5:56 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > IIUC, I think two issues are discussed in the thread [1]: (a) there is
> > currently no way to define the set of meaningful system columns for a
> > partitioned table that contains pluggable storages other than standard
> > heap, and (b) even in the case where the set is defined as before,
> > like partitioned tables that don’t contain any pluggable storages,
> > system column values are not obtained from a tuple inserted into a
> > partitioned table in cases as reported in [1], because virtual slots
> > are assigned for partitioned tables [2][3].  (I think the latter is
> > the original issue in the thread, though.)
>
> Right, (b) can be solved by using a leaf partition's tuple slot as
> proposed.

You mean what is proposed in [3]?

> > I think we could fix this update-tuple-routing-vs-RETURNING issue
> > without the fix for (a).  But to transfer system column values in a
> > cleaner way, I think we need to fix (b) first so that we can always
> > obtain/copy them from the new tuple moved to another partition by
> > INSERT following DELETE.
>
> Yes, I think you are right.  Although, I am still not sure how to
> "copy" system columns from one partition's slot to another's, that is,
> without assuming what they are.

I just thought we assume that partitions support all the existing
system attributes until we have the fix for (a), i.e., the slot
assigned for a partition must have the getsysattr callback routine
from which we can fetch each existing system attribute of a underlying
tuple in the slot, regardless of whether that system attribute is used
for the partition or not.

> > (I think we could address this issue for v11 independently of [1], off course.)
>
> Yeah.

I noticed that I modified your patch incorrectly.  Sorry for that.
Attached is an updated patch fixing that.  I also added a bit of code
to copy CTID from the new tuple moved to another partition, not just
table OID, and did some code/comment adjustments, mainly to match
other places.  I also created a patch for PG11, which I am attaching
as well.

Best regards,
Etsuro Fujita

Вложения

Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
Fujita-san,

On Sun, Sep 20, 2020 at 11:41 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Mon, Sep 14, 2020 at 10:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Mon, Sep 14, 2020 at 5:56 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > > IIUC, I think two issues are discussed in the thread [1]: (a) there is
> > > currently no way to define the set of meaningful system columns for a
> > > partitioned table that contains pluggable storages other than standard
> > > heap, and (b) even in the case where the set is defined as before,
> > > like partitioned tables that don’t contain any pluggable storages,
> > > system column values are not obtained from a tuple inserted into a
> > > partitioned table in cases as reported in [1], because virtual slots
> > > are assigned for partitioned tables [2][3].  (I think the latter is
> > > the original issue in the thread, though.)
> >
> > Right, (b) can be solved by using a leaf partition's tuple slot as
> > proposed.
>
> You mean what is proposed in [3]?

Yes.  Although, I am for assigning a dedicated slot to partitions
*unconditionally*, whereas the PoC patch Andres shared makes it
conditional on either needing tuple conversion between the root and
the partition or having a RETURNING projection present in the query.

> > > I think we could fix this update-tuple-routing-vs-RETURNING issue
> > > without the fix for (a).  But to transfer system column values in a
> > > cleaner way, I think we need to fix (b) first so that we can always
> > > obtain/copy them from the new tuple moved to another partition by
> > > INSERT following DELETE.
> >
> > Yes, I think you are right.  Although, I am still not sure how to
> > "copy" system columns from one partition's slot to another's, that is,
> > without assuming what they are.
>
> I just thought we assume that partitions support all the existing
> system attributes until we have the fix for (a), i.e., the slot
> assigned for a partition must have the getsysattr callback routine
> from which we can fetch each existing system attribute of a underlying
> tuple in the slot, regardless of whether that system attribute is used
> for the partition or not.

Hmm, to copy one slot's system attributes into another, we will also
need a way to "set" the system attributes in the destination slot.
But maybe I didn't fully understand what you said.

> > > (I think we could address this issue for v11 independently of [1], off course.)
> >
> > Yeah.
>
> I noticed that I modified your patch incorrectly.  Sorry for that.
> Attached is an updated patch fixing that.

Ah, no problem.  Thanks for updating.

>  I also added a bit of code
> to copy CTID from the new tuple moved to another partition, not just
> table OID, and did some code/comment adjustments, mainly to match
> other places.  I also created a patch for PG11, which I am attaching
> as well.

In the patch for PG 11:

+                       new_tuple->t_self = new_tuple->t_data->t_ctid =
+                           old_tuple->t_self;
...

Should we add a one-line comment above this block of code to transfer
system attributes?  Maybe: /* Also transfer the system attributes. */?

BTW, do you think we should alter the test in PG 11 branch to test
that system attributes are returned correctly?  Once we settle the
other issue, we can adjust the HEAD's test to do likewise?

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
On Wed, Sep 23, 2020 at 10:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sun, Sep 20, 2020 at 11:41 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > On Mon, Sep 14, 2020 at 10:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > On Mon, Sep 14, 2020 at 5:56 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > > > IIUC, I think two issues are discussed in the thread [1]: (a) there is
> > > > currently no way to define the set of meaningful system columns for a
> > > > partitioned table that contains pluggable storages other than standard
> > > > heap, and (b) even in the case where the set is defined as before,
> > > > like partitioned tables that don’t contain any pluggable storages,
> > > > system column values are not obtained from a tuple inserted into a
> > > > partitioned table in cases as reported in [1], because virtual slots
> > > > are assigned for partitioned tables [2][3].  (I think the latter is
> > > > the original issue in the thread, though.)

> > > > I think we could fix this update-tuple-routing-vs-RETURNING issue
> > > > without the fix for (a).  But to transfer system column values in a
> > > > cleaner way, I think we need to fix (b) first so that we can always
> > > > obtain/copy them from the new tuple moved to another partition by
> > > > INSERT following DELETE.
> > >
> > > Yes, I think you are right.  Although, I am still not sure how to
> > > "copy" system columns from one partition's slot to another's, that is,
> > > without assuming what they are.
> >
> > I just thought we assume that partitions support all the existing
> > system attributes until we have the fix for (a), i.e., the slot
> > assigned for a partition must have the getsysattr callback routine
> > from which we can fetch each existing system attribute of a underlying
> > tuple in the slot, regardless of whether that system attribute is used
> > for the partition or not.
>
> Hmm, to copy one slot's system attributes into another, we will also
> need a way to "set" the system attributes in the destination slot.
> But maybe I didn't fully understand what you said.

Sorry, my thought was vague.  To store xmin/xmax/cmin/cmax into a
given slot, we need to extend the TupleTableSlot struct to contain
these attributes as well?  Or we need to introduce a new callback
routine for that (say, setsysattr)?  These would not be
back-patchable, though.

> > I also created a patch for PG11, which I am attaching
> > as well.
>
> In the patch for PG 11:
>
> +                       new_tuple->t_self = new_tuple->t_data->t_ctid =
> +                           old_tuple->t_self;
> ...
>
> Should we add a one-line comment above this block of code to transfer
> system attributes?  Maybe: /* Also transfer the system attributes. */?

Will add that comment.  Thanks for reviewing!

> BTW, do you think we should alter the test in PG 11 branch to test
> that system attributes are returned correctly?

Yeah, I think so.  I didn’t come up with any good test cases for that, though.

> Once we settle the
> other issue, we can adjust the HEAD's test to do likewise?

Yeah, but for the other issue, I started thinking that we should just
forbid referencing xmin/xmax/cmin/cmax in 12, 13, and HEAD...

Best regards,
Etsuro Fujita



Re: problem with RETURNING and update row movement

От
Michael Paquier
Дата:
On Thu, Sep 24, 2020 at 04:25:24AM +0900, Etsuro Fujita wrote:
> Sorry, my thought was vague.  To store xmin/xmax/cmin/cmax into a
> given slot, we need to extend the TupleTableSlot struct to contain
> these attributes as well?  Or we need to introduce a new callback
> routine for that (say, setsysattr)?  These would not be
> back-patchable, though.

Please note that the latest patch fails to apply, so this needs a
rebase.
--
Michael

Вложения

Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Thu, Sep 24, 2020 at 1:54 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Sep 24, 2020 at 04:25:24AM +0900, Etsuro Fujita wrote:
> > Sorry, my thought was vague.  To store xmin/xmax/cmin/cmax into a
> > given slot, we need to extend the TupleTableSlot struct to contain
> > these attributes as well?  Or we need to introduce a new callback
> > routine for that (say, setsysattr)?  These would not be
> > back-patchable, though.
>
> Please note that the latest patch fails to apply, so this needs a
> rebase.

I saw the CF-bot failure too yesterday, although it seems that it's
because the bot is trying to apply the patch version meant for v11
branch onto HEAD branch.  I just checked that the patches apply
cleanly to their respective branches.

-- 
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Thu, Sep 24, 2020 at 4:25 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Wed, Sep 23, 2020 at 10:12 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Sun, Sep 20, 2020 at 11:41 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > > On Mon, Sep 14, 2020 at 10:45 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > > > Although, I am still not sure how to
> > > > "copy" system columns from one partition's slot to another's, that is,
> > > > without assuming what they are.
> > >
> > > I just thought we assume that partitions support all the existing
> > > system attributes until we have the fix for (a), i.e., the slot
> > > assigned for a partition must have the getsysattr callback routine
> > > from which we can fetch each existing system attribute of a underlying
> > > tuple in the slot, regardless of whether that system attribute is used
> > > for the partition or not.
> >
> > Hmm, to copy one slot's system attributes into another, we will also
> > need a way to "set" the system attributes in the destination slot.
> > But maybe I didn't fully understand what you said.
>
> Sorry, my thought was vague.  To store xmin/xmax/cmin/cmax into a
> given slot, we need to extend the TupleTableSlot struct to contain
> these attributes as well?  Or we need to introduce a new callback
> routine for that (say, setsysattr)?  These would not be
> back-patchable, though.

Yeah, I'd think so too.

BTW, the discussion so far on the other thread is oblivious to the
issue being discussed here, where we need to find a way to transfer
system attributes between a pair of partitions that are possibly
incompatible with each other in terms of what set of system attributes
they support.  Although, if we prevent accessing system attributes
when performing the operation on partitioned tables, like what you
seem to propose below, then we wouldn't really have that problem.

> > BTW, do you think we should alter the test in PG 11 branch to test
> > that system attributes are returned correctly?
>
> Yeah, I think so.  I didn’t come up with any good test cases for that, though.
>
> > Once we settle the
> > other issue, we can adjust the HEAD's test to do likewise?
>
> Yeah, but for the other issue, I started thinking that we should just
> forbid referencing xmin/xmax/cmin/cmax in 12, 13, and HEAD...

When the command is being performed on a partitioned table you mean?
That is, it'd be okay to reference them when the command is being
performed directly on a leaf partition, although it's another thing
whether the leaf partitions themselves have sensible values to provide
for them.

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
On Thu, Sep 24, 2020 at 2:47 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Sep 24, 2020 at 4:25 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> BTW, the discussion so far on the other thread is oblivious to the
> issue being discussed here, where we need to find a way to transfer
> system attributes between a pair of partitions that are possibly
> incompatible with each other in terms of what set of system attributes
> they support.

Yeah, we should discuss the two issues together.

> Although, if we prevent accessing system attributes
> when performing the operation on partitioned tables, like what you
> seem to propose below, then we wouldn't really have that problem.

Yeah, I think so.

> > Yeah, but for the other issue, I started thinking that we should just
> > forbid referencing xmin/xmax/cmin/cmax in 12, 13, and HEAD...
>
> When the command is being performed on a partitioned table you mean?

Yes.  One concern about that is triggers: IIUC, triggers on a
partition as-is can or can not reference xmin/xmax/cmin/cmax depending
on whether a dedicated tuple slot for the partition is used or not.
We should do something about this if we go in that direction?

> That is, it'd be okay to reference them when the command is being
> performed directly on a leaf partition, although it's another thing
> whether the leaf partitions themselves have sensible values to provide
> for them.

I think so too.

Best regards,
Etsuro Fujita



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Thu, Sep 24, 2020 at 7:30 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> On Thu, Sep 24, 2020 at 2:47 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > On Thu, Sep 24, 2020 at 4:25 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote:
> > > Yeah, but for the other issue, I started thinking that we should just
> > > forbid referencing xmin/xmax/cmin/cmax in 12, 13, and HEAD...
> >
> > When the command is being performed on a partitioned table you mean?
>
> Yes.  One concern about that is triggers: IIUC, triggers on a
> partition as-is can or can not reference xmin/xmax/cmin/cmax depending
> on whether a dedicated tuple slot for the partition is used or not.
> We should do something about this if we go in that direction?

Maybe I'm missing something, but assuming that we're talking about
prohibiting system attribute access in the RETURNING clause, how does
that affect what triggers can or cannot do?  AFAICS, only AFTER
row-level triggers may sensibly access system attributes and
whether/how they can do so has not much to do with the slot that
ExecInsert() gets the new tuple in.  It seems that the AFTER trigger
infrastructure remembers an affected tuple's ctid and fetches it just
before calling trigger function by asking the result relation's (e.g.,
a partition's) access method.

To illustrate, with HEAD:

create table foo (a int, b int) partition by range (a);
create table foo1 partition of foo for values from (1) to (2);

create or replace function report_system_info () returns trigger
language plpgsql as $$
begin
  raise notice 'ctid: %', new.ctid;
  raise notice 'xmin: %', new.xmin;
  raise notice 'xmax: %', new.xmax;
  raise notice 'cmin: %', new.cmin;
  raise notice 'cmax: %', new.cmax;
  raise notice 'tableoid: %', new.tableoid;
  return NULL;
end; $$;

create trigger foo_after_trig after insert on foo for each row execute
function report_system_info();

begin;

insert into foo values (1);
NOTICE:  ctid: (0,1)
NOTICE:  xmin: 532
NOTICE:  xmax: 0
NOTICE:  cmin: 0
NOTICE:  cmax: 0
NOTICE:  tableoid: 16387

insert into foo values (1);
NOTICE:  ctid: (0,2)
NOTICE:  xmin: 532
NOTICE:  xmax: 0
NOTICE:  cmin: 1
NOTICE:  cmax: 1
NOTICE:  tableoid: 16387

insert into foo values (1);
NOTICE:  ctid: (0,3)
NOTICE:  xmin: 532
NOTICE:  xmax: 0
NOTICE:  cmin: 2
NOTICE:  cmax: 2
NOTICE:  tableoid: 16387

--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Thu, Sep 24, 2020 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Sep 24, 2020 at 1:54 PM Michael Paquier <michael@paquier.xyz> wrote:
> > On Thu, Sep 24, 2020 at 04:25:24AM +0900, Etsuro Fujita wrote:
> > > Sorry, my thought was vague.  To store xmin/xmax/cmin/cmax into a
> > > given slot, we need to extend the TupleTableSlot struct to contain
> > > these attributes as well?  Or we need to introduce a new callback
> > > routine for that (say, setsysattr)?  These would not be
> > > back-patchable, though.
> >
> > Please note that the latest patch fails to apply, so this needs a
> > rebase.
>
> I saw the CF-bot failure too yesterday, although it seems that it's
> because the bot is trying to apply the patch version meant for v11
> branch onto HEAD branch.  I just checked that the patches apply
> cleanly to their respective branches.

I checked that the last statement is still true, so I've switched the
status back to Needs Review.


--
Amit Langote
EnterpriseDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Wed, Sep 30, 2020 at 12:34 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Thu, Sep 24, 2020 at 2:26 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > I saw the CF-bot failure too yesterday, although it seems that it's
> > because the bot is trying to apply the patch version meant for v11
> > branch onto HEAD branch.  I just checked that the patches apply
> > cleanly to their respective branches.
>
> I checked that the last statement is still true, so I've switched the
> status back to Needs Review.

And the patch for HEAD no longer applies, because of a recent
refactoring commit that hit update row movement code.

Rebased patch for HEAD attached.  Patch for PG11 should apply as-is.

Here is a summary of where we stand on this, because another issue
related to using RETURNING with partitioned tables [1] kind of ties
into this.

The problem reported on this thread is solved by ExecUpdate() itself
evaluating the RETURNING list using the source partition's
ri_projectReturning, instead of ExecInsert() doing it using the
destination partition's ri_projectReturning.  It must work that way,
because the tuple descriptor of 'planSlot', which provides the values
of the columns mentioned in RETURNING of tables other than the target
table, is based on the source partition.  Note that the tuple inserted
into the destination partition needs to be converted into the source
partition if their tuple descriptors differ, so that RETURNING
correctly returns the new values of the updated tuple.

The point we are stuck on is whether this fix will create problems for
the case when the RETURNING list contains system columns.  Now that we
will be projecting the tuple inserted into the destination using its
copy in the source partition's format and we have no way of preserving
the system information in the tuple across conversions, we need to
find a way to make RETURNING project the correct system information.
Fujita-san has suggested (or agreed with a suggestion made at [1])
that we should simply prevent system information from being projected
with RETURNING when the command is being performed on a partitioned
table, in which case this becomes a non-issue.  If that is not what we
decide to do on the other thread, then at least we will have to figure
out over there how we will support RETURNING system columns when row
movement occurs during an update on partitioned tables.  The question
I guess is whether that thread must conclude before the fix here can
be committed.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/flat/141051591267657%40mail.yandex.ru

Вложения

Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Sat, Mar 6, 2021 at 12:52 AM David Steele <david@pgmasters.net> wrote:
> On 10/30/20 6:00 AM, Amit Langote wrote:
> > The question
> > I guess is whether that thread must conclude before the fix here can
> > be committed.
>
> Indeed. But it seems like there is a candidate patch on [1] though that
> thread has also stalled. I'll try to get some status on that thread but
> the question remains if this patch will be stalled until [1] is resolved.

Sorry for the delay in following up on this thread.  I've posted new
patches on the other.

FWIW, I think we should go ahead and apply the patches for the bug
reported here.  Anyone who tries to project an updated tuple's system
columns using RETURNING are likely to face problems one way or
another, especially if they have partitioned tables containing
partitions of varying table AMs, but at least they won't face the bug
discussed here.

I've attached patches for all affected branches.

-- 
Amit Langote
EDB: http://www.enterprisedb.com

Вложения

Re: problem with RETURNING and update row movement

От
Tom Lane
Дата:
Amit Langote <amitlangote09@gmail.com> writes:
> FWIW, I think we should go ahead and apply the patches for the bug
> reported here.  Anyone who tries to project an updated tuple's system
> columns using RETURNING are likely to face problems one way or
> another, especially if they have partitioned tables containing
> partitions of varying table AMs, but at least they won't face the bug
> discussed here.

Agreed, we should get this fixed in time for the next minor releases.
The issue no longer exists on HEAD, thanks to 86dc90056 having got
rid of per-target-relation variance in the contents of planSlot.
But we still need a fix for the back branches.

So I looked over the v13 patch, and found a couple of things
I didn't like:

* I think what you did in ExecProcessReturning is buggy.  It's
not a great idea to have completely different processes for
getting tableoid set in normal-relation vs foreign-relation
cases, and in this case the foreign-relation case was simply
wrong.  Maybe the bug isn't reachable for lack of support of
cross-partition motion with FDWs, but I'm not sure about that.
We really need to decouple the RETURNING expressions (which
will belong to the source relation) from the value injected
for tableoid (which will belong to the destination).

* I really disliked the API change that ExecInsert is responsible
for computing RETURNING except when it isn't.  That's confusing
and there's no good reason for it, since it's not really any
easier to deal with the case at the call site than inside ExecInsert.

In the attached revision I made ExecInsert handle RETURNING
calculations by asking the callers to pass in the ResultRelInfo
that should be used for the purpose.  We could alternatively
have taken the responsibility for RETURNING out of ExecInsert
altogether, making the callers call ExecProcessReturning.
I think that might have netted out slightly simpler than this.
But we're unlikely to apply such a change in HEAD, so it seemed
better to keep the division of responsibilities the same as it
is in other branches.

Thoughts?  (I've not looked at porting this to v12 or v11 yet.)

            regards, tom lane

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9572ef0690..ff4a2d0e58 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -149,34 +149,40 @@ ExecCheckPlanOutput(Relation resultRel, List *targetList)
 /*
  * ExecProcessReturning --- evaluate a RETURNING list
  *
- * resultRelInfo: current result rel
+ * projectReturning: the projection to evaluate
+ * resultRelOid: result relation's OID
  * tupleSlot: slot holding tuple actually inserted/updated/deleted
  * planSlot: slot holding tuple returned by top subplan node
  *
+ * In cross-partition UPDATE cases, projectReturning and planSlot are as
+ * for the source partition, and tupleSlot must conform to that.  But
+ * resultRelOid is for the destination partition.
+ *
  * Note: If tupleSlot is NULL, the FDW should have already provided econtext's
  * scan tuple.
  *
  * Returns a slot holding the result tuple
  */
 static TupleTableSlot *
-ExecProcessReturning(ResultRelInfo *resultRelInfo,
+ExecProcessReturning(ProjectionInfo *projectReturning,
+                     Oid resultRelOid,
                      TupleTableSlot *tupleSlot,
                      TupleTableSlot *planSlot)
 {
-    ProjectionInfo *projectReturning = resultRelInfo->ri_projectReturning;
     ExprContext *econtext = projectReturning->pi_exprContext;

     /* Make tuple and any needed join variables available to ExecProject */
     if (tupleSlot)
         econtext->ecxt_scantuple = tupleSlot;
+    else
+        Assert(econtext->ecxt_scantuple);
     econtext->ecxt_outertuple = planSlot;

     /*
-     * RETURNING expressions might reference the tableoid column, so
-     * reinitialize tts_tableOid before evaluating them.
+     * RETURNING expressions might reference the tableoid column, so be sure
+     * we expose the desired OID, ie that of the real target relation.
      */
-    econtext->ecxt_scantuple->tts_tableOid =
-        RelationGetRelid(resultRelInfo->ri_RelationDesc);
+    econtext->ecxt_scantuple->tts_tableOid = resultRelOid;

     /* Compute the RETURNING expressions */
     return ExecProject(projectReturning);
@@ -368,6 +374,16 @@ ExecComputeStoredGenerated(EState *estate, TupleTableSlot *slot, CmdType cmdtype
  *        For INSERT, we have to insert the tuple into the target relation
  *        and insert appropriate tuples into the index relations.
  *
+ *        slot contains the new tuple value to be stored.
+ *        planSlot is the output of the ModifyTable's subplan; we use it
+ *        to access "junk" columns that are not going to be stored.
+ *        In a cross-partition UPDATE, srcSlot is the slot that held the
+ *        updated tuple for the source relation; otherwise it's NULL.
+ *
+ *        returningRelInfo is the resultRelInfo for the source relation of a
+ *        cross-partition UPDATE; otherwise it's the current result relation.
+ *        We use it to process RETURNING lists, for reasons explained below.
+ *
  *        Returns RETURNING result if any, otherwise NULL.
  * ----------------------------------------------------------------
  */
@@ -375,6 +391,8 @@ static TupleTableSlot *
 ExecInsert(ModifyTableState *mtstate,
            TupleTableSlot *slot,
            TupleTableSlot *planSlot,
+           TupleTableSlot *srcSlot,
+           ResultRelInfo *returningRelInfo,
            EState *estate,
            bool canSetTag)
 {
@@ -677,8 +695,40 @@ ExecInsert(ModifyTableState *mtstate,
         ExecWithCheckOptions(WCO_VIEW_CHECK, resultRelInfo, slot, estate);

     /* Process RETURNING if present */
-    if (resultRelInfo->ri_projectReturning)
-        result = ExecProcessReturning(resultRelInfo, slot, planSlot);
+    if (returningRelInfo->ri_projectReturning)
+    {
+        /*
+         * In a cross-partition UPDATE with RETURNING, we have to use the
+         * source partition's RETURNING list, because that matches the output
+         * of the planSlot, while the destination partition might have
+         * different resjunk columns.  This means we have to map the
+         * destination slot back to the source's format so we can apply that
+         * RETURNING list.  This is expensive, but it should be an uncommon
+         * corner case, so we won't spend much effort on making it fast.
+         */
+        if (returningRelInfo != resultRelInfo)
+        {
+            Relation    srcRelationDesc = returningRelInfo->ri_RelationDesc;
+            AttrMap    *map;
+
+            map = build_attrmap_by_name_if_req(RelationGetDescr(resultRelationDesc),
+                                               RelationGetDescr(srcRelationDesc));
+            if (map)
+            {
+                /* We assume we can use srcSlot to hold the converted tuple */
+                TupleTableSlot *origSlot = slot;
+
+                slot = execute_attr_map_slot(map, slot, srcSlot);
+                slot->tts_tid = origSlot->tts_tid;
+                slot->tts_tableOid = origSlot->tts_tableOid;
+                free_attrmap(map);
+            }
+        }
+
+        result = ExecProcessReturning(returningRelInfo->ri_projectReturning,
+                                      RelationGetRelid(resultRelationDesc),
+                                      slot, planSlot);
+    }

     return result;
 }
@@ -1027,7 +1077,9 @@ ldelete:;
             }
         }

-        rslot = ExecProcessReturning(resultRelInfo, slot, planSlot);
+        rslot = ExecProcessReturning(resultRelInfo->ri_projectReturning,
+                                     RelationGetRelid(resultRelationDesc),
+                                     slot, planSlot);

         /*
          * Before releasing the target tuple again, make sure rslot has a
@@ -1203,6 +1255,7 @@ lreplace:;
         {
             bool        tuple_deleted;
             TupleTableSlot *ret_slot;
+            TupleTableSlot *orig_slot = slot;
             TupleTableSlot *epqslot = NULL;
             PartitionTupleRouting *proute = mtstate->mt_partition_tuple_routing;
             int            map_index;
@@ -1309,6 +1362,7 @@ lreplace:;
                                            mtstate->rootResultRelInfo, slot);

             ret_slot = ExecInsert(mtstate, slot, planSlot,
+                                  orig_slot, resultRelInfo,
                                   estate, canSetTag);

             /* Revert ExecPrepareTupleRouting's node change. */
@@ -1505,7 +1559,9 @@ lreplace:;

     /* Process RETURNING if present */
     if (resultRelInfo->ri_projectReturning)
-        return ExecProcessReturning(resultRelInfo, slot, planSlot);
+        return ExecProcessReturning(resultRelInfo->ri_projectReturning,
+                                    RelationGetRelid(resultRelationDesc),
+                                    slot, planSlot);

     return NULL;
 }
@@ -2154,7 +2210,9 @@ ExecModifyTable(PlanState *pstate)
              * ExecProcessReturning by IterateDirectModify, so no need to
              * provide it here.
              */
-            slot = ExecProcessReturning(resultRelInfo, NULL, planSlot);
+            slot = ExecProcessReturning(resultRelInfo->ri_projectReturning,
+                                        RelationGetRelid(resultRelInfo->ri_RelationDesc),
+                                        NULL, planSlot);

             estate->es_result_relation_info = saved_resultRelInfo;
             return slot;
@@ -2244,6 +2302,7 @@ ExecModifyTable(PlanState *pstate)
                     slot = ExecPrepareTupleRouting(node, estate, proute,
                                                    resultRelInfo, slot);
                 slot = ExecInsert(node, slot, planSlot,
+                                  NULL, estate->es_result_relation_info,
                                   estate, node->canSetTag);
                 /* Revert ExecPrepareTupleRouting's state change. */
                 if (proute)
diff --git a/src/test/regress/expected/update.out b/src/test/regress/expected/update.out
index bf939d79f6..807005c40a 100644
--- a/src/test/regress/expected/update.out
+++ b/src/test/regress/expected/update.out
@@ -445,6 +445,46 @@ UPDATE range_parted set c = 95 WHERE a = 'b' and b > 10 and c > 100 returning (r
  part_c_1_100   | b | 17 |  95 | 19 |
 (6 rows)

+-- The following tests computing RETURNING when the source and the destination
+-- partitions of a UPDATE row movement operation have different tuple
+-- descriptors, which has been shown to be problematic in the cases where the
+-- RETURNING targetlist contains non-target relation attributes that are
+-- computed by referring to the source partition plan's output tuple.  Also,
+-- a trigger on the destination relation may change the tuple, which must be
+-- reflected in the RETURNING output, so we test that too.
+CREATE TABLE part_c_1_c_20 (LIKE range_parted);
+ALTER TABLE part_c_1_c_20 DROP a, DROP b, ADD a text, ADD b bigint;
+ALTER TABLE range_parted ATTACH PARTITION part_c_1_c_20 FOR VALUES FROM ('c', 1) TO ('c', 20);
+CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN NEW.e := 'in trigfunc()'; RETURN NEW; END;
$$;
+CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b
RETURNINGtableoid::regclass, *; 
+   tableoid    | a | b  |  c  | d |       e       | x | y
+---------------+---+----+-----+---+---------------+---+----
+ part_c_1_c_20 | c |  1 |   1 | 1 | in trigfunc() | a |  1
+ part_c_1_c_20 | c | 10 | 200 | 1 | in trigfunc() | a | 10
+ part_c_1_c_20 | c | 12 |  96 | 1 | in trigfunc() | b | 12
+(3 rows)
+
+DROP TRIGGER trig ON part_c_1_c_20;
+DROP FUNCTION trigfunc;
+:init_range_parted;
+CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN RETURN NULL; END; $$;
+CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b
RETURNINGtableoid::regclass, *; 
+ tableoid | a | b | c | d | e | x | y
+----------+---+---+---+---+---+---+---
+(0 rows)
+
+:show_data;
+   partname   | a | b  |  c  | d  | e
+--------------+---+----+-----+----+---
+ part_c_1_100 | b | 13 |  97 |  2 |
+ part_d_15_20 | b | 15 | 105 | 16 |
+ part_d_15_20 | b | 17 | 105 | 19 |
+(3 rows)
+
+DROP TABLE part_c_1_c_20;
+DROP FUNCTION trigfunc;
 -- Transition tables with update row movement
 :init_range_parted;
 CREATE FUNCTION trans_updatetrigfunc() RETURNS trigger LANGUAGE plpgsql AS
diff --git a/src/test/regress/sql/update.sql b/src/test/regress/sql/update.sql
index 8c558a7bc7..dc7274a4bb 100644
--- a/src/test/regress/sql/update.sql
+++ b/src/test/regress/sql/update.sql
@@ -236,6 +236,31 @@ DROP VIEW upview;
 UPDATE range_parted set c = 95 WHERE a = 'b' and b > 10 and c > 100 returning (range_parted), *;
 :show_data;

+-- The following tests computing RETURNING when the source and the destination
+-- partitions of a UPDATE row movement operation have different tuple
+-- descriptors, which has been shown to be problematic in the cases where the
+-- RETURNING targetlist contains non-target relation attributes that are
+-- computed by referring to the source partition plan's output tuple.  Also,
+-- a trigger on the destination relation may change the tuple, which must be
+-- reflected in the RETURNING output, so we test that too.
+CREATE TABLE part_c_1_c_20 (LIKE range_parted);
+ALTER TABLE part_c_1_c_20 DROP a, DROP b, ADD a text, ADD b bigint;
+ALTER TABLE range_parted ATTACH PARTITION part_c_1_c_20 FOR VALUES FROM ('c', 1) TO ('c', 20);
+CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN NEW.e := 'in trigfunc()'; RETURN NEW; END;
$$;
+CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b
RETURNINGtableoid::regclass, *; 
+
+DROP TRIGGER trig ON part_c_1_c_20;
+DROP FUNCTION trigfunc;
+
+:init_range_parted;
+CREATE FUNCTION trigfunc () RETURNS TRIGGER LANGUAGE plpgsql as $$ BEGIN RETURN NULL; END; $$;
+CREATE TRIGGER trig BEFORE INSERT ON part_c_1_c_20 FOR EACH ROW EXECUTE FUNCTION trigfunc();
+UPDATE range_parted r set a = 'c' FROM (VALUES ('a', 1), ('a', 10), ('b', 12)) s(x, y) WHERE s.x = r.a AND s.y = r.b
RETURNINGtableoid::regclass, *; 
+:show_data;
+
+DROP TABLE part_c_1_c_20;
+DROP FUNCTION trigfunc;

 -- Transition tables with update row movement
 :init_range_parted;

Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Thu, Apr 22, 2021 at 9:37 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > FWIW, I think we should go ahead and apply the patches for the bug
> > reported here.  Anyone who tries to project an updated tuple's system
> > columns using RETURNING are likely to face problems one way or
> > another, especially if they have partitioned tables containing
> > partitions of varying table AMs, but at least they won't face the bug
> > discussed here.
>
> Agreed, we should get this fixed in time for the next minor releases.

Thanks for taking a look at this.

> The issue no longer exists on HEAD, thanks to 86dc90056 having got
> rid of per-target-relation variance in the contents of planSlot.
> But we still need a fix for the back branches.
>
> So I looked over the v13 patch, and found a couple of things
> I didn't like:
>
> * I think what you did in ExecProcessReturning is buggy.  It's
> not a great idea to have completely different processes for
> getting tableoid set in normal-relation vs foreign-relation
> cases, and in this case the foreign-relation case was simply
> wrong.  Maybe the bug isn't reachable for lack of support of
> cross-partition motion with FDWs, but I'm not sure about that.

The two cases I see are some callers passing a valid 'tupleSlot' to
ExecProcessReturning() and some (just one!) not.  The latter case
applies only to a foreign relations that are directly-modified and
relies on the FDWs having set econtext->ecxt_scantuple to the correct
slot.  In the 1st case, the callers should already have made sure that
the correct tableoid is passed through 'tableSlot', although
Fujita-san had noticed in [1] that my earlier patch failed to do that
in the cross-partition update case.  Along with fixing the problem of
my patch, he also proposed that we set the tableoid of the slot
assigned to ecxt_scantuple only in the 2nd case to save cycles.

I'm fine with leaving it the way it is as your updated patch does, but
I don't really see a bug being introduced.  Actually, the code was
like what Fujita-san proposed we do before b8d71745eac changed it to
the current way:

@@ -166,20 +166,15 @@ ExecProcessReturning(ResultRelInfo *resultRelInfo,
    /* Make tuple and any needed join variables available to ExecProject */
    if (tupleSlot)
        econtext->ecxt_scantuple = tupleSlot;
-   else
-   {
-       HeapTuple   tuple;
-
-       /*
-        * RETURNING expressions might reference the tableoid column, so
-        * initialize t_tableOid before evaluating them.
-        */
-       Assert(!TupIsNull(econtext->ecxt_scantuple));
-       tuple = ExecFetchSlotHeapTuple(econtext->ecxt_scantuple, true, NULL);
-       tuple->t_tableOid = RelationGetRelid(resultRelInfo->ri_RelationDesc);
-   }
    econtext->ecxt_outertuple = planSlot;

+   /*
+    * RETURNING expressions might reference the tableoid column, so
+    * reinitialize tts_tableOid before evaluating them.
+    */
+   econtext->ecxt_scantuple->tts_tableOid =
+       RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
    /* Compute the RETURNING expressions */
    return ExecProject(projectReturning);
 }

> We really need to decouple the RETURNING expressions (which
> will belong to the source relation) from the value injected
> for tableoid (which will belong to the destination).

Yeah, that makes sense.

> * I really disliked the API change that ExecInsert is responsible
> for computing RETURNING except when it isn't.  That's confusing
> and there's no good reason for it, since it's not really any
> easier to deal with the case at the call site than inside ExecInsert.
>
> In the attached revision I made ExecInsert handle RETURNING
> calculations by asking the callers to pass in the ResultRelInfo
> that should be used for the purpose.

That seems fine to me.

>  We could alternatively
> have taken the responsibility for RETURNING out of ExecInsert
> altogether, making the callers call ExecProcessReturning.
> I think that might have netted out slightly simpler than this.
> But we're unlikely to apply such a change in HEAD, so it seemed
> better to keep the division of responsibilities the same as it
> is in other branches.

I agree.

> Thoughts?

The patch looks good to me, except one thing:

       /*
+        * In a cross-partition UPDATE with RETURNING, we have to use the
+        * source partition's RETURNING list, because that matches the output
+        * of the planSlot, while the destination partition might have
+        * different resjunk columns.  This means we have to map the
+        * destination slot back to the source's format so we can apply that
+        * RETURNING list.  This is expensive, but it should be an uncommon
+        * corner case, so we won't spend much effort on making it fast.
+        */
+       if (returningRelInfo != resultRelInfo)
+       {

I think we should also add slot != srcSlot to this condition.  The
uncommon corner case is the source and/or the destination partitions
having different column order than the root parent, thus requiring the
tuple to be converted during tuple routing using slots appropriate to
each relation, which causes 'slot' to end up different than 'srcSlot'.
But in the common case, where tuple descriptors match between all
tables involved, 'slot' should be the same as 'srcSlot'.

>  (I've not looked at porting this to v12 or v11 yet.)

I did that; patches attached.  (I haven't changed them to incorporate
the above comment though.)

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/CAPmGK14QXD5Te_vwGgpuVWXRcrC%2Bd8FyWse0aHSqnDDSeeCRFQ%40mail.gmail.com

Вложения

Re: problem with RETURNING and update row movement

От
Tom Lane
Дата:
Amit Langote <amitlangote09@gmail.com> writes:
> I think we should also add slot != srcSlot to this condition.

Good idea, should save useless comparisons of identical tupdescs.
Done.

>> (I've not looked at porting this to v12 or v11 yet.)

> I did that; patches attached.  (I haven't changed them to incorporate
> the above comment though.)

Thanks for that, saved me some work.  I've pushed these.

            regards, tom lane



Re: problem with RETURNING and update row movement

От
Amit Langote
Дата:
On Fri, Apr 23, 2021 at 12:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Amit Langote <amitlangote09@gmail.com> writes:
> > I think we should also add slot != srcSlot to this condition.
>
> Good idea, should save useless comparisons of identical tupdescs.
> Done.
>
> >> (I've not looked at porting this to v12 or v11 yet.)
>
> > I did that; patches attached.  (I haven't changed them to incorporate
> > the above comment though.)
>
> Thanks for that, saved me some work.  I've pushed these.

Thanks for working on closing this and that other issue.

-- 
Amit Langote
EDB: http://www.enterprisedb.com



Re: problem with RETURNING and update row movement

От
Etsuro Fujita
Дата:
On Fri, Apr 23, 2021 at 10:08 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Fri, Apr 23, 2021 at 12:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Amit Langote <amitlangote09@gmail.com> writes:
> > >> (I've not looked at porting this to v12 or v11 yet.)
> >
> > > I did that; patches attached.  (I haven't changed them to incorporate
> > > the above comment though.)
> >
> > Thanks for that, saved me some work.  I've pushed these.
>
> Thanks for working on closing this and that other issue.

Thanks to both of you for working on these!

Best regards,
Etsuro Fujita