Обсуждение: Generated column is not updated (Postgres 13)

Поиск
Список
Период
Сортировка

Generated column is not updated (Postgres 13)

От
Vitaly Ustinov
Дата:
Hello,

I would like to report the following:
- a generated column is never updated if you pass the whole record to a stored procedure (an immutable function);
- however, it works fine if you pass individual columns to the function;
- both options work fine on Postgres 12.

I consider it a backward compatibility issue.
Please correct me if I am wrong, but I was not able to find anything related to this in the release notes for 13.x.

The background of this issue.
During the upgrade from 12.5 to 13.3 with pg_upgrade I had to drop a generated column and some other stuff. After the upgrade was successfully done, I needed to restore everything back. When I was adding the generated column all of a sudden I got a "Segmentation fault". I retried a few times with slightly different variants of the code, but each time I would get the same result - server crashed. This has been in production for many months now and never caused any issue.

2021-05-18 19:30:46 GMT [235415-9] LOG:  server process (PID 235429) was terminated by signal 11: Segmentation fault
2021-05-18 19:30:46 GMT [235415-10] DETAIL:  Failed process was running: ALTER TABLE ordering.requests_3pp ADD unique_hash bytea GENERATED ALWAYS AS (fn_ordering.calc_3pp_req_hash(requests_3pp.*)) STORED NOT NULL;
2021-05-18 19:30:46 GMT [235415-11] LOG:  terminating any other active server processes
2021-05-18 19:30:46 GMT [235422-1] WARNING:  terminating connection because of crash of another server process
2021-05-18 19:30:46 GMT [235422-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2021-05-18 19:30:46 GMT [235422-3] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
2021-05-18 19:30:46 GMT [235415-12] LOG:  archiver process (PID 235423) exited with exit code 2
2021-05-18 19:30:46 GMT [235473-1] postgres@scas_acceptance FATAL:  the database system is in recovery mode
2021-05-18 19:30:46 GMT [235415-13] LOG:  all server processes terminated; reinitializing
2021-05-18 19:30:46 GMT [235474-1] LOG:  database system was interrupted; last known up at 2021-05-18 19:29:47 GMT
2021-05-18 19:30:46 GMT [235474-2] LOG:  database system was not properly shut down; automatic recovery in progress
2021-05-18 19:30:46 GMT [235474-3] LOG:  redo starts at C5B/E70000A0
2021-05-18 19:30:46 GMT [235474-4] LOG:  redo done at C5B/E706FF60
2021-05-18 19:30:46 GMT [235415-14] LOG:  database system is ready to accept connections


I started digging, each time simplifying something, and eventually I managed to make it working. I created a simple procedure which returns "id::text::bytea", then I added a new column and then I recreated the procedure with the initial business logic. And that's the reason I am reporting it, because it turned out that with Postgres 13 a generated column is never updated if you pass the whole record to the stored procedure. Like I said, it works fine in production on Postgres 12.5. As a workaround I will use a trigger, calling the same function until it's fixed.

I've run my tests against 13.2 and 13.3 (see below). But actually I think the best and easiest way of reproducing it would be using the official Docker images from Docker Hub. You can be absolutely sure it's executed in the exact same environment as I have, and that I don't use any specific configuration options etc. However, it's at your choice. This is how you can create the containers and start "psql" inside them:

Postgres 13.3:
$ docker pull postgres:13.3$ docker run -d --name pg13 -e POSTGRES_PASSWORD=postgres postgres:13.3
$ docker exec -it -u postgres pg13 psql

Postgres 12.7:
$ docker pull postgres:12.7
$ docker run -d --name pg12 -e POSTGRES_PASSWORD=postgres postgres:12.7
$ docker exec -it -u postgres pg12 psql

And then you can run the following SQL test case:
-----------------------------------------------------------
create table t(a text, b int);
insert into t values ('A', 1), ('B', 2), ('C', 3);

create or replace function calc_gen_plain(text, int) returns text
language sql immutable as
$$ select $1||'-'||$2; $$;

create or replace function calc_gen_rec_sql(t) returns text
language sql immutable as
$$ select $1.a||'-'||$1.b; $$;

create or replace function calc_gen_rec_plpgsql(t) returns text
language plpgsql immutable as
$$ begin
  raise notice '[plpgsql] a=%, b=%', $1.a, $1.b;
  return $1.a||'-'||$1.b;
end; $$;

alter table t add gen_val text not null generated always as (calc_gen_plain(a, b)) stored;
alter table t add gen_rec1 text not null generated always as (calc_gen_rec_sql(t)) stored;
alter table t add gen_rec2 text not null generated always as (calc_gen_rec_plpgsql(t)) stored;
select t.* from t;

update t set a = chr(ascii(a) + 3), b = b + 3;
select t.* from t;
-----------------------------------------------------------

As a result, after the UPDATE command I expect all generated columns to contain the same value (within each row), and that's what I actually get with Postgres 12:

 a | b | gen_val | gen_rec1 | gen_rec2
---+---+---------+----------+----------
 D | 4 | D-4     | D-4      | D-4
 E | 5 | E-5     | E-5      | E-5
 F | 6 | F-6     | F-6      | F-6

But with Postgres 13 the two last columns are not updated.
Moreover, the "raise notice" statement in the calc_gen_rec_plpgsql() function is not executed for the "UPDATE" command, so I think the function is simply not called at all.

 a | b | gen_val | gen_rec1 | gen_rec2
---+---+---------+----------+----------
 D | 4 | D-4     | A-1      | A-1
 E | 5 | E-5     | B-2      | B-2
 F | 6 | F-6     | C-3      | C-3

Please look into the attachments for detailed output.
Besides, below you will find some info about my environments where I've got the initial "Segmentation fault" issue.

Thank you!


Postgres 13.2 installed from standard binary packages.

$ apt list --installed | grep postgres

postgresql-13/now 13.2-1.pgdg18.04+1 amd64 [installed,upgradable to: 13.3-1.pgdg18.04+1]
postgresql-13-cron/now 1.3.0-2.pgdg18.04+1 amd64 [installed,upgradable to: 1.3.1-1.pgdg18.04+1]
postgresql-13-mysql-fdw/bionic-pgdg,now 2.5.5-2.pgdg18.04+1 amd64 [installed]
postgresql-13-repack/bionic-pgdg,now 1.4.6-1.pgdg18.04+1 amd64 [installed]
postgresql-client-13/now 13.2-1.pgdg18.04+1 amd64 [installed,upgradable to: 13.3-1.pgdg18.04+1]
postgresql-client-common/now 225.pgdg18.04+1 all [installed,upgradable to: 226.pgdg18.04+1]
postgresql-common/now 225.pgdg18.04+1 all [installed,upgradable to: 226.pgdg18.04+1]

$ uname -a
Linux elxajw3dxz1 5.4.0-73-generic #82~18.04.1-Ubuntu SMP Fri Apr 16 15:10:02 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.5 LTS
Release: 18.04
Codename: bionic

Postgres 13.3 installed from binary packages "13.3-2PGDG.rhel8"

postgresql13-13.3-2PGDG.rhel8.x86_64.rpm
postgresql13-contrib-13.3-2PGDG.rhel8.x86_64.rpm
postgresql13-libs-13.3-2PGDG.rhel8.x86_64.rpm
postgresql13-server-13.3-2PGDG.rhel8.x86_64.rpm

$ uname -a
Linux seliiudb01107 4.18.0-193.19.1.el8_2.x86_64 #1 SMP Wed Aug 26 15:29:02 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterprise
Description: Red Hat Enterprise Linux release 8.2 (Ootpa)
Release: 8.2
Codename: Ootpa



Regards,
Vitaly Ustinov
Вложения

Re: Generated column is not updated (Postgres 13)

От
Tom Lane
Дата:
Vitaly Ustinov <vitaly@ustinov.ca> writes:
> I would like to report the following:
> - a generated column is never updated if you pass the whole record to a
> stored procedure (an immutable function);

TBH, I think that this is insane and needs to be forbidden.  What value
are you expecting that the function would see in the column of the
whole-row var that corresponds to the generated column?  It surely
cannot be passed the value that it hasn't computed yet.

You could perhaps argue that it'd be okay to pass NULL.  The problem
with that, though, is that it'd violate the NOT NULL constraint that
exists for the generated column.  Quite aside from any confusion that
ensues at the user level, I'm afraid that that could result in C code
crashes --- there are, for example, places in tuple deforming that
assume that NOT NULL constraints are truthful.

Anyway, I tried your test case on a debug build and observed assertion
failures, which I traced to ATRewriteTable not being careful enough
to zero out the whole tuple each time through.  Conceivably we could
fix that as per the attached quick hack.  However, I think we ought
to disallow the case instead.  I observe that we already disallow
generated columns depending on each other:

regression=# create table foo (f1 int);
CREATE TABLE
regression=# alter table foo add f2 int generated always as (f1+1) stored;
ALTER TABLE
regression=# alter table foo add f3 int generated always as (f2+1) stored;
ERROR:  cannot use generated column "f2" in column generation expression
DETAIL:  A generated column cannot reference another generated column.

But a whole-row var violates this concept completely: it makes the
generated column depend, not only on every other column, but on itself
too.  Also, even if you don't mind null-for-not-yet-computed-value,
that would expose the computation order of the generated columns.

            regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ebc62034d2..4178cde3b2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5699,16 +5699,6 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
                                                table_slot_callbacks(oldrel));
             newslot = MakeSingleTupleTableSlot(newTupDesc,
                                                table_slot_callbacks(newrel));
-
-            /*
-             * Set all columns in the new slot to NULL initially, to ensure
-             * columns added as part of the rewrite are initialized to NULL.
-             * That is necessary as tab->newvals will not contain an
-             * expression for columns with a NULL default, e.g. when adding a
-             * column without a default together with a column with a default
-             * requiring an actual rewrite.
-             */
-            ExecStoreAllNullTuple(newslot);
         }
         else
         {
@@ -5747,9 +5737,21 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)

             if (tab->rewrite > 0)
             {
+                /*
+                 * Set all columns in the new slot to NULL initially, to
+                 * ensure columns added as part of the rewrite are initialized
+                 * to NULL.  That is necessary as tab->newvals will not
+                 * contain an expression for columns with a NULL default, e.g.
+                 * when adding a column without a default together with a
+                 * column with a default requiring an actual rewrite.
+                 * Moreover, we must do it each time through, so that the slot
+                 * contents are sane in case any generated expression contains
+                 * a whole-row Var (which will read this same slot).
+                 */
+                ExecStoreAllNullTuple(newslot);
+
                 /* Extract data from old tuple */
                 slot_getallattrs(oldslot);
-                ExecClearTuple(newslot);

                 /* copy attributes */
                 memcpy(newslot->tts_values, oldslot->tts_values,
@@ -5782,12 +5784,12 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
                                        &newslot->tts_isnull[ex->attnum - 1]);
                 }

-                ExecStoreVirtualTuple(newslot);
-
                 /*
                  * Now, evaluate any expressions whose inputs come from the
                  * new tuple.  We assume these columns won't reference each
-                 * other, so that there's no ordering dependency.
+                 * other, so that there's no ordering dependency.  However,
+                 * because we allow whole-row references in GENERATED
+                 * expressions, that's not strictly true ...
                  */
                 econtext->ecxt_scantuple = newslot;


Re: Generated column is not updated (Postgres 13)

От
"David G. Johnston"
Дата:
On Wed, May 19, 2021 at 3:03 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Vitaly Ustinov <vitaly@ustinov.ca> writes:
> I would like to report the following:
> - a generated column is never updated if you pass the whole record to a
> stored procedure (an immutable function);

You could perhaps argue that it'd be okay to pass NULL.  The problem
with that, though, is that it'd violate the NOT NULL constraint that
exists for the generated column.  Quite aside from any confusion that
ensues at the user level, I'm afraid that that could result in C code
crashes --- there are, for example, places in tuple deforming that
assume that NOT NULL constraints are truthful.

Knowing why version 12 worked would be helpful in judging whether to live with past decisions, fix poorly made previous decisions, or just move forward with the new behavior in v14 and leave the past alone.  I agree that the better design choice would have us forbid this, though that seems problematic in its own right.  The function itself is defined correctly and aside from the fact it is in a generated expression it is also called correctly.  I suppose the error would be of the form "cannot form a valid whole-row var due to unspecified generated column data"?

FWIW I can definitely see where the OP is coming from with this - the expression at first blush, if one assumes PostgreSQL can handle the nuances, seems like a perfectly reasonable semantic way to express the programmer's desire.  Combined with the fact it used to work makes me want to lean toward keeping it working even if it takes come code hackery.

David J.

Re: Generated column is not updated (Postgres 13)

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> FWIW I can definitely see where the OP is coming from with this - the
> expression at first blush, if one assumes PostgreSQL can handle the
> nuances, seems like a perfectly reasonable semantic way to express the
> programmer's desire.  Combined with the fact it used to work makes me want
> to lean toward keeping it working even if it takes come code hackery.

I dunno, I think it "used to work" only for exceedingly small values
of "work".  For me, the given test case hits the same assertion failure
in all versions back to v12, which is unsurprising because the code in
ATRewriteTable is about the same.

Also, considering only the table-rewrite code path is a mistake.
The fundamental problem here is that the behavior is ill-defined and
necessarily implementation-dependent; which makes it likely that other
code paths behave inconsistently with ALTER TABLE ADD COLUMN.

            regards, tom lane



Re: Generated column is not updated (Postgres 13)

От
Tom Lane
Дата:
I wrote:
> ... I think we ought
> to disallow the case instead.  I observe that we already disallow
> generated columns depending on each other: ...
> But a whole-row var violates this concept completely: it makes the
> generated column depend, not only on every other column, but on itself
> too.  Also, even if you don't mind null-for-not-yet-computed-value,
> that would expose the computation order of the generated columns.

After actually looking at the code involved, I'm even more on the
warpath.  Not only is it failing to reject whole-row vars, but it's
failing to reject system columns.  That is (a) infeasible to support,
given that we don't know the values of the system columns at the time
we compute generated expressions, and (b) just plain ludicrous in
expressions that are required to be immutable.

I see that there is actually a regression test case that believes
that "tableoid" should be allowed, but I think that is nonsense.

In the first place, it's impossible to claim that tableoid is an
immutable expression.  Consider, say, "tableoid > 30000".  Do you
think such a column is likely to survive dump-and-reload unchanged?
Also, while that example is artificial, I'm having a hard time
coming up with realistic immutable use-cases for generation
expressions involving tableoid.

In the second place, there are a bunch of implementation dependencies
that we'd have to fix if we want to consider that supported.  I think
it's mostly accidental that the case seems to work in the mainline
INSERT code path.  It's not hard to find cases where it does not work,
for example

regression=# create table foo (f1 int);
CREATE TABLE
regression=# insert into foo values (1);
INSERT 0 1
regression=# alter table foo add column f2 oid GENERATED ALWAYS AS (tableoid) STORED;
ALTER TABLE
regression=# table foo;
 f1 | f2
----+----
  1 |  0
(1 row)

So I think we should just forbid tableoid along with other system
columns, as attached.

            regards, tom lane

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 431e62e389..53ff26774a 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3020,15 +3020,29 @@ check_nested_generated_walker(Node *node, void *context)
         AttrNumber    attnum;

         relid = rt_fetch(var->varno, pstate->p_rtable)->relid;
+        if (!OidIsValid(relid))
+            return false;        /* XXX shouldn't we raise an error? */
+
         attnum = var->varattno;

-        if (OidIsValid(relid) && AttributeNumberIsValid(attnum) && get_attgenerated(relid, attnum))
+        if (attnum > 0 && get_attgenerated(relid, attnum))
             ereport(ERROR,
-                    (errcode(ERRCODE_SYNTAX_ERROR),
+                    (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                      errmsg("cannot use generated column \"%s\" in column generation expression",
                             get_attname(relid, attnum, false)),
                      errdetail("A generated column cannot reference another generated column."),
                      parser_errposition(pstate, var->location)));
+        if (attnum == 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                     errmsg("cannot use whole-row variable in column generation expression"),
+                     parser_errposition(pstate, var->location)));
+        if (attnum < 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                     errmsg("cannot use system column \"%s\" in column generation expression",
+                            get_attname(relid, attnum, false)),
+                     parser_errposition(pstate, var->location)));

         return false;
     }
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
index 675773f0c1..896b568165 100644
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -46,6 +46,16 @@ ERROR:  cannot use generated column "b" in column generation expression
 LINE 1: ...AYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (b * 3) STO...
                                                              ^
 DETAIL:  A generated column cannot reference another generated column.
+-- a whole-row var is a self-reference on steroids
+CREATE TABLE gtest_err_2c (a int PRIMARY KEY, b int GENERATED ALWAYS AS (length(gtest_err_2c::text)) STORED);
+ERROR:  cannot use whole-row variable in column generation expression
+LINE 1: ...nt PRIMARY KEY, b int GENERATED ALWAYS AS (length(gtest_err_...
+                                                             ^
+-- no system columns, either, as those aren't known when generation occurs
+CREATE TABLE gtest_err_2d (a int PRIMARY KEY, b oid GENERATED ALWAYS AS (tableoid) STORED);
+ERROR:  cannot use system column "tableoid" in column generation expression
+LINE 1: ...2d (a int PRIMARY KEY, b oid GENERATED ALWAYS AS (tableoid) ...
+                                                             ^
 -- invalid reference
 CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STORED);
 ERROR:  column "c" does not exist
@@ -452,19 +462,6 @@ SELECT * FROM gtest4;

 DROP TABLE gtest4;
 DROP TYPE double_int;
--- using tableoid is allowed
-CREATE TABLE gtest_tableoid (
-  a int PRIMARY KEY,
-  b bool GENERATED ALWAYS AS (tableoid <> 0) STORED
-);
-INSERT INTO gtest_tableoid VALUES (1), (2);
-SELECT * FROM gtest_tableoid;
- a | b
----+---
- 1 | t
- 2 | t
-(2 rows)
-
 -- drop column behavior
 CREATE TABLE gtest10 (a int PRIMARY KEY, b int, c int GENERATED ALWAYS AS (b * 2) STORED);
 ALTER TABLE gtest10 DROP COLUMN b;
diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql
index 63251c443a..69c334742f 100644
--- a/src/test/regress/sql/generated.sql
+++ b/src/test/regress/sql/generated.sql
@@ -17,6 +17,11 @@ CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) S
 -- references to other generated columns, including self-references
 CREATE TABLE gtest_err_2a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (b * 2) STORED);
 CREATE TABLE gtest_err_2b (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (b *
3)STORED); 
+-- a whole-row var is a self-reference on steroids
+CREATE TABLE gtest_err_2c (a int PRIMARY KEY, b int GENERATED ALWAYS AS (length(gtest_err_2c::text)) STORED);
+
+-- no system columns, either, as those aren't known when generation occurs
+CREATE TABLE gtest_err_2d (a int PRIMARY KEY, b oid GENERATED ALWAYS AS (tableoid) STORED);

 -- invalid reference
 CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STORED);
@@ -215,14 +220,6 @@ SELECT * FROM gtest4;
 DROP TABLE gtest4;
 DROP TYPE double_int;

--- using tableoid is allowed
-CREATE TABLE gtest_tableoid (
-  a int PRIMARY KEY,
-  b bool GENERATED ALWAYS AS (tableoid <> 0) STORED
-);
-INSERT INTO gtest_tableoid VALUES (1), (2);
-SELECT * FROM gtest_tableoid;
-
 -- drop column behavior
 CREATE TABLE gtest10 (a int PRIMARY KEY, b int, c int GENERATED ALWAYS AS (b * 2) STORED);
 ALTER TABLE gtest10 DROP COLUMN b;

Re: Generated column is not updated (Postgres 13)

От
Vitaly Ustinov
Дата:
Hi,

Thank you very much for the quick response and feedback. I completely
understand your point, Tom. And I can go back to using triggers
instead. After all, this whole "generated columns" feature is just
syntax sugar. In my real case, the function accepts a row containing
dozens of columns and returns a SHA-1 hash that must be unique,
following pretty sophisticated business logic. Something like: if type
= X and subtype = Y then combine these fields, else if ... and so on.
That's why it's so convenient to pass the whole row.

For the record, I think that passing NULL as a value for all generated
columns would not be such a bad idea, because that's exactly what NULL
represents - an unknown value. And I agree that it would be insane to
rely on the order of calculation, if someone decided to read a value
that is still being computed. It reminds me of the famous "mutating
table" issue while using triggers.

As to the "NOT NULL" and other sorts of constraints - it's also fine,
because integrity constraints are applied later. Just to illustrate my
idea:

create table foo(
  id serial,
  val text,
  hash bytea not null unique
);

insert into foo(val) values('A');

If I had a "before insert on foo for each row" trigger, what would be
the initial "hash" value in it? It would be NULL.
Can I temporarily assign "NEW.hash := NULL" in this trigger, until I
have not yet reached the "return NEW" statement? Yes, I can.
Can I temporarily assign a non-unique value to "NEW.hash"? Yes, I can.

Anyway, I trust your discretion. Thanks!

Regards,
Vitaly Ustinov

Regards,
Vitaly Ustinov


On Wed, May 19, 2021 at 9:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wrote:
> > ... I think we ought
> > to disallow the case instead.  I observe that we already disallow
> > generated columns depending on each other: ...
> > But a whole-row var violates this concept completely: it makes the
> > generated column depend, not only on every other column, but on itself
> > too.  Also, even if you don't mind null-for-not-yet-computed-value,
> > that would expose the computation order of the generated columns.
>
> After actually looking at the code involved, I'm even more on the
> warpath.  Not only is it failing to reject whole-row vars, but it's
> failing to reject system columns.  That is (a) infeasible to support,
> given that we don't know the values of the system columns at the time
> we compute generated expressions, and (b) just plain ludicrous in
> expressions that are required to be immutable.
>
> I see that there is actually a regression test case that believes
> that "tableoid" should be allowed, but I think that is nonsense.
>
> In the first place, it's impossible to claim that tableoid is an
> immutable expression.  Consider, say, "tableoid > 30000".  Do you
> think such a column is likely to survive dump-and-reload unchanged?
> Also, while that example is artificial, I'm having a hard time
> coming up with realistic immutable use-cases for generation
> expressions involving tableoid.
>
> In the second place, there are a bunch of implementation dependencies
> that we'd have to fix if we want to consider that supported.  I think
> it's mostly accidental that the case seems to work in the mainline
> INSERT code path.  It's not hard to find cases where it does not work,
> for example
>
> regression=# create table foo (f1 int);
> CREATE TABLE
> regression=# insert into foo values (1);
> INSERT 0 1
> regression=# alter table foo add column f2 oid GENERATED ALWAYS AS (tableoid) STORED;
> ALTER TABLE
> regression=# table foo;
>  f1 | f2
> ----+----
>   1 |  0
> (1 row)
>
> So I think we should just forbid tableoid along with other system
> columns, as attached.
>
>                         regards, tom lane
>



Re: Generated column is not updated (Postgres 13)

От
Vitaly Ustinov
Дата:
Tom Lane wrote:

> TBH, I think that this is insane and needs to be forbidden. What value
> are you expecting that the function would see in the column of the
> whole-row var that corresponds to the generated column?  It surely
> cannot be passed the value that it hasn't computed yet.

Perhaps this problem has already been addressed and resolved.
https://www.postgresql.org/docs/13/trigger-definition.html

> Stored generated columns are computed after BEFORE triggers and
> before AFTER triggers. Therefore, the generated value can be inspected
> in AFTER triggers. In BEFORE triggers, the OLD row contains the old
> generated value, as one would expect, but the NEW row does not yet
> contain the new generated value and should not be accessed.
> In the C language interface, the content of the column is undefined at
> this point; a higher-level programming language should prevent access
> to a stored generated column in the NEW row in a BEFORE trigger.

To bottom-line, this is the responsibility of the app developers.
Disclaimer: we warned you.

Tom Lane wrote:

> However, I think we ought to disallow the case instead. I observe that
> we already disallow generated columns depending on each other.

Right, and a function getting a whole-row var can bypass this limitation.
But let's take a look at the same documentation page again and check
what it says about interdependent triggers:

> If more than one trigger is defined for the same event on the same
> relation, the triggers will be fired in alphabetical order by trigger name.
> In the case of BEFORE and INSTEAD OF triggers, the possibly-modified
> row returned by each trigger becomes the input to the next trigger.

Similar to the described behavior, a solution could be to calculate
the generated
columns in alphabetical order. But I'd rather use the same formulation "should
not be accessed" and shift the responsibility to the app developers.

Regards,
Vitaly Ustinov



Re: Generated column is not updated (Postgres 13)

От
Tom Lane
Дата:
After looking at this further, I see that there already is a check
for system columns in GENERATED expressions; it's just in the
parser (scanNSItemForColumn), not where I was looking for it.
And it explicitly exempts tableoid.  I continue to think that
that's a bad idea and we're gonna regret it, but at least the
issue in ATRewriteTable turns out to be trivial to fix.
So I did that and upgraded the test scenario to be a bit
more realistic, in 0001 below.  0002 then just disallows
whole-row variables.  (I added an errdetail trying to
explain that restriction, too.)

It's worth pointing out here that what seems to me to be a
"more realistic" test scenario involves a regclass constant
for the table's own OID.  If that doesn't scare you, it should,
because it implies all kinds of constraints on the order in which
these expressions are processed during CREATE or ALTER TABLE.
The test passes check-world, which includes dump/reload, but I am
sorely afraid that there are ways in which this sort of thing
would fail.  Do we *really* want to buy into supporting it?

            regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index ebc62034d2..85dcc33063 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -5761,6 +5761,14 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
                 foreach(lc, dropped_attrs)
                     newslot->tts_isnull[lfirst_int(lc)] = true;

+                /*
+                 * Constraints and GENERATED expressions might reference the
+                 * tableoid column, so fill tts_tableOid with the desired
+                 * value.  (We must do this each time, because it gets
+                 * overwritten with newrel's OID during storing.)
+                 */
+                newslot->tts_tableOid = RelationGetRelid(oldrel);
+
                 /*
                  * Process supplied expressions to replace selected columns.
                  *
@@ -5804,11 +5812,6 @@ ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode)
                                        &newslot->tts_isnull[ex->attnum - 1]);
                 }

-                /*
-                 * Constraints might reference the tableoid column, so
-                 * initialize t_tableOid before evaluating them.
-                 */
-                newslot->tts_tableOid = RelationGetRelid(oldrel);
                 insertslot = newslot;
             }
             else
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
index 675773f0c1..71542e95d5 100644
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -64,6 +64,7 @@ ERROR:  both identity and generation expression specified for column "b" of tabl
 LINE 1: ...t PRIMARY KEY, b int GENERATED ALWAYS AS identity GENERATED ...
                                                              ^
 -- reference to system column not allowed in generated column
+-- (except tableoid, which we test below)
 CREATE TABLE gtest_err_6a (a int PRIMARY KEY, b bool GENERATED ALWAYS AS (xmin <> 37) STORED);
 ERROR:  cannot use system column "xmin" in column generation expression
 LINE 1: ...a (a int PRIMARY KEY, b bool GENERATED ALWAYS AS (xmin <> 37...
@@ -455,14 +456,16 @@ DROP TYPE double_int;
 -- using tableoid is allowed
 CREATE TABLE gtest_tableoid (
   a int PRIMARY KEY,
-  b bool GENERATED ALWAYS AS (tableoid <> 0) STORED
+  b bool GENERATED ALWAYS AS (tableoid = 'gtest_tableoid'::regclass) STORED
 );
 INSERT INTO gtest_tableoid VALUES (1), (2);
+ALTER TABLE gtest_tableoid ADD COLUMN
+  c regclass GENERATED ALWAYS AS (tableoid) STORED;
 SELECT * FROM gtest_tableoid;
- a | b
----+---
- 1 | t
- 2 | t
+ a | b |       c
+---+---+----------------
+ 1 | t | gtest_tableoid
+ 2 | t | gtest_tableoid
 (2 rows)

 -- drop column behavior
diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql
index 63251c443a..914197608b 100644
--- a/src/test/regress/sql/generated.sql
+++ b/src/test/regress/sql/generated.sql
@@ -29,6 +29,7 @@ CREATE TABLE gtest_err_5a (a int PRIMARY KEY, b int DEFAULT 5 GENERATED ALWAYS A
 CREATE TABLE gtest_err_5b (a int PRIMARY KEY, b int GENERATED ALWAYS AS identity GENERATED ALWAYS AS (a * 2) STORED);

 -- reference to system column not allowed in generated column
+-- (except tableoid, which we test below)
 CREATE TABLE gtest_err_6a (a int PRIMARY KEY, b bool GENERATED ALWAYS AS (xmin <> 37) STORED);

 -- various prohibited constructs
@@ -218,9 +219,11 @@ DROP TYPE double_int;
 -- using tableoid is allowed
 CREATE TABLE gtest_tableoid (
   a int PRIMARY KEY,
-  b bool GENERATED ALWAYS AS (tableoid <> 0) STORED
+  b bool GENERATED ALWAYS AS (tableoid = 'gtest_tableoid'::regclass) STORED
 );
 INSERT INTO gtest_tableoid VALUES (1), (2);
+ALTER TABLE gtest_tableoid ADD COLUMN
+  c regclass GENERATED ALWAYS AS (tableoid) STORED;
 SELECT * FROM gtest_tableoid;

 -- drop column behavior
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index 431e62e389..ba3e1ecf45 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -3020,15 +3020,26 @@ check_nested_generated_walker(Node *node, void *context)
         AttrNumber    attnum;

         relid = rt_fetch(var->varno, pstate->p_rtable)->relid;
+        if (!OidIsValid(relid))
+            return false;        /* XXX shouldn't we raise an error? */
+
         attnum = var->varattno;

-        if (OidIsValid(relid) && AttributeNumberIsValid(attnum) && get_attgenerated(relid, attnum))
+        if (attnum > 0 && get_attgenerated(relid, attnum))
             ereport(ERROR,
-                    (errcode(ERRCODE_SYNTAX_ERROR),
+                    (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
                      errmsg("cannot use generated column \"%s\" in column generation expression",
                             get_attname(relid, attnum, false)),
                      errdetail("A generated column cannot reference another generated column."),
                      parser_errposition(pstate, var->location)));
+        /* A whole-row Var is necessarily self-referential, so forbid it */
+        if (attnum == 0)
+            ereport(ERROR,
+                    (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+                     errmsg("cannot use whole-row variable in column generation expression"),
+                     errdetail("This would cause the generated column to depend on its own value."),
+                     parser_errposition(pstate, var->location)));
+        /* System columns were already checked in the parser */

         return false;
     }
diff --git a/src/test/regress/expected/generated.out b/src/test/regress/expected/generated.out
index 71542e95d5..c2e5676196 100644
--- a/src/test/regress/expected/generated.out
+++ b/src/test/regress/expected/generated.out
@@ -46,6 +46,13 @@ ERROR:  cannot use generated column "b" in column generation expression
 LINE 1: ...AYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (b * 3) STO...
                                                              ^
 DETAIL:  A generated column cannot reference another generated column.
+-- a whole-row var is a self-reference on steroids, so disallow that too
+CREATE TABLE gtest_err_2c (a int PRIMARY KEY,
+    b int GENERATED ALWAYS AS (num_nulls(gtest_err_2c)) STORED);
+ERROR:  cannot use whole-row variable in column generation expression
+LINE 2:     b int GENERATED ALWAYS AS (num_nulls(gtest_err_2c)) STOR...
+                                                 ^
+DETAIL:  This would cause the generated column to depend on its own value.
 -- invalid reference
 CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STORED);
 ERROR:  column "c" does not exist
diff --git a/src/test/regress/sql/generated.sql b/src/test/regress/sql/generated.sql
index 914197608b..b7eb072671 100644
--- a/src/test/regress/sql/generated.sql
+++ b/src/test/regress/sql/generated.sql
@@ -17,6 +17,9 @@ CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) S
 -- references to other generated columns, including self-references
 CREATE TABLE gtest_err_2a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (b * 2) STORED);
 CREATE TABLE gtest_err_2b (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED, c int GENERATED ALWAYS AS (b *
3)STORED); 
+-- a whole-row var is a self-reference on steroids, so disallow that too
+CREATE TABLE gtest_err_2c (a int PRIMARY KEY,
+    b int GENERATED ALWAYS AS (num_nulls(gtest_err_2c)) STORED);

 -- invalid reference
 CREATE TABLE gtest_err_3 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (c * 2) STORED);

Re: Generated column is not updated (Postgres 13)

От
Vitaly Ustinov
Дата:
In a BEFORE trigger (PL/pgSQL) I can access not yet computed generated
columns via the "NEW" whole-row var. I can read from it (NULL), and I
can write to it, but the assigned value will be ignored.

This behavior seems obvious and well thought through to me. It's not
something app developers should do, because it wouldn't make sense,
but at least it's not forbidden and the server process does not crash
with a seg fault.

Is it okay with you that the patch 0002 introduces some inconsistency with that?

Regards,
Vitaly



Re: Generated column is not updated (Postgres 13)

От
Tom Lane
Дата:
Vitaly Ustinov <vitaly@ustinov.ca> writes:
> In a BEFORE trigger (PL/pgSQL) I can access not yet computed generated
> columns via the "NEW" whole-row var. I can read from it (NULL), and I
> can write to it, but the assigned value will be ignored.

NEW is not really a whole-row var in the sense that we're discussing
here.  In any case, yes, nodeModifyTable runs computation of generated
columns after firing BEFORE triggers.  I suppose that's to prevent
the triggers from interfering with those values.

> Is it okay with you that the patch 0002 introduces some inconsistency with that?

In what way would this introduce (new) inconsistency?  I didn't move
the computation of those values from where they were.  Moreover, so
far as I can see, ATRewriteTable doesn't fire BEFORE triggers at all.

BTW, while looking around nodeModifyTable, I noted some other gaps in our
support for "tableoid" in GENERATED expressions: the FDW code paths in
ExecInsert and ExecUpdate fail to fill tts_tableOid till after running
ExecComputeStoredGenerated.  So they have the same bug that 0001 fixes
in ATRewriteTable.  It can be fixed the same way, ie move the code
that fills tts_tableOid; but I remain very concerned about how many
other gaps there are.

            regards, tom lane