Обсуждение: BUG #14124: ON UPDATE CASCADE failure on repeated foreign key column

Поиск
Список
Период
Сортировка

BUG #14124: ON UPDATE CASCADE failure on repeated foreign key column

От
targen@gmail.com
Дата:
The following bug has been logged on the website:

Bug reference:      14124
Logged by:          Manuel Gómez
Email address:      targen@gmail.com
PostgreSQL version: 9.4.7
Operating system:   Ubuntu Linux 14.04
Description:

Foreign keys with ON UPDATE CASCADE generate invalid UPDATE statements if a
column is used multiple times on the referring side of the foreign key.

postgres=# create table foo (x int, y int, unique (x, y));
CREATE TABLE

postgres=# create table bar (z int, foreign key (z, z) references foo (x, y)
on update cascade);
CREATE TABLE

postgres=# insert into foo values (42, 42);
INSERT 0 1

postgres=# insert into bar values (42);
INSERT 0 1

postgres=# update foo set x = 27, y = 27;
ERROR:  multiple assignments to same column "z"
CONTEXT:  SQL statement "UPDATE ONLY "bar" SET "z" = $1, "z" = $2 WHERE $3
OPERATOR(pg_catalog.=) "z" AND $4 OPERATOR(pg_catalog.=) "z""

I don't know whether this is expected behavior.  Indeed it would make sense
for the process to fail if the two referenced columns are set to different
values, as it would be unclear, in this example, whether z should take its
value from x or y.  However, the cascading update should work when they are
indeed equal, and not produce a malformed statement.

I can work around this with manually defined triggers, but this can probably
be fixed.  I could provide more complete context for the practical situation
in which this issue arises if motivation is required.

Re: BUG #14124: ON UPDATE CASCADE failure on repeated foreign key column

От
Tom Lane
Дата:
targen@gmail.com writes:
> postgres=# create table bar (z int, foreign key (z, z) references foo (x, y)
> on update cascade);

Hmm.  Such a constraint is specifically forbidden by the SQL standard,
see SQL:2011 11.8 <referential constraint definition> syntax rule 8:

    Each referencing column shall identify a column of the referencing
    table, and the same column shall not be identified more than once.

Can you produce a convincing use-case for this structure?

> postgres=# update foo set x = 27, y = 27;
> ERROR:  multiple assignments to same column "z"
> CONTEXT:  SQL statement "UPDATE ONLY "bar" SET "z" = $1, "z" = $2 WHERE $3
> OPERATOR(pg_catalog.=) "z" AND $4 OPERATOR(pg_catalog.=) "z""

> I don't know whether this is expected behavior.  Indeed it would make sense
> for the process to fail if the two referenced columns are set to different
> values, as it would be unclear, in this example, whether z should take its
> value from x or y.  However, the cascading update should work when they are
> indeed equal, and not produce a malformed statement.

It would be difficult to support that, AFAICS, because the FK enforcement
queries do not depend on things like whether the specific values involved
are equal.  Nor would one generally expect that an FK constraint should be
allowed to restrict what can be stored in the referenced table.
Moreover, it's not even real clear to me that there is a well defined
notion of "equal" in such a situation.  If x and y are of different types,
it's possible they would have different equality rules.  (This is somewhat
constrained by the fact that z would have to have equality operators in
common with both; but I do not think that's quite sufficient to remove the
concern.)

It could be argued that we should hew strictly to the letter of the SQL
standard and forbid foreign-key constraints with duplicate referencing
columns.  Or maybe allow it only if the FK action is not one where we'd
run into this problem.  I'm not terribly excited about trying to support
the case, though.

            regards, tom lane

Re: BUG #14124: ON UPDATE CASCADE failure on repeated foreign key column

От
Manuel Gómez
Дата:
On Mon, May 2, 2016 at 7:39 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm.  Such a constraint is specifically forbidden by the SQL standard,
> see SQL:2011 11.8 <referential constraint definition> syntax rule 8:
>
>         Each referencing column shall identify a column of the referencing
>         table, and the same column shall not be identified more than once.

That’s a shame.  I suppose it’s a reasonable rule to simplify
semantics, especially given that it can be worked around by adding
another column guaranteed by a CHECK constraint to be identical,
although the performance overhead is regrettable, of course.

> Can you produce a convincing use-case for this structure?

We have a certain sort of object in our database that comes in two
flavors: a primitive sort of object, and a derived sort of object.
Derived objects are always associated through a derivation
relationship to some primitive object, and primitive objects are not
associated to any object through derivation.  Additionally, these
objects are attributed with a label that is independent of whether
they are primitive or derived, but a derived object's label must match
the label of the primitive object to which it is associated through
derivation.  Many other database objects must make references to
objects of either flavor, so it’s rather important that they all live
in a single table, or it would be complicated to associate other
database objects to these using foreign key constraints.  Our solution
is thus:

create type label_type as enum ( 'foo', 'bar', 'baz'
);

create table T ( id uuid primary key, prototype uuid not null, label label_type not null, unique (id, label,
prototype),foreign key (prototype, label, prototype)   references T (id, label, prototype)   on update cascade 
);

create function set_T_id() returns trigger language 'plpgsql' as $$ begin   if new.id is null then     new.id :=
uuid_generate_v4();  end if;   if new.prototype is null then     new.prototype := new.id;   end if;   return new; end; 
$$;

create trigger set_T_id_trigger before insert on T
for each row execute procedure set_T_id();

The prototype field serves as the association between an object and
the one from whom it has been derived.  Objects of the primitive
flavor are represented by rows for which id = prototype (in this
sense, being primitive means they are their own prototype); objects of
the derived flavor are represented by rows for which id <> prototype,
and for such derived objects, the associated primitive object is
represented by another row whose id is the value of the prototype
attribute.  The trigger allows us to easily insert either primitive or
derived objects: primitive objects are inserted by not specifying a
prototype identifier, and derived objects specify the identifier of
their prototype; both get their own identifiers generated
automatically if it was not specified.

This is still a somewhat simplified example as our actual T table has
several more columns, but in any case, many other tables have foreign
keys to T(id).  This matches up quite precisely with the semantics we
require for the concept represented by T.  However, the «on update
cascade» bit causes issues:

ERROR:  multiple assignments to same column "prototype"
CONTEXT:  SQL statement "UPDATE ONLY "T" SET "prototype" = $1, "label"
= $2, "prototype" = $3 WHERE $4 OPERATOR(pg_catalog.=) "prototype" AND
$5::pg_catalog.anyenum OPERATOR(pg_catalog.=)
"type"::pg_catalog.anyenum AND $6 OPERATOR(pg_catalog.=) "prototype""
SQL statement "update "T"     set "prototype" = "prototype id parameter"     where "id" = "T id variable""
PL/pgSQL function "create derived T"(uuid) line […] at SQL statement

Indeed, we have worked around this for now by removing «on update
cascade», and we could further avoid this by not performing this
update or by implementing the logic for this update in a custom
trigger.  This isn’t a show-stopper for us.  The only reason we
perform that update in the «create derived T» function is that we have
another similar «create primitive T» function, and creating a derived
T must first perform the same steps to create a primitive T (in terms
of other features of our schema not present in this simplified
version), so we simply call one function from the other, and at the
end, make the new T derived.  In any case, this is not really relevant
to the crux of the issue; it’s just context for that specific context
for the error and we can easily work around it.

> It would be difficult to support that, AFAICS, because the FK enforcement
> queries do not depend on things like whether the specific values involved
> are equal.  Nor would one generally expect that an FK constraint should be
> allowed to restrict what can be stored in the referenced table.

Not generally, but this can certainly happen with the combination of
FK and CHECK.  I don’t mean to imply this is a reasonable way to
design a schema, only that it’s possible. :)

> Moreover, it's not even real clear to me that there is a well defined
> notion of "equal" in such a situation.  If x and y are of different types,
> it's possible they would have different equality rules.  (This is somewhat
> constrained by the fact that z would have to have equality operators in
> common with both; but I do not think that's quite sufficient to remove the
> concern.)
>
> It could be argued that we should hew strictly to the letter of the SQL
> standard and forbid foreign-key constraints with duplicate referencing
> columns.  Or maybe allow it only if the FK action is not one where we'd
> run into this problem.  I'm not terribly excited about trying to support
> the case, though.

Indeed, I had not considered issues with missing equality operators,
and looking at the logic in ATAddForeignKeyConstraint and
RI_FKey_cascade_upd, I see that it’s a problem.  It could certainly be
supported by checking for appropriate operators or even cheaply
restricting this to all the types matching up exactly.

As this is all sadly nonstandard, I understand if there’s no intent to
change the current state of things.  It would certainly help to
clarify these things in the documentation, but that would require
deciding if and how to support such nonstandard foreign key
constraints.