Обсуждение: [PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints

Поиск
Список
Период
Сортировка

[PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints

От
Paul Martinez
Дата:
Hello!

I have a proposal for a feature to add to Postgres. I believe it is a natural
extension to the current standard SQL ON DELETE SET NULL behavior when using
composite foreign keys. The basic idea is that you can specify which columns to
set to NULL in the DELETE trigger created by a foreign key constraint. (There
could be similar support for UPDATE triggers and SET DEFAULT actions.)


PROBLEM:

Consider the following basic schema:

CREATE TABLE users (id primary key);
CREATE TABLE posts (
    id primary key,
    content text,
    author_id int REFERENCES users(id) ON DELETE SET NULL
);

When a user is deleted all of their posts will have the author_id field set to
NULL.

Consider the same schema in a multi-tenant application. In a multi-tenant
application a tenant_id column will frequently be denormalized across every
table. In certain cases these tables will actually use the tenant_id as part of
a composite primary key. (This is the approach recommended by Citus.)

So in a multi-tenant architecture, our schema may look like this:

CREATE TABLE tenants (id primary key);
CREATE TABLE users (
    tenant_id int,
    id int,
    PRIMARY KEY (tenant_id, id)
);
CREATE TABLE posts (
    tenant_id int,
    id int,
    content text,
    author_id int,
    PRIMARY KEY (tenant_id, id),
    FOREIGN KEY (tenant_id, author_id)
        REFERENCES users(tenant_id, id) ON DELETE SET NULL
);

In this situation we can no longer delete a user that has created a post
because Postgres would try to set both the author_id AND the tenant_id columns
of a corresponding post to NULL, and tenant_id is not nullable.

INSERT INTO tenants VALUES (1);
INSERT INTO users VALUES (1, 101);
INSERT INTO posts VALUES (1, 201, 'content', 101);
DELETE FROM users WHERE id = 101;
ERROR:  null value in column "tenant_id" violates not-null constraint
DETAIL:  Failing row contains (null, 201, content, null).


GENERAL SOLUTION:

My proposal is to allow specifying which columns to set null during the delete
trigger:

CREATE TABLE posts (
    ...,
    FOREIGN KEY (tenant_id, author_id)
        REFERENCES users(tenant_id, id) ON DELETE SET NULL (author_id)
)

If this feature is implemented the above DELETE statement would succeed, and
the record in posts would have the values (1, 201, 'content', NULL).

The grammar will be modified as follows:

table_constraint is:

  FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ (
refcolumn [, ... ] ) ]
      [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action
] [ ON UPDATE action ] }

action is:

  NO ACTION |
  RESTRICT |
  CASCADE |
  SET DEFAULT [ ( defcolumn [, ...] ) ] |
  SET NULL [ ( nullcolumn [, ...] ) ]

This modification to the grammar makes it easy to support similary
functionality for UPDATE triggers and the SET DEFAULT action.

The columns that we want to set to null must be a subset of the columns that
make up the foreign key. From looking through the grammar it seems like this
syntax will also be supported in column constraints, and later validation could
ensure that only that constrained column appears in the column list.

To store this new information, a new field will need to be added to
pg_constraint to save which columns we're supposed to set null. Since a
constraint could have separate ON DELETE and ON UPDATE behavior, we'd need two
additional columns if both were supported.

Name            Type    References               Description
conkey          int2[]    pg_attribute.attnum   List of constrained columns
confkey         int2[]    pg_attribute.attnum   List of referenced columns
condeletecols   int2[]    pg_attribute.attnum   List of constrained
columns to set
                                                on deletion of referenced record
conupdatecols   int2[]    pg_attribute.attnum   List of constrained
columns to set
                                                on update of referenced record


POSSIBLE GENERALIZATIONS/EXTENSIONS:

The above solution solves the specific problem that I have been facing, but I
see two clear possibilities for how this functionality can be extended.

One, the set of columns that we set to null could be any set of columns in the
table with the constraint, rather than just a subset of columns that make up
the foreign key. This would be useful for columns that are only meaningful if
the foreign key is present. Imagine a timestamp column that indicates when the
foreign key was set and should be cleared when the foreign key is cleared. Or
maybe some columns have been denormalized from the referenced table and those
should be cleared.

While this extension is reasonable, it does add another layer of complexity to
the feature: what happens when one of those columns gets dropped from the
table? Should the entire foreign key be dropped? Should it error? Or should
that column simply be removed from the set of columns that are acted upon? If
the set of columns can only be a subset of the foreign key columns, then this
is already a solved problem, so that constraint will be enforced at least
for the initial implementation.

The second possible generalization is also having the same functionality for
ON UPDATE triggers and/or SET DEFAULT triggers:

ON DELETE SET NULL    (col [, ...] )
ON DELETE SET DEFAULT (col [, ...] )
ON UPDATE SET NULL    (col [, ...] )
ON UPDATE SET DEFAULT (col [, ...] )

While the potential use cases of the other three versions is not quite as
clear, from an engineering standpoint it might make more sense to support both
trigger types and both action types; supporting only one would likely result in
messier code.

Supporting both ON DELETE and ON UPDATE would require updating code in multiple
places in ri_triggers.c, but arguably it's better to keep those functions in
sync.



IMPLEMENTATION PLAN:

If the consensus is that this is a reasonable feature to implement, I'd love
to work on it over the next few weeks. I've made one tiny contribution to
Postgres, so I don't know everything that will need to be modified to complete
this feature. I'm also not sure what the preferred granularity of patches is
for something like this, but this is how I'm imagining breaking it down:

Update parse node and catalog type:

src/include/nodes/parsenodes.h
  - Extend Constraint struct
src/backend/nodes/copyfuncs.c
src/backend/nodes/equalfuncs.c
src/include/catalog/pg_constraint.h
# I'm not sure if I'd need to modify readfuncs.c?

Extend Parser and validate columns:

src/backend/parser/gram.y
  - Update key_action rule
src/backend/commands/tablecmds.c
  - Validate selected columns in ATAddForeignKeyConstraint

Update trigger definitions:

src/backend/utils/adt/ri_triggers.c


Other miscellaneous:

src/backend/utils/adt/ruleutils.c
  - Update pg_get_constraint_def

Documentation:

doc/src/sgml/ref/create_table.sgml
  - CREATE TABLE documentation
doc/src/sgml/catalogs.sgml
  - pg_constraint documentation

Other things:

I have no idea what I have to modify to support upgrading from an old version
of Postgres and handle the changes to the pg_constraint. If someone can point
me in the right direction that would be great.

I'm sure there are also a bunch of other things that'll I'll have to change
that I've missed. The above is the stuff I've found after digging through the
source code for a couple hours. Please let me know what else I'll have to do!



Let me know if this seems like a feature that would likely get merged. It's
something I'd really like to build! I'd welcome any feedback about the proposal,
and am happy to answer any questions.

- Paul


Re: [PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints

От
Tom Lane
Дата:
Paul Martinez <hellopfm@gmail.com> writes:
> I have a proposal for a feature to add to Postgres. I believe it is a natural
> extension to the current standard SQL ON DELETE SET NULL behavior when using
> composite foreign keys. The basic idea is that you can specify which columns to
> set to NULL in the DELETE trigger created by a foreign key constraint.

This seems like kind of a kluge, because it can only work in MATCH SIMPLE
mode, not MATCH FULL or MATCH PARTIAL.  (We don't have MATCH PARTIAL atm,
but it's in the spec so I imagine somebody will get around to implementing
it someday.  Anyway MATCH FULL is there now.)  In the latter two modes,
setting a subset of the referencing columns to null isn't sufficient to
make the row pass the constraint.

            regards, tom lane


Re: [PROPOSAL] ON DELETE SET NULL () for Foreign Key Constraints

От
Paul Martinez
Дата:
On Sat, Jan 19, 2019 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Paul Martinez <hellopfm@gmail.com> writes:
> > I have a proposal for a feature to add to Postgres. I believe it is a natural
> > extension to the current standard SQL ON DELETE SET NULL behavior when using
> > composite foreign keys. The basic idea is that you can specify which columns to
> > set to NULL in the DELETE trigger created by a foreign key constraint.
>
> This seems like kind of a kluge, because it can only work in MATCH SIMPLE
> mode, not MATCH FULL or MATCH PARTIAL.  (We don't have MATCH PARTIAL atm,
> but it's in the spec so I imagine somebody will get around to implementing
> it someday.  Anyway MATCH FULL is there now.)  In the latter two modes,
> setting a subset of the referencing columns to null isn't sufficient to
> make the row pass the constraint.

I don't see why this is an issue. Currently Postgres allows you to combine
the foreign key features in non-sensical ways. For example, you can create a
not nullable column that has ON DELETE SET NULL behavior:

CREATE TABLE foo (a int PRIMARY KEY);
CREATE TABLE bar (b int NOT NULL REFERENCES foo(a) ON DELETE SET NULL);
INSERT INTO foo VALUES (1);
INSERT INTO bar VALUES (1);
DELETE FROM foo WHERE a = 1;
ERROR:  null value in column "b" violates not-null constraint

The incongruence of MATCH FULL with this feature doesn't seem like a problem.
We could raise an error when MATCH FULL and a proper subset of the constrained
columns are supplied in the SET NULL clause if we were worried about users
mis-using the feature, but we don't raise an error for the similar logic error
above. I don't entirely understand the use-cases for MATCH PARTIAL, but it
doesn't seem like combining MATCH PARTIAL with this feature would be blatantly
wrong. This feature provides a real benefit when using MATCH SIMPLE, which is
the default behavior.

There are good reasons for using denormalizing tenant_ids in a multi-tenant
application beyond just performance. It actually improves referential integrity
because it makes mixing data between tenants impossible. Consider the following
example:

-- denormalized tenant_id, but simple primary keys
CREATE TABLE tenants (id int PRIMARY KEY);
CREATE TABLE users (tenant_id int REFERENCES tenants, id int PRIMARY KEY);
CREATE TABLE messages (
  tenant_id int REFERENCES tenants,
  id int PRIMARY KEY,
  from_id int REFERENCES users,
  to_id int REFERENCES users,
  content text
);
-- Create three tenants
INSERT INTO tenants VALUES (1), (2), (3);
-- Create users in tenants 1 and 2
INSERT INTO users VALUES (1, 101), (2, 102);
-- Create message in tenant 3 sent from user in tenant 1 to user in tenant 2
INSERT INTO messages VALUES (3, 201, 101, 102, 'poor referential integrity');

If you create the users and messages tables with composite primary keys the
last query will fail:

-- composite primary keys of the form (tenant_id, id)
CREATE TABLE cpk_users (
  tenant_id int REFERENCES tenants,
  id int,
  PRIMARY KEY (tenant_id, id)
);
CREATE TABLE cpk_messages (
  tenant_id int REFERENCES tenants,
  id int,
  from_id int,
  to_id int,
  content text,
  PRIMARY KEY (tenant_id, id),
  FOREIGN KEY (tenant_id, from_id) REFERENCES cpk_users,
  FOREIGN KEY (tenant_id, to_id) REFERENCES cpk_users
);
-- Create cpk_users in tenants 1 and 2
INSERT INTO cpk_users VALUES (1, 101), (2, 102);
-- Create cpk_message in tenant 3 sent from user in tenant 1 to user in tenant 2
INSERT INTO cpk_messages VALUES (3, 201, 101, 102, 'great referential
integrity');
ERROR:  insert or update on table "cpk_messages" violates foreign key
        constraint "cpk_messages_tenant_id_from_id_fkey"
DETAIL:  Key (tenant_id, from_id)=(3, 101) is not present in table "cpk_users".

So there are strong reasons in favor of using composite primary keys. Postgres
has great support for composite primary and foreign keys, but SET NULL behavior
that would have worked fine in the schema using simple primary keys no longer
works in the composite primary key schema. Users could manually implement
triggers to get the desired behavior (as I have done in the use-case that led
me to think of this feature), but it'd be great if switching to composite
primary keys didn't force the user to make compromises elsewhere.

- Paul

On Sat, Jan 19, 2019 at 5:12 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Paul Martinez <hellopfm@gmail.com> writes:
> > I have a proposal for a feature to add to Postgres. I believe it is a natural
> > extension to the current standard SQL ON DELETE SET NULL behavior when using
> > composite foreign keys. The basic idea is that you can specify which columns to
> > set to NULL in the DELETE trigger created by a foreign key constraint.
>
> This seems like kind of a kluge, because it can only work in MATCH SIMPLE
> mode, not MATCH FULL or MATCH PARTIAL.  (We don't have MATCH PARTIAL atm,
> but it's in the spec so I imagine somebody will get around to implementing
> it someday.  Anyway MATCH FULL is there now.)  In the latter two modes,
> setting a subset of the referencing columns to null isn't sufficient to
> make the row pass the constraint.
>
>                         regards, tom lane