Обсуждение: Allow foreign keys to reference a superset of unique columns

Поиск
Список
Период
Сортировка

Allow foreign keys to reference a superset of unique columns

От
Kaiting Chen
Дата:
Hi everyone:

I'd like to propose a change to PostgreSQL to allow the creation of a foreign
key constraint referencing a superset of uniquely constrained columns.

As it currently stands:

  CREATE TABLE foo (a integer PRIMARY KEY, b integer);
  CREATE TABLE bar (x integer, y integer,
    FOREIGN KEY (x, y) REFERENCES foo(a, b));
  > ERROR:  there is no unique constraint matching given keys for referenced
    table "foo"

Despite the fact that in "foo", the combination of columns (a, b) is guaranteed
to be unique by virtue of being a superset of the primary key (a).

This capability has been requested before, at least once in 2004 [1] and again
in 2021 [2].

To illustrate when it'd be useful to define such a foreign key constraint,
consider a database that will store graphs (consisting of nodes and edges) where
graphs are discrete and intergraph edges are prohibited:

  CREATE TABLE graph (
    id INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY
  );

  CREATE TABLE node (
    id INTEGER PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
    graph_id INTEGER NOT NULL REFERENCES graph(id)
  );

  CREATE TABLE edge (
    graph_id INTEGER,
    source_id INTEGER,
    FOREIGN KEY (source_id, graph_id) REFERENCES node(id, graph_id),

    target_id INTEGER,
    FOREIGN KEY (target_id, graph_id) REFERENCES node(id, graph_id),

    PRIMARY KEY (graph_id, source_id, target_id)
  );

This schema is unsupported by PostgreSQL absent this constraint:

  ALTER TABLE node ADD UNIQUE (id, graph_id);

However, this constraint, as well as its underlying unique index, is superfluous
as node(id) itself is unique. Its addition serves no semantic purpose but incurs
cost of additional on disk storage and update time. Note the prohibition of
intergraph edges isn't enforceable on "edge" without the composite foreign keys
(or triggers).

An alternative approach is to redefine node's PRIMARY KEY as (id, graph_id).
However, this would force every table referring to "node" to duplicate both
columns into their schema, even when a singular "node_id" would suffice. This is
undesirable if there are many tables referring to "node" that have no such
intergraph restrictions and few that do.

While it can be argued that this schema contains some degree of denormalization,
it isn't uncommon and a recent patch was merged to support exactly this kind of
design [3].

In that case, the SET NULL and SET DEFAULT referential actions gained support
for an explicit column list to accomodate this type of design.

A problem evinced by Tom Lane in the 2004 discussion on this was that, were this
permitted, the index supporting a foreign key constraint could be ambiguous:

  > I think one reason for this is that otherwise it's not clear which
  > unique constraint the FK constraint depends on.  Consider
  >
  >         create table a (f1 int unique, f2 int unique);
  >
  >         create table b (f1 int, f2 int,
  >                         foreign key (f1,f2) references a(f1,f2));
  >
  > How would you decide which constraint to make the FK depend on?
  > It'd be purely arbitrary.

I propose that this problem is addressed, with an extension to the SQL standard,
as follows in the definition of a foreign key constraint:

1. Change the restriction on the referenced columns in a foreign key constraint
   to:

   The referenced columns must be a *superset* (the same, or a strict superset)
   of the columns of a non-deferrable unique or primary key index on the
   referenced table.

2. The FOREIGN KEY constraint syntax gains a [ USING INDEX index_name ] clause
   optionally following the referenced column list.

   The index specified by this clause is used to support the foreign key
   constraint, and it must be a non-deferrable unique or primary key index on
   the referenced table compatible with the referenced columns.

   Here, compatible means that the columns of the index are a subset (the same,
   or a strict subset) of the referenced columns.

3. If the referenced columns are the same as the columns of such a unique (or
   primary key) index on the referenced table, then the behavior in PostgreSQL
   doesn't change.

4. If the referenced columns are a strict superset of the columns of such an
   index on the referenced table, then:
   1. If the primary key of the referenced table is a strict subset of the
      referenced columns, then its index is used to support the foreign key if
      no USING INDEX clause is present.
   2. Otherwise, the USING INDEX clause is required.

I believe that this scheme is unambiguous and should stably round trip a dump
and restore. In my previous example, the foreign key contraints could then be
defined as:

    FOREIGN KEY (source_id, graph_id) REFERENCES node(id, graph_id),
    FOREIGN KEY (target_id, graph_id) REFERENCES node(id, graph_id),

Or alternatively:

    FOREIGN KEY (source_id, graph_id) REFERENCES node(id, graph_id)
      USING INDEX node_pkey,
    FOREIGN KEY (target_id, graph_id) REFERENCES node(id, graph_id)
      USING INDEX node_pkey,

Also, the addition of a USING INDEX clause may be useful in its own right. It's
already possible to create multiple unique indexes on a table, with the same set
of columns, but differing storage parameters of index tablespaces. In this
situation, when multiple indexes can support a foreign key constraint, which
index is chosen appears to be determined in OID order. This clause would allow a
user to unambiguously specify which index to use.

I've attached a first draft patch that implements what I've described and would
love some feedback. For all referenced columns that aren't covered by the chosen
index, it chooses the opclass to use by finding the default opclass for the
column's type for the same access method as the chosen index. (Would it be
useful to allow the specification of opclasses in the referenced column list of
a foreign key constraint, similar to the column list in CREATE INDEX?)

Also, in pg_get_constraintdef_worker(), it adds a USING INDEX clause only when a
foreign key constraint is supported by an index that:

1. Isn't a primary key, and
2. Has a different number of key columns than the number of constrained columns

I *think* that this should ensure that a USING INDEX clause is added only when
required.

[1]
https://www.postgresql.org/message-id/flat/CAF%2B2_SGhbc6yGUoNFjDOgjq1VpKpE5WZfOo0M%2BUwcPH%3DmddNMg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/1092734724.2627.4.camel%40dicaprio.akademie1.de
[3]
https://www.postgresql.org/message-id/flat/CACqFVBZQyMYJV%3DnjbSMxf%2BrbDHpx%3DW%3DB7AEaMKn8dWn9OZJY7w%40mail.gmail.com

Вложения

Re: Allow foreign keys to reference a superset of unique columns

От
Peter Eisentraut
Дата:
On 07.06.22 20:59, Kaiting Chen wrote:
> 2. The FOREIGN KEY constraint syntax gains a [ USING INDEX index_name ] clause
>     optionally following the referenced column list.
> 
>     The index specified by this clause is used to support the foreign key
>     constraint, and it must be a non-deferrable unique or primary key index on
>     the referenced table compatible with the referenced columns.
> 
>     Here, compatible means that the columns of the index are a subset (the same,
>     or a strict subset) of the referenced columns.

I think this should be referring to constraint name, not an index name.



Re: Allow foreign keys to reference a superset of unique columns

От
David Rowley
Дата:
On Fri, 10 Jun 2022 at 15:08, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 07.06.22 20:59, Kaiting Chen wrote:
> > 2. The FOREIGN KEY constraint syntax gains a [ USING INDEX index_name ] clause
> >     optionally following the referenced column list.
> >
> >     The index specified by this clause is used to support the foreign key
> >     constraint, and it must be a non-deferrable unique or primary key index on
> >     the referenced table compatible with the referenced columns.
> >
> >     Here, compatible means that the columns of the index are a subset (the same,
> >     or a strict subset) of the referenced columns.
>
> I think this should be referring to constraint name, not an index name.

Can you explain why you think that?

My thoughts are that it should be an index name. I'm basing that on
the fact that transformFkeyCheckAttrs() look for valid unique indexes
rather than constraints.  The referenced table does not need any
primary key or unique constraints to be referenced by a foreign key.
It just needs a unique index matching the referencing columns.

It would seem very strange to me if we required a unique or primary
key constraint to exist only when this new syntax is being used. Maybe
I'm missing something though?

David



Re: Allow foreign keys to reference a superset of unique columns

От
Peter Eisentraut
Дата:
On 10.06.22 05:47, David Rowley wrote:
>> I think this should be referring to constraint name, not an index name.
> Can you explain why you think that?

If you wanted to specify this feature in the SQL standard (I'm not 
proposing that, but it seems plausible), then you need to deal in terms 
of constraints, not indexes.  Maybe referring to an index directly could 
be a backup option if desired, but I don't see why that would be 
necessary, since you can easily create a real constraint on top of an index.



Re: Allow foreign keys to reference a superset of unique columns

От
David Rowley
Дата:
On Fri, 10 Jun 2022 at 16:14, Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 10.06.22 05:47, David Rowley wrote:
> >> I think this should be referring to constraint name, not an index name.
> > Can you explain why you think that?
>
> If you wanted to specify this feature in the SQL standard (I'm not
> proposing that, but it seems plausible), then you need to deal in terms
> of constraints, not indexes.  Maybe referring to an index directly could
> be a backup option if desired, but I don't see why that would be
> necessary, since you can easily create a real constraint on top of an index.

That's a good point, but, if we invented syntax for specifying a
constraint name, would that not increase the likelihood that we'd end
up with something that conflicts with some future extension to the SQL
standard?

We already have USING INDEX as an extension to ADD CONSTRAINT.

David



Re: Allow foreign keys to reference a superset of unique columns

От
Kaiting Chen
Дата:
On Fri, Jun 10, 2022 at 12:14 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 10.06.22 05:47, David Rowley wrote:
> >> I think this should be referring to constraint name, not an index name.
> > Can you explain why you think that?
>
> If you wanted to specify this feature in the SQL standard (I'm not
> proposing that, but it seems plausible), then you need to deal in terms
> of constraints, not indexes.  Maybe referring to an index directly could
> be a backup option if desired, but I don't see why that would be
> necessary, since you can easily create a real constraint on top of an index.

I think that there's a subtle difference between specifying a
constraint or an index in that the ALTER TABLE ADD CONSTRAINT USING
INDEX command prohibits the creation of a constraint using an index
where the key columns are associated with non default opclasses. As
far as I can tell, a foreign key constraint *can* pick an index with
non default opclasses. So specifying a constraint name in the FOREIGN
KEY syntax would result in a strange situation where the foreign key
constraint could implicitly pick a supporting index with non default
opclasses to use, but there'd be no way to explicitly specify that
index.



Re: Allow foreign keys to reference a superset of unique columns

От
Tom Lane
Дата:
Kaiting Chen <ktchen14@gmail.com> writes:
> I'd like to propose a change to PostgreSQL to allow the creation of a foreign
> key constraint referencing a superset of uniquely constrained columns.

TBH, I think this is a fundamentally bad idea and should be rejected
outright.  It fuzzes the semantics of the FK relationship, and I'm
not convinced that there are legitimate use-cases.  Your example
schema could easily be dismissed as bad design that should be done
some other way.

For one example of where the semantics get fuzzy, it's not
very clear how the extra-baggage columns ought to participate in
CASCADE updates.  Currently, if we have
   CREATE TABLE foo (a integer PRIMARY KEY, b integer);
then an update that changes only foo.b doesn't need to update
referencing tables, and I think we even have optimizations that
assume that if no unique-key columns are touched then RI checks
need not be made.  But if you did
   CREATE TABLE bar (x integer, y integer,
                     FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
then perhaps you expect bar.y to be updated ... or maybe you don't?

Another example is that I think the idea is only well-defined when
the subset column(s) are a primary key, or at least all marked NOT NULL.
Otherwise they're not as unique as you're claiming.  But then the FK
constraint really has to be dependent on a PK constraint not just an
index definition, since indexes in themselves don't enforce not-nullness.
That gets back to Peter's complaint that referring to an index isn't
good enough.

Anyway, seeing that the patch touches neither ri_triggers.c nor any
regression tests, I find it hard to believe that such semantic
questions have been thought through.

It's also unclear to me how this ought to interact with the
information_schema views concerning foreign keys.  We generally
feel that we don't want to present any non-SQL-compatible data
in information_schema, for fear that it will confuse applications
that expect to see SQL-spec behavior there.  So do we leave such
FKs out of the views altogether, or show only the columns involving
the associated unique constraint?  Neither answer seems pleasant.

FWIW, the patch is currently failing to apply per the cfbot.
I think you don't need to manually update backend/nodes/ anymore,
but the gram.y changes look to have been sideswiped by some
other recent commit.

            regards, tom lane



Re: Allow foreign keys to reference a superset of unique columns

От
Wolfgang Walther
Дата:
Kaiting Chen:
> I'd like to propose a change to PostgreSQL to allow the creation of a foreign
> key constraint referencing a superset of uniquely constrained columns.

+1

Tom Lane:
> TBH, I think this is a fundamentally bad idea and should be rejected
> outright.  It fuzzes the semantics of the FK relationship, and I'm
> not convinced that there are legitimate use-cases.  Your example
> schema could easily be dismissed as bad design that should be done
> some other way.

I had to add quite a few unique constraints on a superset of already 
uniquely constrained columns in the past, just to be able to support FKs 
to those columns. I think those cases most often come up when dealing 
with slightly denormalized schemas, e.g. for efficiency.

One other use-case I had recently, was along the followling lines, in 
abstract terms:

CREATE TABLE classes (class INT PRIMARY KEY, ...);

CREATE TABLE instances (
   instance INT PRIMARY KEY,
   class INT REFERENCES classes,
   ...
);

Think about classes and instances as in OOP. So the table classes 
contains some definitions for different types of object and the table 
instances realizes them into concrete objects.

Now, assume you have some property of a class than is best modeled as a 
table like this:

CREATE TABLE classes_prop (
   property INT PRIMARY KEY,
   class INT REFERNECES classes,
   ...
);

Now, assume you need to store data for each of those classes_prop rows 
for each instance. You'd do the following:

CREATE TABLE instances_prop (
   instance INT REFERENCES instances,
   property INT REFERENCES classes_prop,
   ...
);

However, this does not ensure that the instance and the property you're 
referencing in instances_prop are actually from the same class, so you 
add a class column:

CREATE TABLE instances_prop (
   instance INT,
   class INT,
   property INT,
   FOREIGN KEY (instance, class) REFERENCES instances,
   FOREIGN KEY (property, class) REFERENCES classes_prop,
   ...
);

But this won't work, without creating some UNIQUE constraints on those 
supersets of the PK column first.

> For one example of where the semantics get fuzzy, it's not
> very clear how the extra-baggage columns ought to participate in
> CASCADE updates.  Currently, if we have
>     CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> then an update that changes only foo.b doesn't need to update
> referencing tables, and I think we even have optimizations that
> assume that if no unique-key columns are touched then RI checks
> need not be made.  But if you did
>     CREATE TABLE bar (x integer, y integer,
>                       FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> then perhaps you expect bar.y to be updated ... or maybe you don't?

In all use-cases I had so far, I would expect bar.y to be updated, too.

I think it would not even be possible to NOT update bar.y, because the 
FK would then not match anymore. foo.a is the PK, so the value in bar.x 
already forces bar.y to be the same as foo.b at all times.

bar.y is a little bit like a generated value in that sense, it should 
always match foo.b. I think it would be great, if we could actually go a 
step further, too: On an update to bar.x to a new value, if foo.a=bar.x 
exists, I would like to set bar.y automatically to the new foo.b. 
Otherwise those kind of updates always have to either query foo before, 
or add a trigger to do the same.

In the classes/instances example above, when updating 
instances_prop.property to a new value, instances_prop.class would be 
updated automatically to match classes_prop.class. This would fail, when 
the class is different than the class required by the FK to instances, 
though, providing exactly the safe-guard that this constraint was 
supposed to provide, without incurring additional overhead in update 
statements.

In the foo/bar example above, which is just a bit of denormalization, 
this automatic update would also be helpful - because rejecting the 
update on the grounds that the columns don't match doesn't make sense here.

> Another example is that I think the idea is only well-defined when
> the subset column(s) are a primary key, or at least all marked NOT NULL.
> Otherwise they're not as unique as you're claiming.

I fail to see why. My understanding is that rows with NULL values in the 
referenced table can't participate in FK matches anyway, because both 
MATCH SIMPLE and MATCH FULL wouldn't require a match when any/all of the 
columns in the referencing table are NULL. MATCH PARTIAL is not 
implemented, so I can't tell whether the semantics would be different there.

I'm not sure whether a FK on a superset of unique columns would be 
useful with MATCH SIMPLE. Maybe it could be forced to be MATCH FULL, if 
MATCH SIMPLE is indeed not well-defined.

> It's also unclear to me how this ought to interact with the
> information_schema views concerning foreign keys.  We generally
> feel that we don't want to present any non-SQL-compatible data
> in information_schema, for fear that it will confuse applications
> that expect to see SQL-spec behavior there.  So do we leave such
> FKs out of the views altogether, or show only the columns involving
> the associated unique constraint?  Neither answer seems pleasant.

Instead of tweaking FKs, maybe it would be possible to define a UNIQUE 
constraint re-using an existing index that guarantees uniqueness on a 
subset of columns already? This would allow to create those FK 
relationships by creating another unique constraint - without the 
overhead of creating yet another index.

So roughly something like this:

ALTER TABLE foo ADD UNIQUE (a, b) USING INDEX foo_pk;

This should give a consistent output for information_schema views?

Best

Wolfgang



Re: Allow foreign keys to reference a superset of unique columns

От
James Coleman
Дата:
On Fri, Sep 2, 2022 at 5:42 AM Wolfgang Walther <walther@technowledgy.de> wrote:
>
> Kaiting Chen:
> > I'd like to propose a change to PostgreSQL to allow the creation of a foreign
> > key constraint referencing a superset of uniquely constrained columns.
>
> +1
>
> Tom Lane:
> > TBH, I think this is a fundamentally bad idea and should be rejected
> > outright.  It fuzzes the semantics of the FK relationship, and I'm
> > not convinced that there are legitimate use-cases.  Your example
> > schema could easily be dismissed as bad design that should be done
> > some other way.
>
> I had to add quite a few unique constraints on a superset of already
> uniquely constrained columns in the past, just to be able to support FKs
> to those columns. I think those cases most often come up when dealing
> with slightly denormalized schemas, e.g. for efficiency.
>
> One other use-case I had recently, was along the followling lines, in
> abstract terms:
>
> CREATE TABLE classes (class INT PRIMARY KEY, ...);
>
> CREATE TABLE instances (
>    instance INT PRIMARY KEY,
>    class INT REFERENCES classes,
>    ...
> );
>
> Think about classes and instances as in OOP. So the table classes
> contains some definitions for different types of object and the table
> instances realizes them into concrete objects.
>
> Now, assume you have some property of a class than is best modeled as a
> table like this:
>
> CREATE TABLE classes_prop (
>    property INT PRIMARY KEY,
>    class INT REFERNECES classes,
>    ...
> );
>
> Now, assume you need to store data for each of those classes_prop rows
> for each instance. You'd do the following:
>
> CREATE TABLE instances_prop (
>    instance INT REFERENCES instances,
>    property INT REFERENCES classes_prop,
>    ...
> );
>
> However, this does not ensure that the instance and the property you're
> referencing in instances_prop are actually from the same class, so you
> add a class column:
>
> CREATE TABLE instances_prop (
>    instance INT,
>    class INT,
>    property INT,
>    FOREIGN KEY (instance, class) REFERENCES instances,
>    FOREIGN KEY (property, class) REFERENCES classes_prop,
>    ...
> );
>
> But this won't work, without creating some UNIQUE constraints on those
> supersets of the PK column first.

If I'm following properly this sounds like an overengineered EAV
schema, and neither of those things inspires me to think "this is a
use case I want to support".

That being said, I know that sometimes examples that have been
abstracted enough to share aren't always the best, so perhaps there's
something underlying this that's a more valuable example.

> > For one example of where the semantics get fuzzy, it's not
> > very clear how the extra-baggage columns ought to participate in
> > CASCADE updates.  Currently, if we have
> >     CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> > then an update that changes only foo.b doesn't need to update
> > referencing tables, and I think we even have optimizations that
> > assume that if no unique-key columns are touched then RI checks
> > need not be made.  But if you did
> >     CREATE TABLE bar (x integer, y integer,
> >                       FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> > then perhaps you expect bar.y to be updated ... or maybe you don't?
>
> In all use-cases I had so far, I would expect bar.y to be updated, too.
>
> I think it would not even be possible to NOT update bar.y, because the
> FK would then not match anymore. foo.a is the PK, so the value in bar.x
> already forces bar.y to be the same as foo.b at all times.
>
> bar.y is a little bit like a generated value in that sense, it should
> always match foo.b. I think it would be great, if we could actually go a
> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
> exists, I would like to set bar.y automatically to the new foo.b.
> Otherwise those kind of updates always have to either query foo before,
> or add a trigger to do the same.

Isn't this actually contradictory to the behavior you currently have
with a multi-column foreign key? In the example above then an update
to bar.x is going to update the rows in foo that match bar.x = foo.a
and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
new values. You seem to be suggesting that instead it should look for
other rows that already match the *new value* of only one of the
columns in the constraint. If I'm understanding the example correctly,
that seems like a *very* bad idea.

James Coleman



Re: Allow foreign keys to reference a superset of unique columns

От
Wolfgang Walther
Дата:
James Coleman:
> If I'm following properly this sounds like an overengineered EAV
> schema, and neither of those things inspires me to think "this is a
> use case I want to support".
> 
> That being said, I know that sometimes examples that have been
> abstracted enough to share aren't always the best, so perhaps there's
> something underlying this that's a more valuable example.

Most my use-cases are slightly denormalized and I was looking for an 
example that didn't require those kind of FKS only because of the 
denormalization. So that's why it might have been a bit artifical or 
abstracted too much.

Take another example: I deal with multi-tenancy systems, for which I 
want to use RLS to separate the data between tenants:

CREATE TABLE tenants (tenant INT PRIMARY KEY);

Each tenant can create multiple users and groups:

CREATE TABLE users (
   "user" INT PRIMARY KEY,
   tenant INT NOT NULL REFERENCES tenants
);

CREATE TABLLE groups (
   "group" INT PRIMARY KEY,
   tenant INT NOT NULL REFERENCES tenants
);

Users can be members of groups. The simple approach would be:

CREATE TABLE members (
   PRIMARY KEY ("user", "group"),
   "user" INT REFERENCES users,
   "group" INT REFERENCES groups
);

But this falls short in two aspects:
- To make RLS policies simpler to write and quicker to execute, I want 
to add "tenant" columns to **all** other tables. A slightly denormalized 
schema for efficiency.
- The schema above does not ensure that users can only be members in 
groups of the same tenant. Our business model requires to separate 
tenants cleanly, but as written above, cross-tenant memberships would be 
allowed.

In comes the "tenant" column which solves both of this:

CREATE TABLE members (
   PRIMARY KEY ("user", "group"),
   tenant INT REFERENCES tenants,
   "user" INT,
   "group" INT,
   FOREIGN KEY ("user", tenant) REFERENCES users ("user", tenant),
   FOREIGN KEY ("group", tenant) REFERENCES groups ("group", tenant)
);

This is not possible to do right now, without adding more UNIQUE 
constraints to the users and groups tables - on a superset of already 
unique columns.

>> bar.y is a little bit like a generated value in that sense, it should
>> always match foo.b. I think it would be great, if we could actually go a
>> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
>> exists, I would like to set bar.y automatically to the new foo.b.
>> Otherwise those kind of updates always have to either query foo before,
>> or add a trigger to do the same.
> 
> Isn't this actually contradictory to the behavior you currently have
> with a multi-column foreign key? In the example above then an update
> to bar.x is going to update the rows in foo that match bar.x = foo.a
> and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
> new values.

No, I think there was a misunderstanding. An update to bar should not 
update rows in foo. An update to bar.x should update bar.y implicitly, 
to match the new value of foo.b.

> You seem to be suggesting that instead it should look for
> other rows that already match the *new value* of only one of the
> columns in the constraint.

Yes. I think basically what I'm suggesting is, that for an FK to a 
superset of unique columns, all the FK-logic should still be done on the 
already unique set of columns only - and then the additional columns 
should be mirrored into the referencing table. The referencing table can 
then put additional constraints on this column. In the members example 
above, this additional constraint is the fact that the tenant column 
can't be filled with two different values for the users and groups FKs. 
But this could also be a CHECK constraint to allow FKs only to a subset 
of rows in the target table:

CREATE TYPE foo_type AS ENUM ('A', 'B', 'C');

CREATE TABLE foo (
   f INT PRIMARY KEY,
   type foo_type
);

CREATE TABLE bar (
   b INT PRIMARY KEY,
   f INT,
   ftype foo_type CHECK (ftype <> 'C'),
   FOREIGN KEY (f, ftype) REFERENCES foo (f, type);
);

In this example, the additional ftype column is just used to enforce 
that bar can only reference rows with type A or B, but not C. Assume:

INSERT INTO foo VALUES (1, 'A'), (2, 'B'), (3, 'C');

In this case, it would be nice to be able to do the following, i.e. 
derive the value for bar.ftype automatically:

INSERT INTO bar (b, f) VALUES (10, 1); -- bar.ftype is then 'A'
UPDATE bar SET f = 2 WHERE b = 10; -- bar.ftype is then 'B'

And it would throw errors in the following cases, because the 
automatically derived value fails the CHECK constraint:

INSERT INTO bar (b, f) VALUES (20, 3);
UPDATE bar SET f = 3 WHERE b = 10;

Note: This "automatically derived columns" extension would be a separate 
feature. Really nice to have, but the above mentioned FKs to supersets 
of unique columns would be very valuable without it already.

Best

Wolfgang



Re: Allow foreign keys to reference a superset of unique columns

От
James Coleman
Дата:
On Sun, Sep 25, 2022 at 4:49 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
>
> James Coleman:
> > If I'm following properly this sounds like an overengineered EAV
> > schema, and neither of those things inspires me to think "this is a
> > use case I want to support".
> >
> > That being said, I know that sometimes examples that have been
> > abstracted enough to share aren't always the best, so perhaps there's
> > something underlying this that's a more valuable example.
>
> Most my use-cases are slightly denormalized and I was looking for an
> example that didn't require those kind of FKS only because of the
> denormalization. So that's why it might have been a bit artifical or
> abstracted too much.
>
> Take another example: I deal with multi-tenancy systems, for which I
> want to use RLS to separate the data between tenants:
>
> CREATE TABLE tenants (tenant INT PRIMARY KEY);
>
> Each tenant can create multiple users and groups:
>
> CREATE TABLE users (
>    "user" INT PRIMARY KEY,
>    tenant INT NOT NULL REFERENCES tenants
> );
>
> CREATE TABLLE groups (
>    "group" INT PRIMARY KEY,
>    tenant INT NOT NULL REFERENCES tenants
> );
>
> Users can be members of groups. The simple approach would be:
>
> CREATE TABLE members (
>    PRIMARY KEY ("user", "group"),
>    "user" INT REFERENCES users,
>    "group" INT REFERENCES groups
> );
>
> But this falls short in two aspects:
> - To make RLS policies simpler to write and quicker to execute, I want
> to add "tenant" columns to **all** other tables. A slightly denormalized
> schema for efficiency.
> - The schema above does not ensure that users can only be members in
> groups of the same tenant. Our business model requires to separate
> tenants cleanly, but as written above, cross-tenant memberships would be
> allowed.
>
> In comes the "tenant" column which solves both of this:
>
> CREATE TABLE members (
>    PRIMARY KEY ("user", "group"),
>    tenant INT REFERENCES tenants,
>    "user" INT,
>    "group" INT,
>    FOREIGN KEY ("user", tenant) REFERENCES users ("user", tenant),
>    FOREIGN KEY ("group", tenant) REFERENCES groups ("group", tenant)
> );
>
> This is not possible to do right now, without adding more UNIQUE
> constraints to the users and groups tables - on a superset of already
> unique columns.

Thanks, that's a more interesting use case IMO (and doesn't smell in
the way the other did).

> >> bar.y is a little bit like a generated value in that sense, it should
> >> always match foo.b. I think it would be great, if we could actually go a
> >> step further, too: On an update to bar.x to a new value, if foo.a=bar.x
> >> exists, I would like to set bar.y automatically to the new foo.b.
> >> Otherwise those kind of updates always have to either query foo before,
> >> or add a trigger to do the same.
> >
> > Isn't this actually contradictory to the behavior you currently have
> > with a multi-column foreign key? In the example above then an update
> > to bar.x is going to update the rows in foo that match bar.x = foo.a
> > and bar.y = foo.b *using the old values of bar.x and bar.y* to be the
> > new values.
>
> No, I think there was a misunderstanding. An update to bar should not
> update rows in foo. An update to bar.x should update bar.y implicitly,
> to match the new value of foo.b.
>
> > You seem to be suggesting that instead it should look for
> > other rows that already match the *new value* of only one of the
> > columns in the constraint.
>
> Yes. I think basically what I'm suggesting is, that for an FK to a
> superset of unique columns, all the FK-logic should still be done on the
> already unique set of columns only - and then the additional columns
> should be mirrored into the referencing table. The referencing table can
> then put additional constraints on this column. In the members example
> above, this additional constraint is the fact that the tenant column
> can't be filled with two different values for the users and groups FKs.

If we have a declared constraint on x,y where x is unique based on an
index including on x I do not think we should have that fk constraint
work differently than a constraint on x,y where there is a unique
index on x,y. That would seem to be incredibly confusing behavior
(even if it would be useful for some specific use case).

> But this could also be a CHECK constraint to allow FKs only to a subset
> of rows in the target table:

Are you suggesting a check constraint that queries another table?
Because check constraints are not supposed to do that (I assume this
is technically possible to declare via a function, just like it is
technically possible to do in a functional index, but like in the
index case it's a bad idea since it's not actually guaranteed).

> CREATE TYPE foo_type AS ENUM ('A', 'B', 'C');
>
> CREATE TABLE foo (
>    f INT PRIMARY KEY,
>    type foo_type
> );
>
> CREATE TABLE bar (
>    b INT PRIMARY KEY,
>    f INT,
>    ftype foo_type CHECK (ftype <> 'C'),
>    FOREIGN KEY (f, ftype) REFERENCES foo (f, type);
> );
>
> In this example, the additional ftype column is just used to enforce
> that bar can only reference rows with type A or B, but not C. Assume:
>
> INSERT INTO foo VALUES (1, 'A'), (2, 'B'), (3, 'C');
>
> In this case, it would be nice to be able to do the following, i.e.
> derive the value for bar.ftype automatically:
>
> INSERT INTO bar (b, f) VALUES (10, 1); -- bar.ftype is then 'A'
> UPDATE bar SET f = 2 WHERE b = 10; -- bar.ftype is then 'B'
>
> And it would throw errors in the following cases, because the
> automatically derived value fails the CHECK constraint:
>
> INSERT INTO bar (b, f) VALUES (20, 3);
> UPDATE bar SET f = 3 WHERE b = 10;
>
> Note: This "automatically derived columns" extension would be a separate
> feature. Really nice to have, but the above mentioned FKs to supersets
> of unique columns would be very valuable without it already.

This "derive the value automatically" is not what foreign key
constraints do right now at all, right? And if fact it's contradictory
to existing behavior, no?

This part just seems like a very bad idea. Unless I'm misunderstanding
I think we should reject this part of the proposals on this thread as
something we would not even consider implementing.

James Coleman



Re: Allow foreign keys to reference a superset of unique columns

От
James Coleman
Дата:
Hello Kaiting,

The use case you're looking to handle seems interesting to me.

On Wed, Jul 27, 2022 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Kaiting Chen <ktchen14@gmail.com> writes:
> > I'd like to propose a change to PostgreSQL to allow the creation of a foreign
> > key constraint referencing a superset of uniquely constrained columns.
>
> TBH, I think this is a fundamentally bad idea and should be rejected
> outright.  It fuzzes the semantics of the FK relationship, and I'm
> not convinced that there are legitimate use-cases.  Your example
> schema could easily be dismissed as bad design that should be done
> some other way.
>
> For one example of where the semantics get fuzzy, it's not
> very clear how the extra-baggage columns ought to participate in
> CASCADE updates.  Currently, if we have
>    CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> then an update that changes only foo.b doesn't need to update
> referencing tables, and I think we even have optimizations that
> assume that if no unique-key columns are touched then RI checks
> need not be made.  But if you did
>    CREATE TABLE bar (x integer, y integer,
>                      FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> then perhaps you expect bar.y to be updated ... or maybe you don't?
>
> Another example is that I think the idea is only well-defined when
> the subset column(s) are a primary key, or at least all marked NOT NULL.
> Otherwise they're not as unique as you're claiming.  But then the FK
> constraint really has to be dependent on a PK constraint not just an
> index definition, since indexes in themselves don't enforce not-nullness.
> That gets back to Peter's complaint that referring to an index isn't
> good enough.
>
> Anyway, seeing that the patch touches neither ri_triggers.c nor any
> regression tests, I find it hard to believe that such semantic
> questions have been thought through.
>
> It's also unclear to me how this ought to interact with the
> information_schema views concerning foreign keys.  We generally
> feel that we don't want to present any non-SQL-compatible data
> in information_schema, for fear that it will confuse applications
> that expect to see SQL-spec behavior there.  So do we leave such
> FKs out of the views altogether, or show only the columns involving
> the associated unique constraint?  Neither answer seems pleasant.
>
> FWIW, the patch is currently failing to apply per the cfbot.
> I think you don't need to manually update backend/nodes/ anymore,
> but the gram.y changes look to have been sideswiped by some
> other recent commit.

As I was reading through the email chain I had this thought: could you
get the same benefit (or 90% of it anyway) by instead allowing the
creation of a uniqueness constraint that contains more columns than
the index backing it? So long as the index backing it still guaranteed
the uniqueness on a subset of columns that would seem to be safe.

Tom notes the additional columns being nullable is something to think
about. But if we go the route of allowing unique constraints with
backing indexes having a subset of columns from the constraint I don't
think the nullability is an issue since it's already the case that a
unique constraint can be declared on columns that are nullable. Indeed
it's also the case that we already support a foreign key constraint
backed by a unique constraint including nullable columns.

Because such an approach would, I believe, avoid changing the foreign
key code (or semantics) at all, I believe that would address Tom's
concerns about information_schema and fuzziness of semantics.

After writing down that idea I noticed Wolfgang Walther had commented
similarly, but it appears that that idea got lost (or at least not
responded to).

I'd be happy to sign up to review an updated patch if you're
interested in continuing this effort. If so, could you register the
patch in the CF app (if not there already)?

Thanks,
James Coleman



Re: Allow foreign keys to reference a superset of unique columns

От
Wolfgang Walther
Дата:
James Coleman:
> If we have a declared constraint on x,y where x is unique based on an
> index including on x I do not think we should have that fk constraint
> work differently than a constraint on x,y where there is a unique
> index on x,y. That would seem to be incredibly confusing behavior
> (even if it would be useful for some specific use case).

I don't think it's behaving differently from how it does now. See below. 
But I can see how that could be confusing. Maybe it's just about 
describing the feature in a better way than I did so far. Or maybe it 
needs a different syntax.

Anyway, I don't think it's just a specific use case. In every use case I 
had for $subject so far, the immediate next step was to write some 
triggers to fetch those derived values from the referenced table.

Ultimately it's a question of efficiency: We can achieve the same thing 
in two ways today:
- We can either **not** add the additional column (members.tenant, 
bar.ftype in my examples) to the referencing table at all, and add 
constraint triggers that do all those checks instead. This adds 
complexity to write the triggers and more complicated RLS policies etc, 
and also is potentially slower when executing those more complicated 
queries.
- Or we can add the additional column, but also add an additional unique 
index on the referenced table, and then make it part of the FK. This 
removes some of the constraint triggers and makes RLS policies simpler 
and likely faster to execute queries. It comes at a cost of additional 
cost of storage, though - and this is something that $subject tries to 
address.

Still, even when $subject is allowed, in practice we need some of the 
triggers to fetch those dependent values. Considering that the current 
FK triggers already do the same kind of queries at the same times, it'd 
be more efficient to have those FK queries fetch those dependent values.

>> But this could also be a CHECK constraint to allow FKs only to a subset
>> of rows in the target table:
> 
> Are you suggesting a check constraint that queries another table?

No. I was talking about the CHECK constraint in my example in the next 
paragraph of that mail. The CHECK constraint on bar.ftype is a regular 
CHECK constraint, but because of how ftype is updated automatically, it 
effectively behaves like some kind of additional constraint on the FK 
itself.

> This "derive the value automatically" is not what foreign key
> constraints do right now at all, right? And if fact it's contradictory
> to existing behavior, no?

I don't think it's contradicting. Maybe a better way to put my idea is this:

For a foreign key to a superset of unique columns, the already-unique 
columns should behave according to the specified ON UPDATE clause. 
However, the extra columns should always behave as they were ON UPDATE 
CASCADE. And additionally, they should behave similar to something like 
ON INSERT CASCADE. Although that INSERT is about the referencing table, 
not the referenced table, so the analogy isn't 100%.

I guess this would also be a more direct answer to Tom's earlier 
question about what to expect in the ON UPDATE scenario.

Best

Wolfgang



Re: Allow foreign keys to reference a superset of unique columns

От
James Coleman
Дата:
On Mon, Sep 26, 2022 at 2:28 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
>
> James Coleman:
> > If we have a declared constraint on x,y where x is unique based on an
> > index including on x I do not think we should have that fk constraint
> > work differently than a constraint on x,y where there is a unique
> > index on x,y. That would seem to be incredibly confusing behavior
> > (even if it would be useful for some specific use case).
>
> I don't think it's behaving differently from how it does now. See below.
> But I can see how that could be confusing. Maybe it's just about
> describing the feature in a better way than I did so far. Or maybe it
> needs a different syntax.
>
> Anyway, I don't think it's just a specific use case. In every use case I
> had for $subject so far, the immediate next step was to write some
> triggers to fetch those derived values from the referenced table.
>
> Ultimately it's a question of efficiency: We can achieve the same thing
> in two ways today:
> - We can either **not** add the additional column (members.tenant,
> bar.ftype in my examples) to the referencing table at all, and add
> constraint triggers that do all those checks instead. This adds
> complexity to write the triggers and more complicated RLS policies etc,
> and also is potentially slower when executing those more complicated
> queries.
> - Or we can add the additional column, but also add an additional unique
> index on the referenced table, and then make it part of the FK. This
> removes some of the constraint triggers and makes RLS policies simpler
> and likely faster to execute queries. It comes at a cost of additional
> cost of storage, though - and this is something that $subject tries to
> address.
>
> Still, even when $subject is allowed, in practice we need some of the
> triggers to fetch those dependent values. Considering that the current
> FK triggers already do the same kind of queries at the same times, it'd
> be more efficient to have those FK queries fetch those dependent values.
>
> >> But this could also be a CHECK constraint to allow FKs only to a subset
> >> of rows in the target table:
> >
> > Are you suggesting a check constraint that queries another table?
>
> No. I was talking about the CHECK constraint in my example in the next
> paragraph of that mail. The CHECK constraint on bar.ftype is a regular
> CHECK constraint, but because of how ftype is updated automatically, it
> effectively behaves like some kind of additional constraint on the FK
> itself.

Ah, OK.

> > This "derive the value automatically" is not what foreign key
> > constraints do right now at all, right? And if fact it's contradictory
> > to existing behavior, no?
>
> I don't think it's contradicting. Maybe a better way to put my idea is this:
>
> For a foreign key to a superset of unique columns, the already-unique
> columns should behave according to the specified ON UPDATE clause.
> However, the extra columns should always behave as they were ON UPDATE
> CASCADE. And additionally, they should behave similar to something like
> ON INSERT CASCADE. Although that INSERT is about the referencing table,
> not the referenced table, so the analogy isn't 100%.
>
> I guess this would also be a more direct answer to Tom's earlier
> question about what to expect in the ON UPDATE scenario.

So the broader point I'm trying to make is that, as I understand it,
indexes backing foreign key constraints is an implementation detail.
The SQL standard details the behavior of foreign key constraints
regardless of implementation details like a backing index. That means
that the behavior of two column foreign key constraints is defined in
a single way whether or not there's a backing index at all or whether
such a backing index, if present, contains one or two columns.

I understand that for the use case you're describing this isn't the
absolute most efficient way to implement the desired data semantics.
But it would be incredibly confusing (and, I think, a violation of the
SQL standard) to have one foreign key constraint work in a different
way from another such constraint when both are indistinguishable at
the constraint level (the backing index isn't an attribute of the
constraint; it's merely an implementation detail).

James Coleman



Re: Allow foreign keys to reference a superset of unique columns

От
Wolfgang Walther
Дата:
James Coleman:
> So the broader point I'm trying to make is that, as I understand it,
> indexes backing foreign key constraints is an implementation detail.
> The SQL standard details the behavior of foreign key constraints
> regardless of implementation details like a backing index. That means
> that the behavior of two column foreign key constraints is defined in
> a single way whether or not there's a backing index at all or whether
> such a backing index, if present, contains one or two columns.
> 
> I understand that for the use case you're describing this isn't the
> absolute most efficient way to implement the desired data semantics.
> But it would be incredibly confusing (and, I think, a violation of the
> SQL standard) to have one foreign key constraint work in a different
> way from another such constraint when both are indistinguishable at
> the constraint level (the backing index isn't an attribute of the
> constraint; it's merely an implementation detail).

Ah, thanks, I understand better now.

The two would only be indistinguishable at the constraint level, if 
$subject was implemented by allowing to create unique constraints on a 
superset of unique columns, backed by a different index (the suggestion 
we both made independently). But if it was possible to reference a 
superset of unique columns, where there was only a unique constraint put 
on a subset of the referenced columns (the idea originally introduced in 
this thread), then there would be a difference, right?

That's if it was only the backing index that is not part of the SQL 
standard, and not also the fact that a foreign key should reference a 
primary key or unique constraint?

Anyway, I can see very well how that would be quite confusing overall. 
It would probably be wiser to allow something roughly like this (if at 
all, of course):

CREATE TABLE bar (
   b INT PRIMARY KEY,
   f INT,
   ftype foo_type GENERATED ALWAYS AS REFERENCE TO foo.type,
   FOREIGN KEY (f, ftype) REFERENCES foo (f, type)
);

It likely wouldn't work exactly like that, but given a foreign key to 
foo, the GENERATED clause could be used to fetch the value through the 
same triggers that form that FK for efficiency. My main point for now 
is: With a much more explicit syntax anything near that, this would 
certainly be an entirely different feature than $subject **and** it 
would be possible to implement on top of $subject. If at all.

So no need for me to distract this thread from $subject anymore. I think 
the idea of allowing to create unique constraints on a superset of the 
columns of an already existing unique index is a good one, so let's 
discuss this further.

Best

Wolfgang



Re: Allow foreign keys to reference a superset of unique columns

От
Wolfgang Walther
Дата:
James Coleman:
> As I was reading through the email chain I had this thought: could you
> get the same benefit (or 90% of it anyway) by instead allowing the
> creation of a uniqueness constraint that contains more columns than
> the index backing it? So long as the index backing it still guaranteed
> the uniqueness on a subset of columns that would seem to be safe.
> 
> Tom notes the additional columns being nullable is something to think
> about. But if we go the route of allowing unique constraints with
> backing indexes having a subset of columns from the constraint I don't
> think the nullability is an issue since it's already the case that a
> unique constraint can be declared on columns that are nullable. Indeed
> it's also the case that we already support a foreign key constraint
> backed by a unique constraint including nullable columns.
> 
> Because such an approach would, I believe, avoid changing the foreign
> key code (or semantics) at all, I believe that would address Tom's
> concerns about information_schema and fuzziness of semantics.


Could we create this additional unique constraint implicitly, when using 
FOREIGN KEY ... REFERENCES on a superset of unique columns? This would 
make it easier to use, but still give proper information_schema output.

Best

Wolfgang



Re: Allow foreign keys to reference a superset of unique columns

От
James Coleman
Дата:
On Mon, Sep 26, 2022 at 9:59 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
>
> James Coleman:
> > So the broader point I'm trying to make is that, as I understand it,
> > indexes backing foreign key constraints is an implementation detail.
> > The SQL standard details the behavior of foreign key constraints
> > regardless of implementation details like a backing index. That means
> > that the behavior of two column foreign key constraints is defined in
> > a single way whether or not there's a backing index at all or whether
> > such a backing index, if present, contains one or two columns.
> >
> > I understand that for the use case you're describing this isn't the
> > absolute most efficient way to implement the desired data semantics.
> > But it would be incredibly confusing (and, I think, a violation of the
> > SQL standard) to have one foreign key constraint work in a different
> > way from another such constraint when both are indistinguishable at
> > the constraint level (the backing index isn't an attribute of the
> > constraint; it's merely an implementation detail).
>
> Ah, thanks, I understand better now.
>
> The two would only be indistinguishable at the constraint level, if
> $subject was implemented by allowing to create unique constraints on a
> superset of unique columns, backed by a different index (the suggestion
> we both made independently). But if it was possible to reference a
> superset of unique columns, where there was only a unique constraint put
> on a subset of the referenced columns (the idea originally introduced in
> this thread), then there would be a difference, right?
>
> That's if it was only the backing index that is not part of the SQL
> standard, and not also the fact that a foreign key should reference a
> primary key or unique constraint?

I think that's not true: the SQL standard doesn't have the option of
"this foreign key is backed by this unique constraint", does it? So in
either case I believe we would be at minimum implementing an extension
to the standard (and as I argued already I think it would actually be
contradictory to the standard).

> Anyway, I can see very well how that would be quite confusing overall.
> It would probably be wiser to allow something roughly like this (if at
> all, of course):
>
> CREATE TABLE bar (
>    b INT PRIMARY KEY,
>    f INT,
>    ftype foo_type GENERATED ALWAYS AS REFERENCE TO foo.type,
>    FOREIGN KEY (f, ftype) REFERENCES foo (f, type)
> );
>
> It likely wouldn't work exactly like that, but given a foreign key to
> foo, the GENERATED clause could be used to fetch the value through the
> same triggers that form that FK for efficiency. My main point for now
> is: With a much more explicit syntax anything near that, this would
> certainly be an entirely different feature than $subject **and** it
> would be possible to implement on top of $subject. If at all.

Yeah, I think that would make more sense if one were proposing an
addition to the SQL standard (or an explicit extension to it that
Postgres would support indepently of the standard).

> So no need for me to distract this thread from $subject anymore. I think
> the idea of allowing to create unique constraints on a superset of the
> columns of an already existing unique index is a good one, so let's
> discuss this further.

Sounds good to me!

James Coleman



Re: Allow foreign keys to reference a superset of unique columns

От
James Coleman
Дата:
On Mon, Sep 26, 2022 at 10:04 AM Wolfgang Walther
<walther@technowledgy.de> wrote:
>
> James Coleman:
> > As I was reading through the email chain I had this thought: could you
> > get the same benefit (or 90% of it anyway) by instead allowing the
> > creation of a uniqueness constraint that contains more columns than
> > the index backing it? So long as the index backing it still guaranteed
> > the uniqueness on a subset of columns that would seem to be safe.
> >
> > Tom notes the additional columns being nullable is something to think
> > about. But if we go the route of allowing unique constraints with
> > backing indexes having a subset of columns from the constraint I don't
> > think the nullability is an issue since it's already the case that a
> > unique constraint can be declared on columns that are nullable. Indeed
> > it's also the case that we already support a foreign key constraint
> > backed by a unique constraint including nullable columns.
> >
> > Because such an approach would, I believe, avoid changing the foreign
> > key code (or semantics) at all, I believe that would address Tom's
> > concerns about information_schema and fuzziness of semantics.
>
>
> Could we create this additional unique constraint implicitly, when using
> FOREIGN KEY ... REFERENCES on a superset of unique columns? This would
> make it easier to use, but still give proper information_schema output.

Possibly. It'd be my preference to discuss that as a second patch
(could be in the same series); my intuition is that it'd be easier to
get agreement on the first part first, but of course I could be wrong
(if some committer, for example, thought the feature only made sense
as an implicit creation of such a constraint to back the use case
Kaiting opened with).

James Coleman



Re: Allow foreign keys to reference a superset of unique columns

От
David Rowley
Дата:
On Tue, 27 Sept 2022 at 06:08, James Coleman <jtc331@gmail.com> wrote:
>
> On Mon, Sep 26, 2022 at 9:59 AM Wolfgang Walther
> <walther@technowledgy.de> wrote:
> > So no need for me to distract this thread from $subject anymore. I think
> > the idea of allowing to create unique constraints on a superset of the
> > columns of an already existing unique index is a good one, so let's
> > discuss this further.
>
> Sounds good to me!

I don't see any immediate problems with allowing UNIQUE constraints to
be supported using a unique index which contains only a subset of
columns that are mentioned in the constraint.  There would be a few
things to think about.  e.g INSERT ON CONFLICT might need some
attention as a unique constraint can be specified for use as the
arbiter.

Perhaps the patch could be broken down as follows:

0001:

* Extend ALTER TABLE ADD CONSTRAINT UNIQUE syntax to allow a column
list when specifying USING INDEX.
* Add checks to ensure the index in USING INDEX contains only columns
mentioned in the column list.
* Do any required work for INSERT ON CONFLICT. I've not looked at the
code but maybe some adjustments are required for where it gets the
list of columns.
* Address any other places that assume the supporting index contains
all columns of the unique constraint.

0002:

* Adjust transformFkeyCheckAttrs() to have it look at UNIQUE
constraints as well as unique indexes
* Ensure information_schema.referential_constraints view still works correctly.

I think that would address all of Tom's concerns he mentioned in [1].
I wasn't quite sure I understood the NOT NULL concern there since
going by RI_FKey_pk_upd_check_required(), we don't enforce FKs when
the referenced table has a NULL in the FK's columns.

David

[1] https://www.postgresql.org/message-id/3057718.1658949103@sss.pgh.pa.us



Re: Allow foreign keys to reference a superset of unique columns

От
Kaiting Chen
Дата:
> For one example of where the semantics get fuzzy, it's not
> very clear how the extra-baggage columns ought to participate in
> CASCADE updates.  Currently, if we have
>    CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> then an update that changes only foo.b doesn't need to update
> referencing tables, and I think we even have optimizations that
> assume that if no unique-key columns are touched then RI checks
> need not be made.  But if you did
>    CREATE TABLE bar (x integer, y integer,
>                      FOREIGN KEY (x, y) REFERENCES foo(a, b) ON UPDATE CASCADE);
> then perhaps you expect bar.y to be updated ... or maybe you don't?

I'd expect bar.y to be updated. In my mind, the FOREIGN KEY constraint should
behave the same, regardless of whether the underlying unique index on the
referenced side is an equivalent set to, or a strict subset of, the referenced
columns.

> Another example is that I think the idea is only well-defined when
> the subset column(s) are a primary key, or at least all marked NOT NULL.
> Otherwise they're not as unique as you're claiming.  But then the FK
> constraint really has to be dependent on a PK constraint not just an
> index definition, since indexes in themselves don't enforce not-nullness.
> That gets back to Peter's complaint that referring to an index isn't
> good enough.

I think that uniqueness should be guaranteed enough even if the subset columns
are nullable:

  CREATE TABLE foo (a integer UNIQUE, b integer);

  CREATE TABLE bar (
    x integer,
    y integer,
    FOREIGN KEY (x, y) REFERENCES foo(a, b)
  );

The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique if
foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo. However,
such a row can't be the target of the foreign key constraint anyway. So, I'm
fairly certain that, where it matters, a unique index on a nullable subset of
the referenced columns guarantees a distinct referenced row.

> It's also unclear to me how this ought to interact with the
> information_schema views concerning foreign keys.  We generally
> feel that we don't want to present any non-SQL-compatible data
> in information_schema, for fear that it will confuse applications
> that expect to see SQL-spec behavior there.  So do we leave such
> FKs out of the views altogether, or show only the columns involving
> the associated unique constraint?  Neither answer seems pleasant.

Here's the information_schema output for this example:

  CREATE TABLE foo (a integer, b integer);

  CREATE UNIQUE INDEX ON foo (a, b);

  CREATE TABLE bar (
    x integer,
    y integer,
    FOREIGN KEY (x, y) REFERENCES foo(a, b)
  );

  # SELECT * FROM information_schema.referential_constraints
    WHERE constraint_name = 'bar_x_y_fkey';

  -[ RECORD 1 ]-------------+----------------------------------------------
  constraint_catalog        | kaitingc
  constraint_schema         | public
  constraint_name           | bar_x_y_fkey
  unique_constraint_catalog |
  unique_constraint_schema  |
  unique_constraint_name    |
  match_option              | NONE
  update_rule               | NO ACTION
  delete_rule               | NO ACTION

  # SELECT * FROM information_schema.key_column_usage
    WHERE constraint_name = 'bar_x_y_fkey';

  -[ RECORD 173 ]---------------+----------------------------------------------
  constraint_catalog            | kaitingc
  constraint_schema             | public
  constraint_name               | bar_x_y_fkey
  table_catalog                 | kaitingc
  table_schema                  | public
  table_name                    | bar
  column_name                   | x
  ordinal_position              | 1
  position_in_unique_constraint | 1
  -[ RECORD 174 ]---------------+----------------------------------------------
  constraint_catalog            | kaitingc
  constraint_schema             | public
  constraint_name               | bar_x_y_fkey
  table_catalog                 | kaitingc
  table_schema                  | public
  table_name                    | bar
  column_name                   | y
  ordinal_position              | 2
  position_in_unique_constraint | 2

It appears that currently in PostgreSQL, the unique_constraint_catalog, schema,
and name are NULL in referential_constraints when a unique index (without an
associated unique constraint) underlies the referenced columns. The behaviour
I'm proposing would have the same behavior vis-a-vis referential_constraints.

As for key_column_usage, I propose that position_in_unique_constraint be NULL if
the referenced column isn't indexed.

Re: Allow foreign keys to reference a superset of unique columns

От
Kaiting Chen
Дата:
> As I was reading through the email chain I had this thought: could you
> get the same benefit (or 90% of it anyway) by instead allowing the
> creation of a uniqueness constraint that contains more columns than
> the index backing it? So long as the index backing it still guaranteed
> the uniqueness on a subset of columns that would seem to be safe.

> After writing down that idea I noticed Wolfgang Walther had commented
> similarly, but it appears that that idea got lost (or at least not
> responded to).

Is it necessary to have the unique constraint at all? This currently works in
PostgreSQL:

  CREATE TABLE foo (a integer, b integer);

  CREATE UNIQUE INDEX ON foo (a, b);

  CREATE TABLE bar (
    x integer,
    y integer,
    FOREIGN KEY (x, y) REFERENCES foo(a, b)
  );

Where no unique constraint exists on foo (a, b). Forcing the creation of a
unique constraint in this case seems more confusing to me, as a user, than
allowing it without the definition of the unique constraint, given the existing
behavior.

> I'd be happy to sign up to review an updated patch if you're
> interested in continuing this effort. If so, could you register the
> patch in the CF app (if not there already)?

The patch should already be registered! Though it's still in a state that needs
a lot of work.

Re: Allow foreign keys to reference a superset of unique columns

От
Kaiting Chen
Дата:
> Could we create this additional unique constraint implicitly, when using
> FOREIGN KEY ... REFERENCES on a superset of unique columns? This would
> make it easier to use, but still give proper information_schema output.

Currently, a foreign key declared where the referenced columns have only a
unique index and not a unique constraint already populates the constraint
related columns of information_schema.referential_constraints with NULL. It
doesn't seem like this change would require a major deviation from the existing
behavior in information_schema:

  CREATE TABLE foo (a integer, b integer);

  CREATE UNIQUE INDEX ON foo (a, b);

  CREATE TABLE bar (
    x integer,
    y integer,
    FOREIGN KEY (x, y) REFERENCES foo(a, b)
  );

  # SELECT * FROM information_schema.referential_constraints
    WHERE constraint_name = 'bar_x_y_fkey';

  -[ RECORD 1 ]-------------+----------------------------------------------
  constraint_catalog        | kaitingc
  constraint_schema         | public
  constraint_name           | bar_x_y_fkey
  unique_constraint_catalog |
  unique_constraint_schema  |
  unique_constraint_name    |
  match_option              | NONE
  update_rule               | NO ACTION
  delete_rule               | NO ACTION

The only change would be to information_schema.key_column_usage:

  # SELECT * FROM information_schema.key_column_usage
    WHERE constraint_name = 'bar_x_y_fkey';

  -[ RECORD 173 ]---------------+----------------------------------------------
  constraint_catalog            | kaitingc
  constraint_schema             | public
  constraint_name               | bar_x_y_fkey
  table_catalog                 | kaitingc
  table_schema                  | public
  table_name                    | bar
  column_name                   | x
  ordinal_position              | 1
  position_in_unique_constraint | 1
  -[ RECORD 174 ]---------------+----------------------------------------------
  constraint_catalog            | kaitingc
  constraint_schema             | public
  constraint_name               | bar_x_y_fkey
  table_catalog                 | kaitingc
  table_schema                  | public
  table_name                    | bar
  column_name                   | y
  ordinal_position              | 2
  position_in_unique_constraint | 2

Where position_in_unique_constraint would have to be NULL for the referenced
columns that don't appear in the unique index. That column is already nullable:

  For a foreign-key constraint, ordinal position of the referenced column within
  its unique constraint (count starts at 1); otherwise null

So it seems like this would be a minor documentation change at most. Also,
should that documentation be updated to mention that it's actually the "ordinal
position of the referenced column within its unique index" (since it's a little
confusing that in referential_constraints, unique_constraint_name is NULL)?

Re: Allow foreign keys to reference a superset of unique columns

От
Tom Lane
Дата:
Kaiting Chen <ktchen14@gmail.com> writes:
>> Another example is that I think the idea is only well-defined when
>> the subset column(s) are a primary key, or at least all marked NOT NULL.
>> Otherwise they're not as unique as you're claiming.  But then the FK
>> constraint really has to be dependent on a PK constraint not just an
>> index definition, since indexes in themselves don't enforce not-nullness.
>> That gets back to Peter's complaint that referring to an index isn't
>> good enough.

> The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique
> if foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo.
> However, such a row can't be the target of the foreign key constraint
> anyway.

You're ignoring the possibility of a MATCH PARTIAL FK constraint.
Admittedly, we don't implement those today, and there hasn't been
a lot of interest in doing so.  But they're in the SQL spec so we
should fix that someday.

I also wonder how this all interacts with the UNIQUE NULLS NOT
DISTINCT feature that we just got done implementing for v15.
I don't know if the spec says that an FK depending on such a
constraint should treat nulls as ordinary unique values --- but
it sure seems like that'd be a plausible user expectation.


The bottom line is there's zero chance you'll ever convince me that this
is a good idea.  I think the semantics are at best questionable, I think
it will break important optimizations, and I think the chances of
finding ourselves in conflict with some future SQL spec extension are
too high.  (Even if you can make the case that this isn't violating the
spec *today*, which I rather doubt so far as the information_schema is
concerned.  The fact that we've got legacy behaviors that are outside
the spec there isn't a great argument for adding more.)

Now, if you can persuade the SQL committee that this behavior should be
standardized, then two of those concerns would go away (since I don't
think you'll get squishy semantics past them).  But I think submitting
a patch now is way premature and mostly going to waste people's time.

            regards, tom lane



Re: Allow foreign keys to reference a superset of unique columns

От
Kaiting Chen
Дата:
>>> Another example is that I think the idea is only well-defined when
>>> the subset column(s) are a primary key, or at least all marked NOT NULL.
>>> Otherwise they're not as unique as you're claiming.  But then the FK
>>> constraint really has to be dependent on a PK constraint not just an
>>> index definition, since indexes in themselves don't enforce not-nullness.
>>> That gets back to Peter's complaint that referring to an index isn't
>>> good enough.

>> The unique index underlying foo.a guarantees that (foo.a, foo.b) is unique
>> if foo.a isn't NULL. That is, there can be multiple rows (NULL, 1) in foo.
>> However, such a row can't be the target of the foreign key constraint
>> anyway.

> You're ignoring the possibility of a MATCH PARTIAL FK constraint.
> Admittedly, we don't implement those today, and there hasn't been
> a lot of interest in doing so.  But they're in the SQL spec so we
> should fix that someday.

> I also wonder how this all interacts with the UNIQUE NULLS NOT
> DISTINCT feature that we just got done implementing for v15.
> I don't know if the spec says that an FK depending on such a
> constraint should treat nulls as ordinary unique values --- but
> it sure seems like that'd be a plausible user expectation.

I don't think that the UNIQUE NULLS DISTINCT/NOT DISTINCT patch will have any
impact on this proposal. Currently (and admittedly I haven't thought at all
about MATCH PARTIAL), a NULL in a referencing row precludes a reference at all:

  * If the foreign key constraint is declared MATCH SIMPLE, then no referenced
    row exists for the referencing row.
  * If the foreign key constraint is declared MATCH FULL, then the referencing
    row must not have a NULL in any of its referencing columns.

UNIQUE NULLS NOT DISTINCT is the current behavior, and this proposql shouldn't
have a problem with the current behavior. In the case of UNIQUE NULLS DISTINCT,
then NULLs behave, from a uniqueness perspective, as a singleton value and thus
shouldn't cause any additional semantic difficulties in regards to this
proposal.

I don't have access to a copy of the SQL specification and it doesn't look like
anyone implements MATCH PARTIAL. Based on what I can gather from the internet,
it appears that MATCH PARTIAL allows at most one referencing column to be NULL,
and guarantees that at least one row in the referenced table matches the
remaining columns; implicitly, multiple matches are allowed. If these are the
semantics of MATCH PARTIAL, then it seems to me that uniqueness of the
referenced rows aren't very important.

What other semantics and edge cases regarding this proposal should I consider?

Re: Allow foreign keys to reference a superset of unique columns

От
Kaiting Chen
Дата:
> So the broader point I'm trying to make is that, as I understand it,
> indexes backing foreign key constraints is an implementation detail.
> The SQL standard details the behavior of foreign key constraints
> regardless of implementation details like a backing index. That means
> that the behavior of two column foreign key constraints is defined in
> a single way whether or not there's a backing index at all or whether
> such a backing index, if present, contains one or two columns.

> I understand that for the use case you're describing this isn't the
> absolute most efficient way to implement the desired data semantics.
> But it would be incredibly confusing (and, I think, a violation of the
> SQL standard) to have one foreign key constraint work in a different
> way from another such constraint when both are indistinguishable at
> the constraint level (the backing index isn't an attribute of the
> constraint; it's merely an implementation detail).

It appears to me that the unique index backing a foreign key constraint *isn't*
an implementation detail in PostgreSQL; rather, the *unique constraint* is the
implementation detail. The reason I say this is because it's possible to create
a foreign key constraint where the uniqueness of referenced columns are
guaranteed by only a unique index and where no such unique constraint exists.

Specifically, this line in the documentation:

  The referenced columns must be the columns of a non-deferrable unique or
  primary key constraint in the referenced table.

Isn't true. In practice, the referenced columns must be the columns of a valid,
nondeferrable, nonfunctional, nonpartial, unique index. Whether or not a unique
constraint exists is immaterial to whether or not postgres will let you define
such a foreign key constraint.

Re: Allow foreign keys to reference a superset of unique columns

От
Kaiting Chen
Дата:
> For one example of where the semantics get fuzzy, it's not
> very clear how the extra-baggage columns ought to participate in
> CASCADE updates.  Currently, if we have
>    CREATE TABLE foo (a integer PRIMARY KEY, b integer);
> then an update that changes only foo.b doesn't need to update
> referencing tables, and I think we even have optimizations that
> assume that if no unique-key columns are touched then RI checks
> need not be made.

Regarding optimizations that skip RI checks on the PK side of the relationship,
I believe the relevant code is here (in trigger.c):

  if (TRIGGER_FIRED_BY_UPDATE(event) || TRIGGER_FIRED_BY_DELETE(event)) {
      ...

      switch (RI_FKey_trigger_type(trigger->tgfoid)) {
          ...

          case RI_TRIGGER_PK:
              ...

              /* Update or delete on trigger's PK table */
              if (!RI_FKey_pk_upd_check_required(trigger, rel, oldslot, newslot))
              {
                  /* skip queuing this event */
                  continue;
              }

              ...

And the checks done in RI_FKey_pk_upd_check_required() are:

  const RI_ConstraintInfo *riinfo;

  riinfo = ri_FetchConstraintInfo(trigger, pk_rel, true);

  /*
    * If any old key value is NULL, the row could not have been referenced by
    * an FK row, so no check is needed.
    */
  if (ri_NullCheck(RelationGetDescr(pk_rel), oldslot, riinfo, true) != RI_KEYS_NONE_NULL)
          return false;

  /* If all old and new key values are equal, no check is needed */
  if (newslot && ri_KeysEqual(pk_rel, oldslot, newslot, riinfo, true))
          return false;

  /* Else we need to fire the trigger. */
  return true;

The columns inspected by both ri_NullCheck() and ri_KeysEqual() are based on the
riinfo->pk_attnums:

  if (rel_is_pk)
          attnums = riinfo->pk_attnums;
  else
          attnums = riinfo->fk_attnums;

The check in ri_NullCheck() is, expectedly, a straightforward NULL check:

  for (int i = 0; i < riinfo->nkeys; i++)
  {
          if (slot_attisnull(slot, attnums[i]))
                  nonenull = false;
          else
                  allnull = false;
  }

The check in ri_KeysEqual() is a bytewise comparison:

  /* XXX: could be worthwhile to fetch all necessary attrs at once */
  for (int i = 0; i < riinfo->nkeys; i++)
  {
          Datum oldvalue;
          Datum newvalue;
          bool isnull;

          /*
            * Get one attribute's oldvalue. If it is NULL - they're not equal.
            */
          oldvalue = slot_getattr(oldslot, attnums[i], &isnull);
          if (isnull)
                  return false;

          /*
            * Get one attribute's newvalue. If it is NULL - they're not equal.
            */
          newvalue = slot_getattr(newslot, attnums[i], &isnull);
          if (isnull)
                  return false;

          if (rel_is_pk)
          {
                  /*
                    * If we are looking at the PK table, then do a bytewise
                    * comparison.  We must propagate PK changes if the value is
                    * changed to one that "looks" different but would compare as
                    * equal using the equality operator.  This only makes a
                    * difference for ON UPDATE CASCADE, but for consistency we treat
                    * all changes to the PK the same.
                    */
                  Form_pg_attribute att = TupleDescAttr(oldslot->tts_tupleDescriptor, attnums[i] - 1);

                  if (!datum_image_eq(oldvalue, newvalue, att->attbyval, att->attlen))
                          return false;
          }
          else
          {
                  /*
                    * For the FK table, compare with the appropriate equality
                    * operator.  Changes that compare equal will still satisfy the
                    * constraint after the update.
                    */
                  if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], RIAttType(rel, attnums[i]),
                                                                  oldvalue, newvalue))
                          return false;
          }
  }

It seems like neither optimization actually requires the presence of the unique
index. And, as my proposed patch would leave both riinfo->nkeys and
riinfo->pk_attnums exactly the same as before, I don't believe that it should
have any impact on these optimizations.

Re: Allow foreign keys to reference a superset of unique columns

От
Peter Eisentraut
Дата:
On 28.09.22 00:39, Kaiting Chen wrote:
> What other semantics and edge cases regarding this proposal should I 
> consider?

I'm not as pessimistic as others that it couldn't be made to work.  But 
it's the job of this proposal to figure this out.  Implementing it is 
probably not that hard in the end, but working out the specification in 
all details is the actual job.