Обсуждение: altering a column's collation leaves an invalid foreign key

Поиск
Список
Период
Сортировка

altering a column's collation leaves an invalid foreign key

От
Paul Jungwirth
Дата:
Dear hackers,

I was looking at how foreign keys deal with collations, and I came across this comment about not 
re-checking a foreign key if the column type changes in a compatible way:

   * Since we require that all collations share the same notion of
   * equality (which they do, because texteq reduces to bitwise
   * equality), we don't compare collation here.

But now that we have nondeterministic collations, isn't that out of date?

For instance I could make this foreign key:

paul=# create collation itext (provider = 'icu', locale = 'und-u-ks-level1', deterministic = false);
CREATE COLLATION
paul=# create table t1 (id text collate itext primary key);
CREATE TABLE
paul=# create table t2 (id text, parent_id text references t1);
CREATE TABLE

And then:

paul=# insert into t1 values ('a');
INSERT 0 1
paul=# insert into t2 values ('.', 'A');
INSERT 0 1

So far that behavior seems correct, because the user told us 'a' and 'A' were equivalent,
but now I can change the collation on the referenced table and the FK doesn't complain:

paul=# alter table t1 alter column id type text collate "C";
ALTER TABLE

The constraint claims to be valid, but I can't drop & add it:

paul=# alter table t2 drop constraint t2_parent_id_fkey;
ALTER TABLE
paul=# alter table t2 add constraint t2_parent_id_fkey foreign key (parent_id) references t1;
ERROR:  insert or update on table "t2" violates foreign key constraint "t2_parent_id_fkey"
DETAIL:  Key (parent_id)=(A) is not present in table "t1".

Isn't that a problem?

Perhaps if the previous collation was nondeterministic we should force a re-check.

(Tested on 17devel 697f8d266c and also 16.)

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com



Re: altering a column's collation leaves an invalid foreign key

От
Paul Jungwirth
Дата:
On 3/23/24 10:04, Paul Jungwirth wrote:
> Perhaps if the previous collation was nondeterministic we should force a re-check.

Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a 
better way.

We have had nondeterministic collations since v12, so perhaps it is something to back-patch?

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Вложения

Re: altering a column's collation leaves an invalid foreign key

От
jian he
Дата:
On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 3/23/24 10:04, Paul Jungwirth wrote:
> > Perhaps if the previous collation was nondeterministic we should force a re-check.
>
> Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a
> better way.
>

+ /* test follows the one in ri_FetchConstraintInfo() */
+ if (ARR_NDIM(arr) != 1 ||
+ ARR_HASNULL(arr) ||
+ ARR_ELEMTYPE(arr) != INT2OID)
+ elog(ERROR, "conkey is not a 1-D smallint array");
+ attarr = (AttrNumber *) ARR_DATA_PTR(arr);
+
+ /* stash a List of the collation Oids in our Constraint node */
+ for (i = 0; i < numkeys; i++)
+ con->old_collations = lappend_oid(con->old_collations,
+  list_nth_oid(changedCollationOids, attarr[i] - 1));

I don't understand the "ri_FetchConstraintInfo" comment.


+static void
+RememberCollationForRebuilding(AttrNumber attnum, AlteredTableInfo *tab)
+{
+ Oid typid;
+ int32 typmod;
+ Oid collid;
+ ListCell   *lc;
+
+ /* Fill in the list with InvalidOid if this is our first visit */
+ if (tab->changedCollationOids == NIL)
+ {
+ int len = RelationGetNumberOfAttributes(tab->rel);
+ int i;
+
+ for (i = 0; i < len; i++)
+ tab->changedCollationOids = lappend_oid(tab->changedCollationOids,
+ InvalidOid);
+ }
+
+ get_atttypetypmodcoll(RelationGetRelid(tab->rel), attnum,
+  &typid, &typmod, &collid);
+
+ lc = list_nth_cell(tab->changedCollationOids, attnum - 1);
+ lfirst_oid(lc) = collid;
+}

do we need to check if `collid` is a valid collation?
like:

if (!OidIsValid(collid))
{
lc = list_nth_cell(tab->changedCollationOids, attnum - 1);
lfirst_oid(lc) = collid;
}



Re: altering a column's collation leaves an invalid foreign key

От
jian he
Дата:
On Tue, Mar 26, 2024 at 1:00 PM jian he <jian.universality@gmail.com> wrote:
>
> On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth
> <pj@illuminatedcomputing.com> wrote:
> >
> > On 3/23/24 10:04, Paul Jungwirth wrote:
> > > Perhaps if the previous collation was nondeterministic we should force a re-check.
> >
> > Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a
> > better way.
> >
>
> + /* test follows the one in ri_FetchConstraintInfo() */
> + if (ARR_NDIM(arr) != 1 ||
> + ARR_HASNULL(arr) ||
> + ARR_ELEMTYPE(arr) != INT2OID)
> + elog(ERROR, "conkey is not a 1-D smallint array");
> + attarr = (AttrNumber *) ARR_DATA_PTR(arr);
> +
> + /* stash a List of the collation Oids in our Constraint node */
> + for (i = 0; i < numkeys; i++)
> + con->old_collations = lappend_oid(con->old_collations,
> +  list_nth_oid(changedCollationOids, attarr[i] - 1));
>
> I don't understand the "ri_FetchConstraintInfo" comment.

I still don't understand this comment.

>
> +static void
> +RememberCollationForRebuilding(AttrNumber attnum, AlteredTableInfo *tab)
> +{
> + Oid typid;
> + int32 typmod;
> + Oid collid;
> + ListCell   *lc;
> +
> + /* Fill in the list with InvalidOid if this is our first visit */
> + if (tab->changedCollationOids == NIL)
> + {
> + int len = RelationGetNumberOfAttributes(tab->rel);
> + int i;
> +
> + for (i = 0; i < len; i++)
> + tab->changedCollationOids = lappend_oid(tab->changedCollationOids,
> + InvalidOid);
> + }
> +
> + get_atttypetypmodcoll(RelationGetRelid(tab->rel), attnum,
> +  &typid, &typmod, &collid);
> +
> + lc = list_nth_cell(tab->changedCollationOids, attnum - 1);
> + lfirst_oid(lc) = collid;
> +}
>
> do we need to check if `collid` is a valid collation?
> like:
>
> if (!OidIsValid(collid))
> {
> lc = list_nth_cell(tab->changedCollationOids, attnum - 1);
> lfirst_oid(lc) = collid;
> }
now I think RememberCollationForRebuilding is fine. no need further change.


in ATAddForeignKeyConstraint.
if (old_check_ok)
{
/*
* When a pfeqop changes, revalidate the constraint.  We could
* permit intra-opfamily changes, but that adds subtle complexity
* without any concrete benefit for core types.  We need not
* assess ppeqop or ffeqop, which RI_Initial_Check() does not use.
*/
old_check_ok = (pfeqop == lfirst_oid(old_pfeqop_item));
old_pfeqop_item = lnext(fkconstraint->old_conpfeqop,
old_pfeqop_item);
}
maybe we can do the logic right below it. like:

if (old_check_ok)
{

* All deterministic collations use bitwise equality to resolve
* ties, but if the previous collation was nondeterministic,
* we must re-check the foreign key, because some references
* that use to be "equal" may not be anymore. If we have
* InvalidOid here, either there was no collation or the
* attribute didn't change.
old_check_ok = (old_collation == InvalidOid ||
get_collation_isdeterministic(old_collation));
}
then we don't need to cram it with the other `if (old_check_ok){}`.


do we need to add an entry in
https://wiki.postgresql.org/wiki/PostgreSQL_17_Open_Items#Open_Issues
?



Re: altering a column's collation leaves an invalid foreign key

От
jian he
Дата:
On Fri, Apr 12, 2024 at 5:06 PM jian he <jian.universality@gmail.com> wrote:
>
> On Tue, Mar 26, 2024 at 1:00 PM jian he <jian.universality@gmail.com> wrote:
> >
> > On Mon, Mar 25, 2024 at 2:47 PM Paul Jungwirth
> > <pj@illuminatedcomputing.com> wrote:
> > >
> > > On 3/23/24 10:04, Paul Jungwirth wrote:
> > > > Perhaps if the previous collation was nondeterministic we should force a re-check.
> > >
> > > Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a
> > > better way.
I think I found a simple way.

the logic is:
* ATExecAlterColumnType changes one column once at a time.
* one column can only have one collation. so we don't need  to store a
list of collation oid.
* ATExecAlterColumnType we can get the new collation (targetcollid)
and original collation info.
* RememberAllDependentForRebuilding  will check the column dependent,
whether this column is referenced by a foreign key or not information
is recorded.
so AlteredTableInfo->changedConstraintOids have the primary key and
foreign key oids.
* ATRewriteCatalogs will call ATPostAlterTypeCleanup (see the comments
in ATRewriteCatalogs)
* for tab->changedConstraintOids (foreign key, primary key) will call
ATPostAlterTypeParse, so
for foreign key (con->contype == CONSTR_FOREIGN)  will call TryReuseForeignKey.
* in TryReuseForeignKey, we can pass the information that our primary
key old collation is nondeterministic
and old collation != new collation to the foreign key constraint.
so foreign key can act accordingly at ATAddForeignKeyConstraint (old_check_ok).


based on the above logic, I add one bool in struct AlteredTableInfo,
one bool in struct Constraint.
bool in AlteredTableInfo is for storing it, later passing it to struct
Constraint.
we need bool in Constraint because ATAddForeignKeyConstraint needs it.

Вложения

Re: altering a column's collation leaves an invalid foreign key

От
jian he
Дата:
On Sat, Apr 13, 2024 at 9:13 PM jian he <jian.universality@gmail.com> wrote:
>
> > > > Here is a patch implementing this. It was a bit more fuss than I expected, so maybe someone has a
> > > > better way.
> I think I found a simple way.
>
> the logic is:
> * ATExecAlterColumnType changes one column once at a time.
> * one column can only have one collation. so we don't need  to store a
> list of collation oid.
> * ATExecAlterColumnType we can get the new collation (targetcollid)
> and original collation info.
> * RememberAllDependentForRebuilding  will check the column dependent,
> whether this column is referenced by a foreign key or not information
> is recorded.
> so AlteredTableInfo->changedConstraintOids have the primary key and
> foreign key oids.
> * ATRewriteCatalogs will call ATPostAlterTypeCleanup (see the comments
> in ATRewriteCatalogs)
> * for tab->changedConstraintOids (foreign key, primary key) will call
> ATPostAlterTypeParse, so
> for foreign key (con->contype == CONSTR_FOREIGN)  will call TryReuseForeignKey.
> * in TryReuseForeignKey, we can pass the information that our primary
> key old collation is nondeterministic
> and old collation != new collation to the foreign key constraint.
> so foreign key can act accordingly at ATAddForeignKeyConstraint (old_check_ok).
>
>
> based on the above logic, I add one bool in struct AlteredTableInfo,
> one bool in struct Constraint.
> bool in AlteredTableInfo is for storing it, later passing it to struct
> Constraint.
> we need bool in Constraint because ATAddForeignKeyConstraint needs it.

I refactored the comments.
also added some extra tests hoping to make it more bullet proof, maybe
it's redundant.

Вложения

Re: altering a column's collation leaves an invalid foreign key

От
Tom Lane
Дата:
jian he <jian.universality@gmail.com> writes:
>> * in TryReuseForeignKey, we can pass the information that our primary
>> key old collation is nondeterministic
>> and old collation != new collation to the foreign key constraint.

I have a basic question about this: why are we allowing FKs to be
based on nondeterministic collations at all?  ISTM that that breaks
the assumption that there is exactly one referenced row for any
referencing row.

            regards, tom lane



Re: altering a column's collation leaves an invalid foreign key

От
jian he
Дата:
On Sat, Jun 8, 2024 at 4:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> jian he <jian.universality@gmail.com> writes:
> >> * in TryReuseForeignKey, we can pass the information that our primary
> >> key old collation is nondeterministic
> >> and old collation != new collation to the foreign key constraint.
>
> I have a basic question about this: why are we allowing FKs to be
> based on nondeterministic collations at all?  ISTM that that breaks
> the assumption that there is exactly one referenced row for any
> referencing row.
>

for FKs nondeterministic,
I think that would require the PRIMARY KEY collation to not be
indeterministic also.

for example:
CREATE COLLATION ignore_accent_case (provider = icu, deterministic =
false, locale = 'und-u-ks-level1');
DROP TABLE IF EXISTS fktable, pktable;
CREATE TABLE pktable (x text COLLATE ignore_accent_case PRIMARY KEY);
CREATE TABLE fktable (x text REFERENCES pktable on update cascade on
delete cascade);
INSERT INTO pktable VALUES ('A');
INSERT INTO fktable VALUES ('a');
INSERT INTO fktable VALUES ('A');
update pktable set x  = 'Å';
table fktable;



if FK is nondeterministic, then it looks PK more like FK.
the following example, one FK row is referenced by two PK rows.

DROP TABLE IF EXISTS fktable, pktable;
CREATE TABLE pktable (x text COLLATE "C" PRIMARY KEY);
CREATE TABLE fktable (x text COLLATE ignore_accent_case REFERENCES
pktable on update cascade on delete cascade);
INSERT INTO pktable VALUES ('A'), ('Å');
INSERT INTO fktable VALUES ('A');

begin; delete from pktable where x = 'Å'; TABLE fktable; rollback;
begin; delete from pktable where x = 'A'; TABLE fktable; rollback;