Обсуждение: Question about behavior of deletes with REPLICA IDENTITY NOTHING

Поиск
Список
Период
Сортировка

Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
James Coleman
Дата:
Hello,

We recently noticed some behavior that seems reasonable but also
surprised our engineers based on the docs.

If we have this setup:
create table items(i int);
insert into items(i) values (1);
create publication test_pub for all tables;

Then when we:
delete from items where i = 1;

we get:
ERROR:  cannot delete from table "items" because it does not have a
replica identity and publishes deletes
HINT:  To enable deleting from the table, set REPLICA IDENTITY using
ALTER TABLE.

Fair enough. But if we do this:
alter table items replica identity nothing;

because the docs [1] say that NOTHING means "Records no information
about the old row." We still get the same error when we try the DELETE
again.

The publication docs [2] say "A published table must have a replica
identity configured in order to be able to replicate UPDATE and DELETE
operations, so that appropriate rows to update or delete can be
identified on the subscriber side."

We interpreted the intersection of these two docs to imply that if you
explicitly configured NOTHING that the publication would simply not
log anything about the original row. Part of the confusion I think was
fed by reading "must have a replica identity set" as "have selected
one of the options via ALTER TABLE REPLICA IDENTITY" -- i.e., as
meaning that a setting has been configured rather than being about a
subset of those possible configuration values/a specific key existing
on the table.

I'm wondering if this might be a surprise to anyone else, and if so,
is there a minor docs tweak that might avoid the confusion?

Thanks,
James Coleman

1: https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
2: https://www.postgresql.org/docs/current/logical-replication-publication.html



Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
Laurenz Albe
Дата:
On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:
> We recently noticed some behavior that seems reasonable but also
> surprised our engineers based on the docs.
>
> If we have this setup:
> create table items(i int);
> insert into items(i) values (1);
> create publication test_pub for all tables;
>
> Then when we:
> delete from items where i = 1;
>
> we get:
> ERROR:  cannot delete from table "items" because it does not have a
> replica identity and publishes deletes
> HINT:  To enable deleting from the table, set REPLICA IDENTITY using
> ALTER TABLE.
>
> Fair enough. But if we do this:
> alter table items replica identity nothing;
>
> because the docs [1] say that NOTHING means "Records no information
> about the old row." We still get the same error when we try the DELETE
> again.

Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
"REPLICA IDENTITY USING INDEX ..." if the index is dropped.

See "pg_class": the column "relreplident" is not nullable.

Yours,
Laurenz Albe



Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
James Coleman
Дата:
On Wed, Feb 7, 2024 at 3:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:
> > We recently noticed some behavior that seems reasonable but also
> > surprised our engineers based on the docs.
> >
> > If we have this setup:
> > create table items(i int);
> > insert into items(i) values (1);
> > create publication test_pub for all tables;
> >
> > Then when we:
> > delete from items where i = 1;
> >
> > we get:
> > ERROR:  cannot delete from table "items" because it does not have a
> > replica identity and publishes deletes
> > HINT:  To enable deleting from the table, set REPLICA IDENTITY using
> > ALTER TABLE.
> >
> > Fair enough. But if we do this:
> > alter table items replica identity nothing;
> >
> > because the docs [1] say that NOTHING means "Records no information
> > about the old row." We still get the same error when we try the DELETE
> > again.
>
> Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
> So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
> "REPLICA IDENTITY USING INDEX ..." if the index is dropped.
>
> See "pg_class": the column "relreplident" is not nullable.

Right, I think the confusing point for us is that the docs for NOTHING
("Records no information about the old row") imply you can decide you
don't have to record anything if you don't want to do so, but the
publication feature is effectively overriding that and asserting that
you can't make that choice.

Regards,
James Coleman



Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
Peter Smith
Дата:
On Thu, Feb 8, 2024 at 9:04 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Wed, Feb 7, 2024 at 3:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> >
> > On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:
> > > We recently noticed some behavior that seems reasonable but also
> > > surprised our engineers based on the docs.
> > >
> > > If we have this setup:
> > > create table items(i int);
> > > insert into items(i) values (1);
> > > create publication test_pub for all tables;
> > >
> > > Then when we:
> > > delete from items where i = 1;
> > >
> > > we get:
> > > ERROR:  cannot delete from table "items" because it does not have a
> > > replica identity and publishes deletes
> > > HINT:  To enable deleting from the table, set REPLICA IDENTITY using
> > > ALTER TABLE.
> > >
> > > Fair enough. But if we do this:
> > > alter table items replica identity nothing;
> > >
> > > because the docs [1] say that NOTHING means "Records no information
> > > about the old row." We still get the same error when we try the DELETE
> > > again.
> >
> > Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
> > So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
> > "REPLICA IDENTITY USING INDEX ..." if the index is dropped.
> >
> > See "pg_class": the column "relreplident" is not nullable.
>
> Right, I think the confusing point for us is that the docs for NOTHING
> ("Records no information about the old row") imply you can decide you
> don't have to record anything if you don't want to do so, but the
> publication feature is effectively overriding that and asserting that
> you can't make that choice.
>

Hi, I can see how the current docs could be interpreted in a way that
was not intended.

~~~

To emphasise the DEFAULT behaviour that Laurenze described, I felt
there could be another sentence about DEFAULT, the same as there is
already for the USING INDEX case.

BEFORE [1]
Records the old values of the columns of the primary key, if any. This
is the default for non-system tables.

SUGGESTION
Records the old values of the columns of the primary key, if any. This
is the default for non-system tables. If there is no primary key, the
behavior is the same as NOTHING.

~~~

If that is done, then would a publication docs tweak like the one
below clarify things sufficiently?

BEFORE [2]
If a table without a replica identity is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.

SUGGESTION
If a table without a replica identity (or with replica identity
behavior equivalent to NOTHING) is added to a publication that
replicates UPDATE or DELETE operations then subsequent UPDATE or
DELETE operations will cause an error on the publisher.

======
[1] https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
[2] https://www.postgresql.org/docs/current/logical-replication-publication.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
James Coleman
Дата:
On Wed, Feb 7, 2024 at 6:04 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Feb 8, 2024 at 9:04 AM James Coleman <jtc331@gmail.com> wrote:
> >
> > On Wed, Feb 7, 2024 at 3:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > >
> > > On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:
> > > > We recently noticed some behavior that seems reasonable but also
> > > > surprised our engineers based on the docs.
> > > >
> > > > If we have this setup:
> > > > create table items(i int);
> > > > insert into items(i) values (1);
> > > > create publication test_pub for all tables;
> > > >
> > > > Then when we:
> > > > delete from items where i = 1;
> > > >
> > > > we get:
> > > > ERROR:  cannot delete from table "items" because it does not have a
> > > > replica identity and publishes deletes
> > > > HINT:  To enable deleting from the table, set REPLICA IDENTITY using
> > > > ALTER TABLE.
> > > >
> > > > Fair enough. But if we do this:
> > > > alter table items replica identity nothing;
> > > >
> > > > because the docs [1] say that NOTHING means "Records no information
> > > > about the old row." We still get the same error when we try the DELETE
> > > > again.
> > >
> > > Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
> > > So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
> > > "REPLICA IDENTITY USING INDEX ..." if the index is dropped.
> > >
> > > See "pg_class": the column "relreplident" is not nullable.
> >
> > Right, I think the confusing point for us is that the docs for NOTHING
> > ("Records no information about the old row") imply you can decide you
> > don't have to record anything if you don't want to do so, but the
> > publication feature is effectively overriding that and asserting that
> > you can't make that choice.
> >
>
> Hi, I can see how the current docs could be interpreted in a way that
> was not intended.
>
> ~~~
>
> To emphasise the DEFAULT behaviour that Laurenze described, I felt
> there could be another sentence about DEFAULT, the same as there is
> already for the USING INDEX case.
>
> BEFORE [1]
> Records the old values of the columns of the primary key, if any. This
> is the default for non-system tables.
>
> SUGGESTION
> Records the old values of the columns of the primary key, if any. This
> is the default for non-system tables. If there is no primary key, the
> behavior is the same as NOTHING.
>
> ~~~
>
> If that is done, then would a publication docs tweak like the one
> below clarify things sufficiently?
>
> BEFORE [2]
> If a table without a replica identity is added to a publication that
> replicates UPDATE or DELETE operations then subsequent UPDATE or
> DELETE operations will cause an error on the publisher.
>
> SUGGESTION
> If a table without a replica identity (or with replica identity
> behavior equivalent to NOTHING) is added to a publication that
> replicates UPDATE or DELETE operations then subsequent UPDATE or
> DELETE operations will cause an error on the publisher.
>
> ======
> [1] https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
> [2] https://www.postgresql.org/docs/current/logical-replication-publication.html
>
> Kind Regards,
> Peter Smith.
> Fujitsu Australia

Thanks for looking at this!

Yes, both of those changes together would make this unambiguous (and,
I think, easier to mentally parse).

Thanks,
James Coleman



Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
Peter Smith
Дата:
On Thu, Feb 8, 2024 at 11:12 AM James Coleman <jtc331@gmail.com> wrote:
>
> On Wed, Feb 7, 2024 at 6:04 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Thu, Feb 8, 2024 at 9:04 AM James Coleman <jtc331@gmail.com> wrote:
> > >
> > > On Wed, Feb 7, 2024 at 3:22 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > > >
> > > > On Wed, 2024-02-07 at 15:12 -0500, James Coleman wrote:
> > > > > We recently noticed some behavior that seems reasonable but also
> > > > > surprised our engineers based on the docs.
> > > > >
> > > > > If we have this setup:
> > > > > create table items(i int);
> > > > > insert into items(i) values (1);
> > > > > create publication test_pub for all tables;
> > > > >
> > > > > Then when we:
> > > > > delete from items where i = 1;
> > > > >
> > > > > we get:
> > > > > ERROR:  cannot delete from table "items" because it does not have a
> > > > > replica identity and publishes deletes
> > > > > HINT:  To enable deleting from the table, set REPLICA IDENTITY using
> > > > > ALTER TABLE.
> > > > >
> > > > > Fair enough. But if we do this:
> > > > > alter table items replica identity nothing;
> > > > >
> > > > > because the docs [1] say that NOTHING means "Records no information
> > > > > about the old row." We still get the same error when we try the DELETE
> > > > > again.
> > > >
> > > > Well, "REPLICA IDENTITY NOTHING" is the same as "has no replica identity".
> > > > So is "REPLICA IDENTITY DEFAULT" if there is no primary key, or
> > > > "REPLICA IDENTITY USING INDEX ..." if the index is dropped.
> > > >
> > > > See "pg_class": the column "relreplident" is not nullable.
> > >
> > > Right, I think the confusing point for us is that the docs for NOTHING
> > > ("Records no information about the old row") imply you can decide you
> > > don't have to record anything if you don't want to do so, but the
> > > publication feature is effectively overriding that and asserting that
> > > you can't make that choice.
> > >
> >
> > Hi, I can see how the current docs could be interpreted in a way that
> > was not intended.
> >
> > ~~~
> >
> > To emphasise the DEFAULT behaviour that Laurenze described, I felt
> > there could be another sentence about DEFAULT, the same as there is
> > already for the USING INDEX case.
> >
> > BEFORE [1]
> > Records the old values of the columns of the primary key, if any. This
> > is the default for non-system tables.
> >
> > SUGGESTION
> > Records the old values of the columns of the primary key, if any. This
> > is the default for non-system tables. If there is no primary key, the
> > behavior is the same as NOTHING.
> >
> > ~~~
> >
> > If that is done, then would a publication docs tweak like the one
> > below clarify things sufficiently?
> >
> > BEFORE [2]
> > If a table without a replica identity is added to a publication that
> > replicates UPDATE or DELETE operations then subsequent UPDATE or
> > DELETE operations will cause an error on the publisher.
> >
> > SUGGESTION
> > If a table without a replica identity (or with replica identity
> > behavior equivalent to NOTHING) is added to a publication that
> > replicates UPDATE or DELETE operations then subsequent UPDATE or
> > DELETE operations will cause an error on the publisher.
> >
> > ======
> > [1] https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY
> > [2] https://www.postgresql.org/docs/current/logical-replication-publication.html
> >
> > Kind Regards,
> > Peter Smith.
> > Fujitsu Australia
>
> Thanks for looking at this!
>
> Yes, both of those changes together would make this unambiguous (and,
> I think, easier to mentally parse).
>

OK, here then is a patch to do like that.

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Вложения

Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
Laurenz Albe
Дата:
On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:
> -   how to set the replica identity.  If a table without a replica identity is
> +   how to set the replica identity.  If a table without a replica identity
> +   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
>     added to a publication that replicates <command>UPDATE</command>
>     or <command>DELETE</command> operations then
>     subsequent <command>UPDATE</command> or <command>DELETE</command>

I had the impression that the root of the confusion was the perceived difference
between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
doesn't improve that.

How about:

  If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
  or set to a primary key or index that doesn't exist) is added ...


Yours,
Laurenz Albe



Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
Ashutosh Bapat
Дата:
On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:
> > -   how to set the replica identity.  If a table without a replica identity is
> > +   how to set the replica identity.  If a table without a replica identity
> > +   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
> >     added to a publication that replicates <command>UPDATE</command>
> >     or <command>DELETE</command> operations then
> >     subsequent <command>UPDATE</command> or <command>DELETE</command>
>
> I had the impression that the root of the confusion was the perceived difference
> between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
> doesn't improve that.
>
> How about:
>
>   If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
>   or set to a primary key or index that doesn't exist) is added ...

Another possibility is just to improve the documentation of various
options as follows.

DEFAULT

If there is a primary key, record the old values of the columns of the
primary key. Otherwise it acts as NOTHING. This is the default for
non-system tables.

USING INDEX index_name

Records the old values of the columns covered by the named index, that
must be unique, not partial, not deferrable, and include only columns
marked NOT NULL. If this index is dropped, the behavior is the same as
NOTHING.

FULL

Records the old values of all columns in the row.

NOTHING

Records no information about the old row. This is equivalent to having
no replica identity. This is the default for system tables.


--
Best Wishes,
Ashutosh Bapat



Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
James Coleman
Дата:
On Wed, Feb 7, 2024 at 11:27 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:
> > -   how to set the replica identity.  If a table without a replica identity is
> > +   how to set the replica identity.  If a table without a replica identity
> > +   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
> >     added to a publication that replicates <command>UPDATE</command>
> >     or <command>DELETE</command> operations then
> >     subsequent <command>UPDATE</command> or <command>DELETE</command>
>
> I had the impression that the root of the confusion was the perceived difference
> between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
> doesn't improve that.
>
> How about:
>
>   If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
>   or set to a primary key or index that doesn't exist) is added ...

I think that would work also. I was reading the initial suggestion as
"(or with replica identity behavior the same as..." as defining what
"without a replica identity" meant, which would avoid the confusion.
But your proposal is more explicit and more succinct, so I think it's
the better option of the two.

Regards,
James Coleman



Re: Question about behavior of deletes with REPLICA IDENTITY NOTHING

От
James Coleman
Дата:
On Thu, Feb 8, 2024 at 4:47 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Thu, Feb 8, 2024 at 9:57 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> >
> > On Thu, 2024-02-08 at 13:40 +1100, Peter Smith wrote:
> > > -   how to set the replica identity.  If a table without a replica identity is
> > > +   how to set the replica identity.  If a table without a replica identity
> > > +   (or with replica identity behavior the same as <literal>NOTHING</literal>) is
> > >     added to a publication that replicates <command>UPDATE</command>
> > >     or <command>DELETE</command> operations then
> > >     subsequent <command>UPDATE</command> or <command>DELETE</command>
> >
> > I had the impression that the root of the confusion was the perceived difference
> > between "REPLICA IDENTITY NOTHING" and "no replica identity", and that change
> > doesn't improve that.
> >
> > How about:
> >
> >   If a table without a replica identity (explicitly set to <literal>NOTHING</literal>,
> >   or set to a primary key or index that doesn't exist) is added ...
>
> Another possibility is just to improve the documentation of various
> options as follows.
>
> DEFAULT
>
> If there is a primary key, record the old values of the columns of the
> primary key. Otherwise it acts as NOTHING. This is the default for
> non-system tables.
>
> USING INDEX index_name
>
> Records the old values of the columns covered by the named index, that
> must be unique, not partial, not deferrable, and include only columns
> marked NOT NULL. If this index is dropped, the behavior is the same as
> NOTHING.
>
> FULL
>
> Records the old values of all columns in the row.
>
> NOTHING
>
> Records no information about the old row. This is equivalent to having
> no replica identity. This is the default for system tables.

This is the simplest change, and it does solve the confusion, so I'd
be happy with it also. The other proposals have the benefit of having
all the information necessary on the publications page rather than
requiring the user to refer to the ALTER TABLE REPLICA IDENTITY page
to understand what's meant.

Regards,
James Coleman