RE: Conflict detection and logging in logical replication

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Conflict detection and logging in logical replication
Дата
Msg-id OS0PR01MB57168F2C0DA21A8DAAA06AED94A52@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Conflict detection and logging in logical replication  (shveta malik <shveta.malik@gmail.com>)
Ответы Re: Conflict detection and logging in logical replication
Список pgsql-hackers
On Wednesday, July 10, 2024 5:39 PM shveta malik <shveta.malik@gmail.com> wrote:
> 
> On Wed, Jul 3, 2024 at 8:31 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com>
> wrote:
> >
> > On Wednesday, June 26, 2024 10:58 AM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> > >
> >
> > Hi,
> >
> > As suggested by Sawada-san in another thread[1].
> >
> > I am attaching the V4 patch set which tracks the delete_differ
> > conflict in logical replication.
> >
> > delete_differ means that the replicated DELETE is deleting a row that
> > was modified by a different origin.
> >
> 
> Thanks for the patch. I am still in process of review but please find few
> comments:

Thanks for the comments!

> 1) When I try to *insert* primary/unique key on pub, which already exists on
> sub, conflict gets detected. But when I try to *update* primary/unique key to a
> value on pub which already exists on sub, conflict is not detected. I get the
> error:
> 
> 2024-07-10 14:21:09.976 IST [647678] ERROR:  duplicate key value violates
> unique constraint "t1_pkey"
> 2024-07-10 14:21:09.976 IST [647678] DETAIL:  Key (pk)=(4) already exists.

Yes, I think the detection of this conflict is not added with the
intention to control the size of the patch in the first version.

> 
> This is because such conflict detection needs detection of constraint violation
> using the *new value* rather than *existing* value during UPDATE. INSERT
> conflict detection takes care of this case i.e. the columns of incoming row are
> considered as new values and it tries to see if all unique indexes are okay to
> digest such new values (all incoming columns) but update's logic is different.
> It searches based on oldTuple *only* and thus above detection is missing.

I think the logic is the same if we want to detect the unique violation
for UDPATE, we need to check if the new value of the UPDATE violates any
unique constraints same as the detection of insert_exists (e.g. check
the conflict around ExecInsertIndexTuples())

> 
> Shall we support such detection? If not, is it worth docuementing?

I am personally OK to support this detection. And
I think it's already documented that we only detect unique violation for
insert which mean update conflict is not detected.

> 2)
> Another case which might confuse user:
> 
> CREATE TABLE t1 (pk integer primary key, val1 integer, val2 integer);
> 
> On PUB: insert into t1 values(1,10,10); insert into t1 values(2,20,20);
> 
> On SUB: update t1 set pk=3 where pk=2;
> 
> Data on PUB: {1,10,10}, {2,20,20}
> Data on SUB: {1,10,10}, {3,20,20}
> 
> Now on PUB: update t1 set val1=200 where val1=20;
> 
> On Sub, I get this:
> 2024-07-10 14:44:00.160 IST [648287] LOG:  conflict update_missing detected
> on relation "public.t1"
> 2024-07-10 14:44:00.160 IST [648287] DETAIL:  Did not find the row to be
> updated.
> 2024-07-10 14:44:00.160 IST [648287] CONTEXT:  processing remote data for
> replication origin "pg_16389" during message type "UPDATE" for replication
> target relation "public.t1" in transaction 760, finished at 0/156D658
> 
> To user, it could be quite confusing, as val1=20 exists on sub but still he gets
> update_missing conflict and the 'DETAIL' is not sufficient to give the clarity. I
> think on HEAD as well (have not tested), we will get same behavior i.e. update
> will be ignored as we make search based on RI (pk in this case). So we are not
> worsening the situation, but now since we are detecting conflict, is it possible
> to give better details in 'DETAIL' section indicating what is actually missing?

I think It's doable to report the row value that cannot be found in the local
relation, but the concern is the potential risk of exposing some
sensitive data in the log. This may be OK, as we are already reporting the
key value for constraints violation, so if others also agree, we can add
the row value in the DETAIL as well.

Best Regards,
Hou zj


В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Parent/child context relation in pg_get_backend_memory_contexts()
Следующее
От: "Euler Taveira"
Дата:
Сообщение: Re: improve performance of pg_dump with many sequences