Re: long-standing data loss bug in initial sync of logical replication

Поиск
Список
Период
Сортировка
От Nitin Motiani
Тема Re: long-standing data loss bug in initial sync of logical replication
Дата
Msg-id CAH5HC95UsGrD1ez0vfNPCLP6vE3ptmaOkUECB5JsqLOCYjRW3w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: long-standing data loss bug in initial sync of logical replication  (vignesh C <vignesh21@gmail.com>)
Ответы Re: long-standing data loss bug in initial sync of logical replication
Список pgsql-hackers
On Wed, Jul 10, 2024 at 10:39 PM vignesh C <vignesh21@gmail.com> wrote:
>
> On Wed, 10 Jul 2024 at 12:28, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > The patch missed to use the ShareRowExclusiveLock for partitions, see
> > attached. I haven't tested it but they should also face the same
> > problem. Apart from that, I have changed the comments in a few places
> > in the patch.
>
> I could not hit the updated ShareRowExclusiveLock changes through the
> partition table, instead I could verify it using the inheritance
> table. Added a test for the same and also attaching the backbranch
> patch.
>

Hi,

I tested alternative-experimental-fix-lock.patch provided by Tomas
(replaces SUE with SRE in OpenTableList). I believe there are a couple
of scenarios the patch does not cover.

1. It doesn't handle the case of "ALTER PUBLICATION <pub> ADD TABLES
IN SCHEMA  <schema>".

I took crash-test.sh provided by Tomas and modified it to add all
tables in the schema to publication using the following command :

           ALTER PUBLICATION p ADD TABLES IN SCHEMA  public

The modified script is attached (crash-test-with-schema.sh). With this
script, I can reproduce the issue even with the patch applied. This is
because the code path to add a schema to the publication doesn't go
through OpenTableList.

I have also attached a script run-test-with-schema.sh to run
crash-test-with-schema.sh in a loop with randomly generated parameters
(modified from run.sh provided by Tomas).

2.  The second issue is a deadlock which happens when the alter
publication command is run for a comma separated list of tables.

I created another script create-test-tables-order-reverse.sh. This
script runs a command like the following :

            ALTER PUBLICATION p ADD TABLE test_2,test_1

Running the above script, I was able to get a deadlock error (the
output is attached in deadlock.txt). In the alter publication command,
I added the tables in the reverse order to increase the probability of
the deadlock. But it should happen with any order of tables.

I am not sure if the deadlock is a major issue because detecting the
deadlock is better than data loss. The schema issue is probably more
important. I didn't test it out with the latest patches sent by
Vignesh but since the code changes in that patch are also in
OpenTableList, I think the schema scenario won't be covered by those.

Thanks & Regards,
Nitin Motiani
Google

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Imran Zaheer
Дата:
Сообщение: Re: Windows: openssl & gssapi dislike each other
Следующее
От: Kirill Reshke
Дата:
Сообщение: Re: Allow non-superuser to cancel superuser tasks.