Re: BUG #18433: Logical replication timeout

From: Kostiantyn Tomakh
Subject: Re: BUG #18433: Logical replication timeout
Date:
Msg-id: CAJP09w7ShycVDaEuDOP5FFm8k=aJtj+NdjY5Cb5+TgMNNa46kQ@mail.gmail.com
In reply to: Re: BUG #18433: Logical replication timeout  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
List: pgsql-bugs
Hello, Shlok Kyal.
I found a solution for myself: I decided to migrate from PostgreSQL 13 to PostgreSQL 15.
I used the following setup: source DB on PostgreSQL 13 and destination DB on PostgreSQL 15.
Fortunately, the problem shows up when the destination DB is PostgreSQL 13, and it did not occur with PostgreSQL 15 as the destination.
There are two ways to handle this issue:
1) Fix the problem.
2) Inform people that they can run into problems if they use PostgreSQL 13 as the destination DB for logical replication.
I think the best choice is the first option.
Shlok Kyal, thank you very much for your help. We really appreciate it.

On Fri, 10 May 2024 at 09:05, Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
Hi,

> I was able to reproduce the problem.
> I did it on a Docker-based platform; I hope you will be able to reproduce this problem too.

Thanks for providing the detailed steps to reproduce the issue. I was
able to reproduce the issue with the steps you provided.
I noticed that the increased table size on the subscriber can happen
in all versions up to and including Postgres 13, and I was able to
reproduce that. This is a timing issue, which is probably why you are
not seeing it in Postgres 10.

This issue occurs because the tablesync worker exits (due to the UPDATE
command) and then restarts, as seen in the logs:
2024-05-01 16:26:15.384 GMT [40] LOG:  logical replication table synchronization worker for subscription "db_name_public_subscription", table "table" has started
2024-05-01 16:26:16.994 GMT [40] ERROR:  logical replication target relation "public.table" has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL
2024-05-01 16:26:20.393 GMT [41] LOG:  logical replication table synchronization worker for subscription "db_name_public_subscription", table "table" has started

The tablesync worker syncs the initial data from the publisher to the
subscriber using the COPY command. In this case it exits after the copy
phase has completed and then restarts, so it performs the entire copy
operation again. That is why we see the increased table size on the
subscriber.

This issue is not reproducible in Postgres 14 and later versions. It
was mitigated by commit [1], which introduced a new state,
'FINISHEDCOPY'. With it, if the tablesync worker exits after the copy
phase has completed and then restarts, it does not perform the COPY
command again and instead proceeds directly to synchronizing the WAL
position between the tablesync worker and the apply worker.

code:
+   else if (MyLogicalRepWorker->relstate == SUBREL_STATE_FINISHEDCOPY)
+   {
+       /*
+        * The COPY phase was previously done, but tablesync then crashed
+        * before it was able to finish normally.
+        */
+       StartTransactionCommand();
+
+       /*
+        * The origin tracking name must already exist. It was created first
+        * time this tablesync was launched.
+        */
+       originid = replorigin_by_name(originname, false);
+       replorigin_session_setup(originid);
+       replorigin_session_origin = originid;
+       *origin_startpos = replorigin_session_get_progress(false);
+
+       CommitTransactionCommand();
+
+       goto copy_table_done;
+   }
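
To illustrate why persisting that state matters, here is a minimal toy
model, not PostgreSQL code. All names in it (ToyRelState, ToyTable,
toy_tablesync_run, toy_rows_after_restarts) are hypothetical and only
mimic the behavior described above: a worker that copies the table,
fails afterwards, and is relaunched. It assumes the target table has no
primary key, so a repeated copy simply appends the same rows again.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the catalog states; not the real SUBREL_STATE_* values. */
typedef enum
{
    TOY_STATE_INIT,
    TOY_STATE_DATASYNC,
    TOY_STATE_FINISHEDCOPY
} ToyRelState;

typedef struct
{
    ToyRelState state;        /* survives worker restarts, like a catalog row */
    long        subscriber_rows; /* rows currently in the subscriber table */
} ToyTable;

/*
 * One launch of a hypothetical tablesync worker.
 *
 * persist_finished_copy = true models the behavior after commit [1];
 * false models a worker that does not record that the copy is done.
 * fail_after_copy simulates the ERROR raised while applying the UPDATE.
 * Returns true if this launch finished without error.
 */
bool
toy_tablesync_run(ToyTable *t, long publisher_rows,
                  bool persist_finished_copy, bool fail_after_copy)
{
    if (t->state != TOY_STATE_FINISHEDCOPY)
    {
        t->state = TOY_STATE_DATASYNC;
        t->subscriber_rows += publisher_rows;   /* the COPY phase */
        if (persist_finished_copy)
            t->state = TOY_STATE_FINISHEDCOPY;
    }
    /* Otherwise the copy was already done: skip straight to catch-up. */

    return !fail_after_copy;
}

/* Launch the worker `launches` times; every launch but the last one fails. */
long
toy_rows_after_restarts(int launches, long publisher_rows,
                        bool persist_finished_copy)
{
    ToyTable t = {TOY_STATE_INIT, 0};

    assert(launches >= 1);
    for (int i = 0; i < launches; i++)
        toy_tablesync_run(&t, publisher_rows, persist_finished_copy,
                          i < launches - 1);
    return t.subscriber_rows;
}
```

With a 1000-row publisher table and three launches,
toy_rows_after_restarts(3, 1000, false) ends with 3000 rows on the
subscriber, while toy_rows_after_restarts(3, 1000, true) ends with 1000.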

Backpatching commit [1] to Postgres 13 and Postgres 12 will mitigate this issue.
Thoughts?

[1] https://github.com/postgres/postgres/commit/ce0fdbfe9722867b7fad4d3ede9b6a6bfc51fb4e

Thanks and Regards,
Shlok Kyal
