Re: BUG #18433: Logical replication timeout

Поиск
Список
Период
Сортировка
От Костянтин Томах
Тема Re: BUG #18433: Logical replication timeout
Дата
Msg-id CAJP09w6AstaFxBzFUatKSdr-ESG7MQ75FW6fwN4AvB+nucxc7w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #18433: Logical replication timeout  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
Ответы Re: BUG #18433: Logical replication timeout  (Shlok Kyal <shlok.kyal.oss@gmail.com>)
Список pgsql-bugs
Hello, Shlok Kyal. Thank you very much for your answer.
Yes, we tried to increase wal_sender_timeout and wal_receiver_timeout from 30seconds up to 300 seconds for source and destination DB but we got the same error.
We set REPLICA IDENTITY FULL on Postgresql 13 because we got an error at destination DB "ERROR: logical replication target relation "public.table" has neither REPLICA IDENTITY index nor PRIMARY KEY and published relation does not have REPLICA IDENTITY FULL". We didn't have such errors on Postgresql 10.


ср, 17 апр. 2024 г. в 13:48, Shlok Kyal <shlok.kyal.oss@gmail.com>:
Hi,

On Mon, 15 Apr 2024 at 13:09, PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      18433
> Logged by:          Kostiantyn
> Email address:      tomahkvt@gmail.com
> PostgreSQL version: 13.14
> Operating system:   AWS RDS
> Description:
>
> On Postgresql 10 we used the following approach for the big tables:
> 1) Download schema from the source database instance
> 2) Deleted PK, FK, and Indexes for tables bigger than 100Gb from the
> schema
> 3)Upload the schema to the destination DB.
> 4) Configure Logical replication between source and destination DB.
> 5) When last table logical replication table synchronization worker for
> subscription "db_name_public_subscription", table "table_name" has
> finished
> 6) we created all the necessary PK, FK, and Indexes.
> This approach allowed to us upload data more quickly. This approach was
> working  great on PostgreSQL 10.
>
> We tried the same approach for Postgresql13, but we got an error.
> 1) Download schema from the source database instance
> 2) Deleted PK, FK, and Indexes for tables bigger than 100Gb from the
> schema
> 3)Upload the schema to the destination DB.
> 4) configurated identity replication full at source DB for tables bigger
> than 100Gb
> 5) Configured Logical replication between source and destination DB.
> 6) During catchup on this big table  process we got the following
> messages:
> Source DB
> 2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:LOG: terminating
> walsender process due to replication timeout
> 2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:CONTEXT: slot
> "db_name_public_subscription", output plugin "pgoutput", in the begin
> callback, associated LSN 13705/2E913050
> 2024-04-08 15:38:34 UTC:(27994):replication_role@:[22047]:STATEMENT:
> START_REPLICATION SLOT "db_name_public_subscription" LOGICAL 13702/C2C8FB30
> (proto_version '1', publication_names '"db_name_public_publication"')
> 2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:LOG: terminating
> walsender process due to replication timeout
> 2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:CONTEXT: slot
> "db_name_public_subscription_18989108_sync_17127", output plugin "pgoutput",
> in the begin callback, associated LSN 13703/27942B70
> 2024-04-08 15:38:34 UTC:(36862):replication_role@:[22811]:STATEMENT:
> START_REPLICATION SLOT "db_name_public_subscription_18989108_sync_17127"
> LOGICAL 13703/17622B58 (proto_version '1', publication_names
> '"db_name_public_publication"')
>
> One important point. If there is no request to source DB logical replication
> works fine for big tables.
> I saw the messages in PostgreSQL bugs like
> https://www.postgresql.org/message-id/flat/718213.1601410160%40sss.pgh.pa.us#7e61dd07661901b505bcbd74ce5f5f28
> But I also did some tests and increased wal_keep_size
> and max_slot_wal_keep_size to 1GB. And I set wal_sender_timeout to 1h but
> without success. The setup works in PG 13 only with a small amount of data.

I went through the issue and I think that the given logs are appearing
due to some delay in the Apply Worker process.
I could reproduce it in my development environment by applying delays
in Apply Worker.

I think this issue can be resolved by setting 'wal_sender_timeout' to
a greater value. Can you try setting 'wal_sender_timeout' to a greater
value?

Also, I noticed that while using Postgresql13 you are configuring the
table in sourceDB as REPLICA IDENTITY FULL but not doing the same in
Postgresql10. Is there any specific reason for it?
I pointed it out because REPLICA IDENTITY FULL has a different
behaviour and sets the entire row as key.

Thanks and Regards,
Shlok Kyal

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tender Wang
Дата:
Сообщение: Re: BUG #18377: Assert false in "partdesc->nparts >= pinfo->nparts", fileName="execPartition.c", lineNumber=1943
Следующее
От: "Mathews, Rob"
Дата:
Сообщение: CVE-2024-28849