Re: Intermittent Issue with WAL Segment Removal in Logical Replication

From: Kaushik Iska
Subject: Re: Intermittent Issue with WAL Segment Removal in Logical Replication
Date:
Msg-id: CAHYLuV=M2YTxecoc1MH=TbChei7pAyk2gNLHnCM_eGSnGhjeOQ@mail.gmail.com
In reply to: Intermittent Issue with WAL Segment Removal in Logical Replication  (Kaushik Iska <kaushik@peerdb.io>)
Replies: Re: Intermittent Issue with WAL Segment Removal in Logical Replication
Re: Intermittent Issue with WAL Segment Removal in Logical Replication
List: pgsql-general

Hi all,

I'm including additional details, as I am able to reproduce this issue a little more reliably.

Postgres Version: POSTGRES_14_9.R20230830.01_07
Vendor: Google Cloud SQL
Logical Replication Protocol version 1

Here are the logs from an attempt that fails, followed a few seconds later by a retry that succeeds:

2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL 6/5AE67D79 (proto_version '1', publication_names 'peerflow_pub_wal_testing_2') <- FAILS
2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres ERROR:  requested WAL segment 000000010000000600000059 has already been removed
2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL 6/5AE67D79 (proto_version '1', publication_names 'peerflow_pub_wal_testing_2')  <- SUCCEEDS
2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres LOG:  logical decoding found consistent point at 6/5A31F050
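
To compare the LSNs in these log lines with the segment named in the error, it can help to map each LSN to the WAL file that contains it. The following is a minimal sketch, not part of the original setup. It assumes the default 16 MB wal_segment_size and timeline 1 (the 00000001 prefix in the error suggests the latter); `parse_lsn` and `wal_segment_name` are illustrative helper names.

```python
# Assumption: default 16 MB WAL segments; adjust if the server was
# initialized with a different wal_segment_size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '6/5AE67D79' into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_segment_name(lsn: int, timeline: int = 1,
                     segment_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the 24-character WAL file name that contains the given LSN."""
    segments_per_xlog_id = 0x100000000 // segment_size
    segment_number = lsn // segment_size
    return "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlog_id,
        segment_number % segments_per_xlog_id,
    )


if __name__ == "__main__":
    # LSNs copied from the log lines above.
    for label, lsn_text in [("requested start", "6/5AE67D79"),
                            ("consistent point", "6/5A31F050")]:
        print(label, lsn_text, wal_segment_name(parse_lsn(lsn_text)))
```

Under those assumptions, both the requested start LSN and the consistent point fall in segment 00000001000000060000005A, one segment after the 000000010000000600000059 named in the error. That suggests the failing attempt tried to read from a position earlier than the requested LSN; the logs alone do not show why.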

Happy to include any additional details of my setup.

Thanks,
Kaushik


On Tue, Dec 26, 2023 at 10:36 AM Kaushik Iska <kaushik@peerdb.io> wrote:

Dear PostgreSQL Community,

I am seeking guidance on a recurring issue with WAL segment removal during logical replication using the pgoutput plugin. When executing START_REPLICATION, we intermittently get an error indicating that a requested WAL segment has already been removed. An example error message is as follows:

requested WAL segment 000000010000146000000AE has already been removed

Please note that this error is not specific to the segment mentioned above; it serves as an example of the type of error we are experiencing.

Additional Context:

  • max_slot_wal_keep_size is -1, logical_decoding_work_mem is 4 GB.

  • The error appears at seemingly random times, with no consistent pattern.

  • After a couple of retries, the replication process eventually succeeds.

  • For one user, it seems to happen roughly every 16 hours.


Our approach involves starting with START_REPLICATION 0, replicating data in batches, and then restarting at the last LSN of the previous batch. We are trying to understand the root cause behind the intermittent removal of WAL segments during logical replication. Specifically, we are looking for insights into:

  • The potential reasons for the WAL segments being reported as removed.

  • Why this error occurs intermittently and why replication succeeds after several retries.

  • Any advice on troubleshooting and resolving this issue, or insights into whether it might be related to our specific replication setup or a characteristic of pgoutput, would be highly valuable.
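
For reference, the batch-and-restart approach described above can be sketched roughly as follows. This is a simplified illustration, not the actual client code: it assumes "START_REPLICATION 0" means starting at LSN 0/0, builds the command in the same shape as the logged statements, and takes a hypothetical `read_batch` callable that runs the command and returns the last LSN received. Identifier quoting and error handling are omitted.

```python
from typing import Callable, List


def format_lsn(lsn: int) -> str:
    """Render a 64-bit LSN in the usual 'X/X' hexadecimal form."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


def start_replication_command(slot: str, lsn: int, publication: str,
                              proto_version: int = 1) -> str:
    """Build a START_REPLICATION statement shaped like the logged ones."""
    return (
        "START_REPLICATION SLOT %s LOGICAL %s "
        "(proto_version '%d', publication_names '%s')"
        % (slot, format_lsn(lsn), proto_version, publication)
    )


def run_batches(slot: str, publication: str,
                read_batch: Callable[[str], int],
                batches: int) -> List[str]:
    """Issue one command per batch, restarting at the previous batch's last LSN.

    read_batch is a hypothetical stand-in for the replication client: it
    receives the command and returns the last LSN seen in that batch.
    """
    issued = []
    lsn = 0  # assumption: the first batch starts at 0/0
    for _ in range(batches):
        command = start_replication_command(slot, lsn, publication)
        issued.append(command)
        lsn = read_batch(command)
    return issued
```

With a `read_batch` that returns the integer form of 6/5AE67D79, the second command produced by `run_batches` has the same text as the START_REPLICATION statement in the logs.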


Thank you very much for your time and assistance.

Thanks,

Kaushik Iska
