Re: Intermittent Issue with WAL Segment Removal in Logical Replication

From: Tomas Vondra
Subject: Re: Intermittent Issue with WAL Segment Removal in Logical Replication
Msg-id: 3b75e08a-2d53-fbb5-731a-3a8e5c71edd8@enterprisedb.com
In reply to: Re: Intermittent Issue with WAL Segment Removal in Logical Replication (Kaushik Iska <kaushik@peerdb.io>)
Responses: Re: Intermittent Issue with WAL Segment Removal in Logical Replication
List: pgsql-general
On 12/27/23 16:31, Kaushik Iska wrote:
> Hi all,
> 
> I'm including additional details, as I am able to reproduce this issue a
> little more reliably.
> 
> Postgres Version: POSTGRES_14_9.R20230830.01_07
> Vendor: Google Cloud SQL
> Logical Replication Protocol version 1
> 

I don't know much about Google Cloud SQL internals. Is it relatively
close to stock Postgres (as, e.g., RDS is), or are the internals very
different / modified for cloud environments?

> Here are the logs of an attempt that succeeds right after one fails:
> 
> 2023-12-27 01:12:40.581 UTC [59790]: [6-1] db=postgres,user=postgres
> STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> 6/5AE67D79 (proto_version '1', publication_names
> 'peerflow_pub_wal_testing_2') <- FAILS
> 2023-12-27 01:12:41.087 UTC [59790]: [7-1] db=postgres,user=postgres
> ERROR:  requested WAL segment 000000010000000600000059 has already been
> removed
> 2023-12-27 01:12:44.581 UTC [59794]: [3-1] db=postgres,user=postgres
> STATEMENT:  START_REPLICATION SLOT peerflow_slot_wal_testing_2 LOGICAL
> 6/5AE67D79 (proto_version '1', publication_names
> 'peerflow_pub_wal_testing_2')  <- SUCCEEDS
> 2023-12-27 01:12:44.582 UTC [59794]: [4-1] db=postgres,user=postgres
> LOG:  logical decoding found consistent point at 6/5A31F050
> 
> Happy to include any additional details of my setup.
> 

I personally don't see how this could fail and then succeed, unless
Google does something smart with the WAL segments under the hood. Surely
we try to open the same WAL segment (given the LSN is the same), so how
could it be missing on one attempt and present on the next?
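The file the walsender has to open follows purely from the LSN. A minimal Python sketch of that mapping is below. It assumes the default 16 MB wal_segment_size and timeline 1 (taken from the 00000001 prefix in the error message); Cloud SQL may use different settings, and the helper names are hypothetical. On the server, pg_walfile_name() does a similar computation.

```python
# Minimal sketch: map a textual LSN to the name of the WAL segment file
# that contains it. Assumptions (not confirmed for Cloud SQL): 16 MB
# wal_segment_size and timeline 1.

WAL_SEG_SIZE = 16 * 1024 * 1024  # assumed default wal_segment_size, in bytes


def parse_lsn(lsn: str) -> int:
    """Convert an 'X/Y' LSN (two hex halves) into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def walfile_name(lsn: str, timeline: int = 1, seg_size: int = WAL_SEG_SIZE) -> str:
    """Return the 24-hex-digit WAL file name: timeline, log id, segment."""
    segno = parse_lsn(lsn) // seg_size           # absolute segment number
    segs_per_id = 0x100000000 // seg_size        # segments per 4 GB log id
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"


if __name__ == "__main__":
    # Requested start LSN and the consistent point from the quoted logs.
    for lsn in ("6/5AE67D79", "6/5A31F050"):
        print(lsn, "->", walfile_name(lsn))
```

Under those assumptions, both 6/5AE67D79 and 6/5A31F050 fall in 00000001000000060000005A, one segment after the ...59 file named in the error. That suggests the walsender was reading from an earlier position, such as the slot's restart_lsn, rather than from the requested start LSN.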

As Ron already suggested, it might be useful to see the information for
the replication slot peerflow_slot_wal_testing_2 (especially the
restart_lsn value). It would also help to see the contents of pg_wal
(especially the segment referenced in the error message).
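Once both pieces of information are available, comparing them is mechanical. The Python sketch below is a hypothetical diagnostic helper: it takes a slot's restart_lsn (as reported by pg_replication_slots) and a list of file names from pg_wal (for example, the name column of pg_ls_waldir()), and reports whether the segment holding restart_lsn is still there. It again assumes 16 MB segments and timeline 1, and the restart_lsn value in the usage example is made up for illustration.

```python
# Hypothetical diagnostic sketch: is the WAL segment a slot still needs
# (the one containing its restart_lsn) present in pg_wal?
# Assumptions: 16 MB wal_segment_size, timeline 1.

import re

SEG_SIZE = 16 * 1024 * 1024
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")  # plain segment files only


def segment_for(lsn: str, timeline: int = 1, seg_size: int = SEG_SIZE) -> str:
    """Name of the WAL segment file containing an 'X/Y' LSN."""
    hi, lo = lsn.split("/")
    segno = ((int(hi, 16) << 32) | int(lo, 16)) // seg_size
    per_id = 0x100000000 // seg_size
    return f"{timeline:08X}{segno // per_id:08X}{segno % per_id:08X}"


def check_slot(restart_lsn: str, pg_wal_files: list[str]) -> str:
    needed = segment_for(restart_lsn)
    # Skip .history, .partial, archive_status, etc.; lexicographic order of
    # fixed-width hex names is chronological within a single timeline.
    segments = sorted(f for f in pg_wal_files if WAL_NAME.match(f))
    if needed in segments:
        return f"OK: {needed} is present"
    oldest = segments[0] if segments else "none"
    return f"MISSING: {needed} (oldest segment in pg_wal: {oldest})"


if __name__ == "__main__":
    # Made-up restart_lsn and pg_wal listing, for illustration only.
    files = ["00000001000000060000005A", "00000001000000060000005B", "archive_status"]
    print(check_slot("6/59F00000", files))
    # MISSING: 000000010000000600000059 (oldest segment in pg_wal: 00000001000000060000005A)
```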

Can you reproduce this outside the Google Cloud environment?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


