Re: Synchronizing slots from primary to standby
| From | Bertrand Drouvot |
|---|---|
| Subject | Re: Synchronizing slots from primary to standby |
| Date | |
| Msg-id | ZYWdSIeAMQQcLmVT@ip-10-97-1-34.eu-west-3.compute.internal |
| In reply to | Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>) |
| Replies | Re: Synchronizing slots from primary to standby |
| List | pgsql-hackers |
Hi,
On Fri, Dec 22, 2023 at 04:02:21PM +0530, shveta malik wrote:
> PFA v53. Changes are:
Thanks!
> patch002:
> 2) Addressed comments in [2] for v52-002.
> 3) Fixed CFBot failure. The failure was caused by an assert in
> wait_for_primary_slot_catchup() for null confirmed_lsn received. In
> wait_for_primary_slot_catchup(), we had an assumption that if
> restart_lsn is valid and 'conflicting' is also false, then we must
> have non-null confirmed_lsn. But this is not true. It is possible to
> get null values for confirmed_lsn and catalog_xmin if on the primary
> server the slot is just created with a valid restart_lsn and slot-sync
> worker has fetched the slot before the primary server could set valid
> confirmed_lsn and catalog_xmin. In
> pg_create_logical_replication_slot(), there is a small window between
> CreateInitDecodingContext-->ReplicationSlotReserveWal() which sets
> restart_lsn and DecodingContextFindStartpoint() which sets
> confirmed_lsn. If the slot-sync worker fetches the slot in this
> window, confirmed_lsn received will be NULL. Corrected the code to
> remove assert and added one additional condition that confirmed_lsn
> should be valid before moving the slot to 'r'.
> 
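To make the race concrete, here is a minimal Python sketch of the corrected check. It assumes a simplified, hypothetical `RemoteSlot` record standing in for a row fetched from the primary, with `None` modelling a SQL NULL. `can_mark_ready` is an illustrative name, not a function in the patch.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical, simplified view of one slot row fetched from the primary.
# None stands for a SQL NULL.
@dataclass
class RemoteSlot:
    name: str
    restart_lsn: Optional[int]
    confirmed_lsn: Optional[int]
    catalog_xmin: Optional[int]
    conflicting: bool


def can_mark_ready(slot: RemoteSlot) -> bool:
    """Return True if the synced slot may be moved to the 'r' state.

    A valid restart_lsn plus conflicting = false does NOT imply a non-NULL
    confirmed_lsn: the row may have been fetched after
    ReplicationSlotReserveWal() set restart_lsn but before
    DecodingContextFindStartpoint() set confirmed_lsn. So confirmed_lsn is
    checked explicitly instead of being asserted.
    """
    return (
        slot.restart_lsn is not None
        and not slot.conflicting
        and slot.confirmed_lsn is not None
    )


# Slot fetched inside the creation window: only restart_lsn is set.
in_window = RemoteSlot("logical_slot", restart_lsn=0x1000, confirmed_lsn=None,
                       catalog_xmin=None, conflicting=False)
# The same slot once creation has finished.
created = RemoteSlot("logical_slot", restart_lsn=0x1000, confirmed_lsn=0x1028,
                     catalog_xmin=740, conflicting=False)
print(can_mark_ready(in_window), can_mark_ready(created))  # False True
```

In this model the worker simply keeps waiting while `confirmed_lsn` is NULL instead of tripping an assertion.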
Looking at the v53-0002 commit message, it states:
"
If a logical slot on the primary is valid but is invalidated on the standby,
then that slot is dropped and recreated on the standby in next sync-cycle.
"
and one of the reasons mentioned is:
"
    - The primary changes wal_level to a level lower than logical.
"
I think that, as long as there is still a logical replication slot on the primary,
that should not be possible. The primary should fail to start with messages like:
"
2023-12-22 14:06:09.281 UTC [31824] FATAL:  logical replication slot "logical_slot" exists, but wal_level < logical
"
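A minimal Python sketch of the startup check behind this FATAL follows. It assumes `wal_level` is ordered minimal < replica < logical. `startup_error` is a hypothetical name, and the error text is copied from the log line above. It shows why the commit message's wal_level case can only arise once every logical slot has been removed from the primary.

```python
from typing import List, Optional

# wal_level settings in increasing order.
WAL_LEVELS = ["minimal", "replica", "logical"]


def startup_error(wal_level: str, logical_slots: List[str]) -> Optional[str]:
    """Hypothetical model of the primary's startup check.

    Returns the FATAL message the server would emit, or None if it may start.
    """
    if WAL_LEVELS.index(wal_level) >= WAL_LEVELS.index("logical"):
        return None
    if logical_slots:
        # Any remaining logical slot blocks startup; report the first one.
        return (f'logical replication slot "{logical_slots[0]}" exists, '
                "but wal_level < logical")
    return None


print(startup_error("replica", ["logical_slot"]))
print(startup_error("replica", []))  # None: no logical slots left
```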
Now, if:
- The standby is shutdown
- All the logical replication slots are removed on the primary
- wal_level is set to < logical on the primary and it is restarted
Then when the standby starts, the "synced" slots will be invalidated and later 
removed but not re-created on the next sync-cycle (because they don't exist
anymore on the primary).
Worth rewording that part a bit?
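The difference between the two cases can be modelled with a deliberately simplified sync cycle. A synced slot that is invalidated on the standby is dropped, and a slot is only (re)created if the primary still has it. This is a hypothetical Python model, not the worker's actual code. `sync_cycle` and its inputs are illustrative assumptions, and the model performs the drop and the re-creation in a single call rather than across sync-cycles.

```python
from typing import Dict, Set


def sync_cycle(local: Dict[str, bool], remote: Set[str]) -> Dict[str, bool]:
    """One simplified slot-sync cycle on the standby (hypothetical model).

    local maps each synced slot name to True if it is invalidated on the
    standby; remote holds the names of logical slots on the primary.
    Returns the synced slots left afterwards (all of them valid).
    """
    # Keep only synced slots that are still valid locally and still exist
    # on the primary; everything else is dropped.
    result = {name: False for name, invalidated in local.items()
              if not invalidated and name in remote}
    # Create a fresh (valid) synced slot for every primary slot we lack.
    for name in remote:
        result.setdefault(name, False)
    return result


# Commit-message case: slot valid on the primary, invalidated on the standby.
print(sync_cycle({"logical_slot": True}, {"logical_slot"}))  # {'logical_slot': False}
# Scenario above: all slots dropped on the primary, wal_level lowered.
print(sync_cycle({"logical_slot": True}, set()))  # {}
```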
Regards,
-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com