Re: Clear logical slot's 'synced' flag on promotion of standby
From | Ashutosh Sharma |
---|---|
Subject | Re: Clear logical slot's 'synced' flag on promotion of standby |
Date | |
Msg-id | CAE9k0P=WXRHXLGxkegFLj9tVLrY45+uTtdgv+Pjt1mqyit4zZw@mail.gmail.com |
In reply to | Re: Clear logical slot's 'synced' flag on promotion of standby (Ajin Cherian <itsajin@gmail.com>) |
Responses | Re: Clear logical slot's 'synced' flag on promotion of standby |
List | pgsql-hackers |
Hi,

On Tue, Sep 9, 2025 at 12:53 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Tue, Sep 9, 2025 at 4:21 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Hi,
> >
> > This is a spin-off thread from [1].
> >
> > Currently, in the slot-sync worker, we have an error scenario [2]
> > where, during slot synchronization, if we detect a slot with the same
> > name and its synced flag is set to false, we emit an error. The
> > rationale is to avoid potentially overwriting a user-created slot.
> >
> > But while analyzing [1], we observed that this error can lead to
> > inconsistent behavior during switchovers. On the first switchover, the
> > new standby logs an error: "Exiting from slot synchronization because
> > a slot with the same name already exists on the standby." But during
> > a double switchover, this error does not occur.
> >
> > Upon re-evaluating this, it seems more appropriate to clear the synced
> > flag after promotion, as the flag does not hold any meaning on the
> > primary. Doing so would ensure consistent behavior across all
> > switchovers, as the same error will be raised, avoiding the risk of
> > overwriting user's slots.
> >
> > A patch can be posted soon on the same idea.
>
> Hi Shveta,
>
> Here’s a patch that addresses this issue. It clears any “synced” flags
> on logical replication slots when a standby is promoted. I’ve also
> added handling for crashes; if the server crashes before the flags are
> cleared, they are reset on restart.
> The restart logic was a bit tricky, since I had to rely on the
> database state to decide when the reset is needed. Documentation on
> these states is sparse, but from my testing I found that
> DB_IN_CRASH_RECOVERY occurs when a standby crashes during promotion.
> That’s the state I use to trigger the flag reset on restart.
>

+ * required resources. Clear any leftover 'synced' flags on replication
+ * slots when in crash recovery on the primary. The DB_IN_CRASH_RECOVERY
+ * state check ensures that this code is only reached when a standby
+ * server crashes during promotion.
  */
 StartupReplicationSlots();
+ if (ControlFile->state == DB_IN_CRASH_RECOVERY)

I believe the primary server can also enter the DB_IN_CRASH_RECOVERY
state. For example, if the primary is already in crash recovery and
crashes again while in crash recovery, it will restart in the
DB_IN_CRASH_RECOVERY state, no?

--

With this change, are we saying that the synced flag must always be
false on the primary? The PostgreSQL documentation for
pg_replication_slots says:

"The value of this column has no meaning on the primary server; the
column value on the primary is default false for all slots but may (if
leftover from a promoted standby) also be true."

--
With Regards,
Ashutosh Sharma.
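To make the DB_IN_CRASH_RECOVERY concern concrete, here is a toy model in C of the restart-time reset as the patch comment describes it. It is not PostgreSQL code: every Model* name is hypothetical, the enum only loosely mirrors a few values of PostgreSQL's DBState, and the slot struct keeps nothing but a name and the 'synced' flag. The point it illustrates is that a trigger based only on the control-file state cannot tell a standby that crashed mid-promotion apart from a primary that crashed again while already in crash recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a few DBState values;
 * NOT the real PostgreSQL enum. */
typedef enum
{
    MODEL_DB_SHUTDOWNED,
    MODEL_DB_IN_CRASH_RECOVERY,
    MODEL_DB_IN_ARCHIVE_RECOVERY,
    MODEL_DB_IN_PRODUCTION
} ModelDBState;

/* Hypothetical minimal slot record: a name and the 'synced' flag only. */
typedef struct
{
    const char *name;
    bool        synced;
} ModelSlot;

/* The trigger as the patch comment describes it: reset only when the
 * previous run ended while in crash recovery.  Note that the state value
 * alone says nothing about whether this node was a standby or a primary. */
static bool
model_should_reset_synced(ModelDBState state_at_startup)
{
    return state_at_startup == MODEL_DB_IN_CRASH_RECOVERY;
}

/* Clears every 'synced' flag when the trigger fires and returns how many
 * slots changed.  Clearing is idempotent: a second call changes nothing. */
static size_t
model_reset_synced(ModelSlot *slots, size_t nslots,
                   ModelDBState state_at_startup)
{
    size_t cleared = 0;

    assert(slots != NULL || nslots == 0);
    if (!model_should_reset_synced(state_at_startup))
        return 0;

    for (size_t i = 0; i < nslots; i++)
    {
        if (slots[i].synced)
        {
            slots[i].synced = false;
            cleared++;
        }
    }
    return cleared;
}
```

In this model a primary that restarts in MODEL_DB_IN_CRASH_RECOVERY also runs the reset; because the reset only ever clears flags, that is harmless here, but it means the "only reached when a standby server crashes during promotion" wording would not hold, and a leftover true value on the primary (which the documentation currently allows) would be silently cleared.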