Re: Clear logical slot's 'synced' flag on promotion of standby
From | shveta malik |
---|---|
Subject | Re: Clear logical slot's 'synced' flag on promotion of standby |
Date | |
Msg-id | CAJpy0uA111v1-3Lmo-J+QsCSLFMOYnJpOestsoH4CQHgyP4OMA@mail.gmail.com |
In response to | Re: Clear logical slot's 'synced' flag on promotion of standby (Ashutosh Sharma <ashu.coek88@gmail.com>) |
List | pgsql-hackers |
On Thu, Sep 11, 2025 at 7:29 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> On Thu, Sep 11, 2025 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Tue, Sep 9, 2025 at 2:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > >
> > > + * required resources. Clear any leftover 'synced' flags on replication
> > > + * slots when in crash recovery on the primary. The DB_IN_CRASH_RECOVERY
> > > + * state check ensures that this code is only reached when a standby
> > > + * server crashes during promotion.
> > > */
> > > StartupReplicationSlots();
> > > + if (ControlFile->state == DB_IN_CRASH_RECOVERY)
> > >
> > > I believe the primary server can also enter the DB_IN_CRASH_RECOVERY
> > > state. For example, if the primary is already in crash recovery and
> > > crashes again while in crash recovery, it will restart in the
> > > DB_IN_CRASH_RECOVERY state, no?
> > >
> >
> > Yes, good point. I think we can differentiate the two cases based on
> > the timeline change. A regular primary won't have a timeline change,
> > whereas a promoted standby that failed during promotion will show a
> > timeline change immediately upon restart. Thoughts?
> >
>
> Will there be any issues if we clear the sync status immediately after
> the standby.signal file is removed from the standby server?
>
> We could maybe introduce a temporary "promote.inprogress" marker file
> on disk before removing standby.signal. The sequence would be:
>
> 1) Create promote.inprogress.
> 2) Unlink standby.signal
> 3) Clear the sync slot status.
> 4) Remove promote.inprogress.
>
> This way, if the server crashes after standby.signal is removed but
> before the sync status is cleared, the presence of promote.inprogress
> would indicate that the standby was in the middle of promotion and
> crashed before slot cleanup. On restart, we could use that marker to
> detect the incomplete promotion and finish clearing the sync flags.
>
> If the crash happens at a later stage, the server will no longer start
> as a standby anyway, and by then the sync flags would already have
> been reset.
>
> This is just a thought and it may sound a bit naive. Let me know if I
> am overlooking something.
>

The approach seems valid and should work, but introducing a new file
like promote.inprogress for this purpose might be excessive. We can
first try analyzing existing information to determine whether we can
distinguish between the two scenarios -- a primary in crash recovery
immediately after a promotion attempt versus a regular primary. If we
are unable to find any way, we can revisit the idea.

thanks
Shveta
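For readers following the thread, the four-step marker-file sequence quoted above can be sketched as a small standalone C program fragment. This is only an illustration under stated assumptions, not PostgreSQL code: `DemoSlot` and every `demo_*` function are hypothetical names, the slot array stands in for slot state that would really be persisted on disk, the `crash_after` parameter merely simulates a crash, and durability concerns such as fsync are left out. The handling of a marker that is found while standby.signal still exists is also an assumption; the thread does not discuss that case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

#define STANDBY_SIGNAL "standby.signal"
#define PROMOTE_MARKER "promote.inprogress" /* hypothetical marker from the thread */

/* Hypothetical stand-in for a persisted replication slot. */
typedef struct DemoSlot
{
    bool synced;
} DemoSlot;

static void
demo_path(char *buf, size_t len, const char *dir, const char *name)
{
    assert(dir != NULL && name != NULL);
    snprintf(buf, len, "%s/%s", dir, name);
}

static bool
demo_exists(const char *dir, const char *name)
{
    char path[1024];

    demo_path(path, sizeof(path), dir, name);
    return access(path, F_OK) == 0;
}

static bool
demo_create(const char *dir, const char *name)
{
    char  path[1024];
    FILE *f;

    demo_path(path, sizeof(path), dir, name);
    f = fopen(path, "w");
    return f != NULL && fclose(f) == 0;
}

static bool
demo_remove(const char *dir, const char *name)
{
    char path[1024];

    demo_path(path, sizeof(path), dir, name);
    return unlink(path) == 0;
}

static void
demo_clear_synced(DemoSlot *slots, int nslots)
{
    for (int i = 0; i < nslots; i++)
        slots[i].synced = false;
}

/*
 * Run the proposed sequence. crash_after = 0 means no simulated crash;
 * crash_after = k stops right after step k. Returns true only if all
 * four steps completed.
 */
bool
demo_promote(const char *dir, DemoSlot *slots, int nslots, int crash_after)
{
    if (!demo_create(dir, PROMOTE_MARKER))      /* 1) create marker */
        return false;
    if (crash_after == 1)
        return false;
    if (!demo_remove(dir, STANDBY_SIGNAL))      /* 2) unlink standby.signal */
        return false;
    if (crash_after == 2)
        return false;
    demo_clear_synced(slots, nslots);           /* 3) clear sync status */
    if (crash_after == 3)
        return false;
    return demo_remove(dir, PROMOTE_MARKER);    /* 4) remove marker */
}

/*
 * Restart check: a leftover marker means promotion was interrupted.
 * Returns true if the synced flags were cleared here.
 */
bool
demo_recover_on_restart(const char *dir, DemoSlot *slots, int nslots)
{
    bool still_standby;

    if (!demo_exists(dir, PROMOTE_MARKER))
        return false;           /* no interrupted promotion */

    /* Assumption: if standby.signal survived, keep the flags as they are. */
    still_standby = demo_exists(dir, STANDBY_SIGNAL);
    if (!still_standby)
        demo_clear_synced(slots, nslots);
    demo_remove(dir, PROMOTE_MARKER);
    return !still_standby;
}
```

Calling `demo_promote(dir, slots, n, 2)` leaves the marker behind with the flags still set, and a subsequent `demo_recover_on_restart(dir, slots, n)` finishes the cleanup. The sketch also makes the cost visible that the reply points out: one extra on-disk file and a restart-time check that exist only to cover the window between steps 2 and 3.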