Re: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: Synchronizing slots from primary to standby
Дата
Msg-id CAJpy0uDbuJNsHngyHcL04oJg9JmO0=yjSn3jWhPTrQ-Gssc5sQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Peter Smith <smithpb2250@gmail.com>)
Список pgsql-hackers
On Thu, Jan 18, 2024 at 10:31 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> I have one question about the new code in v63-0002.
>
> ======
> src/backend/replication/logical/slotsync.c
>
> 1. ReplSlotSyncWorkerMain
>
> + Assert(SlotSyncWorker->pid == InvalidPid);
> +
> + /*
> + * Startup process signaled the slot sync worker to stop, so if meanwhile
> + * postmaster ended up starting the worker again, exit.
> + */
> + if (SlotSyncWorker->stopSignaled)
> + {
> + SpinLockRelease(&SlotSyncWorker->mutex);
> + proc_exit(0);
> + }
>
> Can we be sure a worker crash can't occur (in ShutDownSlotSync?) in
> such a way that SlotSyncWorker->stopSignaled was already assigned
> true, but SlotSyncWorker->pid was not yet reset to InvalidPid;
>
> e.g. Is the Assert above still OK?

We are good with the Assert here. I tried below cases:

1) When slotsync worker is say killed using 'kill', it is considered
as SIGTERM; slot sync worker invokes 'slotsync_worker_onexit()' before
going down and thus sets SlotSyncWorker->pid = InvalidPid. This means
when it is restarted (considering we have put the breakpoints in such
a way that postmaster had already reached do_start_bgworker() before
promotion finished), it is able to see stopSignaled set but pid is
InvalidPid and thus we are good.

2) Another case is when we kill slot sync worker using 'kill -9' (or
say we make it crash), in such a case, postmaster signals each sibling
process to quit (including startup process) and cleans up the shared
memory used by each (including SlotSyncWorker). In such a case
promotion fails. And if slot sync worker is started again, it will
find pid as InvalidPid. So we are good.

thanks
Shveta



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Network failure may prevent promotion
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: ALTER ROLE documentation improvement