Re: promotion related handling in pg_sync_replication_slots()

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: promotion related handling in pg_sync_replication_slots()
Дата
Msg-id CAJpy0uB=X1FrW-si-d0M5kz3ocR0rnHNSEXmw3ZtVsOkBEub+A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: promotion related handling in pg_sync_replication_slots()  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: promotion related handling in pg_sync_replication_slots()  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Fri, Apr 12, 2024 at 4:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 12, 2024 at 7:57 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > Please find v2. Changes are:
> > >
> > > Thanks for the patch. Here are some comments.
> >
> > Thanks for reviewing.
> > >
> > > 1. Can we have a clear saying in the shmem variable who's syncing at
> > > the moment? Is it a slot sync worker or a backend via SQL function?
> > > Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;"
> > >
> > > typedef enum SlotSyncSource
> > > {
> > >     SLOT_SYNC_NONE,
> > >     SLOT_SYNC_WORKER,
> > >     SLOT_SYNC_BACKEND,
> > > } SlotSyncSource;
> > >
> > > Then, the check in ShutDownSlotSync can be:
> > >
> > > +       /*
> > > +        * Return if neither the slot sync worker is running nor the function
> > > +        * pg_sync_replication_slots() is executing.
> > > +        */
> > > +       if ((SlotSyncCtx->pid == InvalidPid) &&
> > > SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND)
> > >         {
> > >
>
> I don't know if this will be help, especially after fixing the race
> condition I mentioned. But otherwise, also, at this stage it doesn't
> seem helpful to add the source of sync explicitly.
>

Agreed.

Please find v3 addressing race-condition and one other comment.

Up-thread it was suggested that,  probably, checking
SlotSyncCtx->syncing should be sufficient in ShutDownSlotSync(). On
re-thinking, it might not be. Slot sync worker sets and resets
'syncing' with each sync-cycle, and thus we need to rely on worker's
pid in ShutDownSlotSync(), as there could be a window where promotion
is triggered and 'syncing' is not set for worker, while the worker is
still running. This implementation of setting and resetting syncing
with each sync-cycle looks better as compared to setting syncing
during the entire life-cycle of the worker. So, I did not change it.

To fix the race condition, I moved the setting of the 'syncing'  flag
together with the 'stopSignaled' check under the same spinLock for the
SQL function. OTOH, for worker, I feel it is good to check
'stopSignaled' at the beginning itself, while retaining the
setting/resetting of 'syncing' at a later stage during the actual sync
cycle. This makes handling for SQL function and worker slightly
different. And thus to achieve this, I had to take the 'syncing' flag
handling out of synchronize_slots() and move it to both worker and SQL
function by introducing 2 new functions check_and_set_syncing_flag()
and reset_syncing_flag().
I am analyzing if there are better ways to achieve this, any
suggestions are welcome.

thanks
Shveta

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: Another WaitEventSet resource leakage in back branches
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: post-freeze damage control