Re: promotion related handling in pg_sync_replication_slots()

Поиск
Список
Период
Сортировка
От shveta malik
Тема Re: promotion related handling in pg_sync_replication_slots()
Дата
Msg-id CAJpy0uD-bRDc+rdf5znj7gp40nTq=m-1QE4U6z029tYV+C4xPg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: promotion related handling in pg_sync_replication_slots()  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: promotion related handling in pg_sync_replication_slots()  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Sat, Apr 6, 2024 at 12:25 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > Please find v2. Changes are:
>
> Thanks for the patch. Here are some comments.

Thanks for reviewing.
>
> 1. Can we have a clear saying in the shmem variable who's syncing at
> the moment? Is it a slot sync worker or a backend via SQL function?
> Perhaps turn "bool syncing;" to "SlotSyncSource sync_source;"
>
> typedef enum SlotSyncSource
> {
>     SLOT_SYNC_NONE,
>     SLOT_SYNC_WORKER,
>     SLOT_SYNC_BACKEND,
> } SlotSyncSource;
>
> Then, the check in ShutDownSlotSync can be:
>
> +       /*
> +        * Return if neither the slot sync worker is running nor the function
> +        * pg_sync_replication_slots() is executing.
> +        */
> +       if ((SlotSyncCtx->pid == InvalidPid) &&
> SlotSyncCtx->sync_source != SLOT_SYNC_BACKEND)
>         {
>
> 2.
> SyncReplicationSlots(WalReceiverConn *wrconn)
>  {
> +    /*
> +     * Startup process signaled the slot sync to stop, so if meanwhile user
> +     * has invoked slot sync SQL function, simply return.
> +     */
> +    SpinLockAcquire(&SlotSyncCtx->mutex);
> +    if (SlotSyncCtx->stopSignaled)
> +    {
> +        ereport(LOG,
> +                errmsg("skipping slot synchronization as slot sync
> shutdown is signaled during promotion"));
> +
>
> Unless I'm missing something, I think this can't detect if the backend
> via SQL function is already half-way through syncing in
> synchronize_one_slot. So, better move this check to (or also have it
> there) slot sync loop that calls synchronize_one_slot. To avoid
> spinlock acquisitions, we can perhaps do this check in when we acquire
> the spinlock for synced flag.

If the sync via SQL function is already half-way, then promotion
should wait for it to finish. I don't think it is a good idea to move
the check to synchronize_one_slot().  The sync-call should either not
start (if it noticed the promotion) or finish the sync and then let
promotion proceed. But I would like to know others' opinion on this.

>
>     /* Search for the named slot */
>     if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
>     {
>         bool        synced;
>
>         SpinLockAcquire(&slot->mutex);
>         synced = slot->data.synced;
>         << get SlotSyncCtx->stopSignaled here >>
>         SpinLockRelease(&slot->mutex);
>
>         << do the slot sync skip check here if (stopSignaled) >>
>
> 3. Can we have a test or steps at least to check the consequences
> manually one can get if slot syncing via SQL function is happening
> during the promotion?
>
> IIUC, we need to ensure there is no backend acquiring it and
> performing sync while the slot sync worker is shutting down/standby
> promotion is occuring. Otherwise, some of the slots can get resynced
> and some are not while we are shutting down the slot sync worker as
> part of the standby promotion which might leave the slots in an
> inconsistent state.

I do not think that we can reach a state (exception is some error
scenario) where some of the slots are synced while the rest are not
during a *particular* sync-cycle only because promotion is going in
parallel. (And yes we need to fix the race-condition stated by Amit
up-thread for this statement to be true.)

thanks
Shveta



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Issue with the PRNG used by Postgres