Re: promotion related handling in pg_sync_replication_slots()

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: promotion related handling in pg_sync_replication_slots()
Дата
Msg-id CAA4eK1LiZdF9UAL0KTOCKt3rw1sU2cRrV+nwfjipbS2WZQwOCw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: promotion related handling in pg_sync_replication_slots()  (shveta malik <shveta.malik@gmail.com>)
Ответы Re: promotion related handling in pg_sync_replication_slots()
Список pgsql-hackers
On Fri, Apr 5, 2024 at 10:31 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> Please find v2. Changes are:
> 1) Rebased the patch as there was conflict due to recent commit 6f132ed.
> 2) Added an Assert in update_synced_slots_inactive_since() to ensure
> that the slot does not have active_pid.
> 3) Improved commit msg and comments.
>

Few comments:
==============
1.
 void
 SyncReplicationSlots(WalReceiverConn *wrconn)
 {
+ /*
+ * Startup process signaled the slot sync to stop, so if meanwhile user
+ * has invoked slot sync SQL function, simply return.
+ */
+ SpinLockAcquire(&SlotSyncCtx->mutex);
+ if (SlotSyncCtx->stopSignaled)
+ {
+ ereport(LOG,
+ errmsg("skipping slot synchronization as slot sync shutdown is
signaled during promotion"));
+
+ SpinLockRelease(&SlotSyncCtx->mutex);
+ return;
+ }
+ SpinLockRelease(&SlotSyncCtx->mutex);

There is a race condition with this code. Say during promotion
ShutDownSlotSync() is just before setting this flag and the user has
invoked pg_sync_replication_slots() and passed this check but still
didn't set the SlotSyncCtx->syncing flag. So, now, the promotion would
recognize that there is slot sync going on in parallel, and slot sync
wouldn't know that the promotion is in progress.

The current coding for slot sync worker ensures there is no such race
by checking the stopSignaled and setting the pid together under
spinlock. I think we need to move the setting of the syncing flag
similarly. Once we do that probably checking SlotSyncCtx->syncing
should be sufficient in ShutDownSlotSync(). If we change the location
of setting the 'syncing' flag then please ensure its cleanup as we
currently do in slotsync_failure_callback().

2.
@@ -1395,6 +1395,7 @@ update_synced_slots_inactive_since(void)
  if (s->in_use && s->data.synced)
  {
  Assert(SlotIsLogical(s));
+ Assert(s->active_pid == 0);

We can add a comment like: "The slot must not be acquired by any process"

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: remaining sql/json patches
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation