Re: Newly created replication slot may be invalidated by checkpoint
| От | Amit Kapila |
|---|---|
| Тема | Re: Newly created replication slot may be invalidated by checkpoint |
| Дата | |
| Msg-id | CAA4eK1KV_Da=kBSO9TYKmUYWm=RYTOZef00fTtBT3uV+dEr6aA@mail.gmail.com обсуждение исходный текст |
| Ответ на | Re: Newly created replication slot may be invalidated by checkpoint (Masahiko Sawada <sawada.mshk@gmail.com>) |
| Ответы |
RE: Newly created replication slot may be invalidated by checkpoint
|
| Список | pgsql-hackers |
On Thu, Dec 4, 2025 at 2:04 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Dec 2, 2025 at 10:15 PM Zhijie Hou (Fujitsu)
> <houzj.fnst@fujitsu.com> wrote:
> >
> > I think the invalidation cannot occur when copying because:
> >
> > Currently, there are no CHECK_FOR_INTERRUPTS() calls between the initial
> > restart_lsn copy (first phase) and the latest restart_lsn copy (second phase).
> > As a result, even if a checkpoint attempts to invalidate a slot and sends a
> > SIGTERM to the backend, the backend will first update the restart_lsn during the
> > second phase before responding to the signal. Consequently, during the next
> > cycle of InvalidatePossiblyObsoleteSlot(), the checkpoint will observe the
> > updated restart_lsn and skip the invalidation.
> >
> > For logical slots, although invoking the output plugin startup callback presents
> > a slight chance of processing the signal (when using third-party plugins), slot
> > invalidation in this scenario results in immediate slot dropping, because the
> > slot is in RS_EPHEMERAL state, thus preventing invalidation.
>
> Thank you for the analysis. I agree.
>
> > While theoretically, slot invalidation could occur if the code changes in the
> > future, addressing that possibility could be considered an independent
> > improvement task. What do you think ?
>
> Okay. I find that it also might make sense for HEAD to use
> RS_EPHEMERAL state for physical slots too to avoid being invalidated
> during creation, which probably can be discussed later. For back
> branches, the proposed idea of acquiring ReplicationSlotAllocationLock
> in an exclusive mode would be better. I think we might want to have a
> comment in CheckPointReplicationSlots() too that refers to
> ReplicationSlotReserveWal().
>
> Regarding whether we revert the original fix 2090edc6f32 and make it
> the same as we did in HEAD ca307d5cec90a4f, we need to change the size
> of ReplicationSlot struct. I'm concerned that it's really safe to
> change it because the data resides on the shared memory. For example,
> we typically iterate over all replication slots as follow:
>
> for (i = 0; i < max_replication_slots; i++)
> {
> ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
>
> I'm concerned that the arithmetic for calculating the slot address is
> affected by the size of ReplicationSlot change.
>
Yes, this is a valid concern. I think we can go-ahead with fixing the
0001's-fix in HEAD and 18. We can discuss separately the fix for
back-branches prior to 18.
--
With Regards,
Amit Kapila.
В списке pgsql-hackers по дате отправления: