Re: Introduce XID age and inactive timeout based replication slot invalidation

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Introduce XID age and inactive timeout based replication slot invalidation
Дата
Msg-id CAA4eK1LFOnQ0onEhxXqD+ajq6P3GtC2XfyZ2_QTa256xB4sULQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Introduce XID age and inactive timeout based replication slot invalidation  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Список pgsql-hackers
On Wed, Apr 3, 2024 at 12:20 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Wed, Apr 03, 2024 at 11:17:41AM +0530, Bharath Rupireddy wrote:
> > On Wed, Apr 3, 2024 at 8:38 AM shveta malik <shveta.malik@gmail.com> wrote:
> > >
> > > > > Or a simple solution is that the slotsync worker updates
> > > > > inactive_since as it does for non-synced slots, and disables
> > > > > timeout-based slot invalidation for synced slots.
> > >
> > > I like this idea better, it takes care of such a case too when the
> > > user is relying on sync-function rather than worker and does not want
> > > to get the slots invalidated in between 2 sync function calls.
> >
> > Please find the attached v31 patches implementing the above idea:
>
> Thanks!
>
> Some comments related to v31-0001:
>
> === testing the behavior
>
> T1 ===
>
> > - synced slots get their on inactive_since just like any other slot
>
> It behaves as described.
>
> T2 ===
>
> > - synced slots inactive_since is set to current timestamp after the
> > standby gets promoted to help inactive_since interpret correctly just
> > like any other slot.
>
> It behaves as described.
>
> CR1 ===
>
> +        <structfield>inactive_since</structfield> value will get updated
> +        after every synchronization
>
> indicates the last synchronization time? (I think that after every synchronization
> could lead to confusion).
>

+1.

> CR2 ===
>
> +                       /*
> +                        * Set the time since the slot has become inactive after shutting
> +                        * down slot sync machinery. This helps correctly interpret the
> +                        * time if the standby gets promoted without a restart.
> +                        */
>
> It looks to me that this comment is not at the right place because there is
> nothing after the comment that indicates that we shutdown the "slot sync machinery".
>
> Maybe a better place is before the function definition and mention that this is
> currently called when we shutdown the "slot sync machinery"?
>

Won't it be better to have an assert for SlotSyncCtx->pid? IIRC, we
have some existing issues where we don't ensure that no one is running
sync API before shutdown is complete but I think we can deal with that
separately and here we can still have an Assert.

> CR3 ===
>
> +                        * We get the current time beforehand and only once to avoid
> +                        * system calls overhead while holding the lock.
>
> s/avoid system calls overhead while holding the lock/avoid system calls while holding the spinlock/?
>

Is it valid to say that there is overhead of this call while holding
spinlock? Because I don't think at the time of promotion we expect any
other concurrent slot activity. The first reason seems good enough.

One other observation:
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -42,6 +42,7 @@
 #include "access/transam.h"
 #include "access/xlog_internal.h"
 #include "access/xlogrecovery.h"
+#include "access/xlogutils.h"

Is there a reason for this inclusion? I don't see any change which
should need this one.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Maiquel Grassi
Дата:
Сообщение: RE: Psql meta-command conninfo+
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Synchronizing slots from primary to standby