Re: Introduce XID age and inactive timeout based replication slot invalidation

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Introduce XID age and inactive timeout based replication slot invalidation
Дата
Msg-id CAA4eK1JtKieWMivbswYg5FVVB5FugCftLvQKVsxh=m_8nk04vw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Introduce XID age and inactive timeout based replication slot invalidation  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Ответы Re: Introduce XID age and inactive timeout based replication slot invalidation  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Список pgsql-hackers
On Fri, Mar 29, 2024 at 6:17 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Fri, Mar 29, 2024 at 03:03:01PM +0530, Amit Kapila wrote:
> > On Fri, Mar 29, 2024 at 11:49 AM Bertrand Drouvot
> > <bertranddrouvot.pg@gmail.com> wrote:
> > >
> > > On Fri, Mar 29, 2024 at 09:39:31AM +0530, Amit Kapila wrote:
> > > >
> > > > Commit message states: "why we can't just update inactive_since for
> > > > synced slots on the standby with the value received from remote slot
> > > > on the primary. This is consistent with any other slot parameter i.e.
> > > > all of them are synced from the primary."
> > > >
> > > > The inactive_since is not consistent with other slot parameters which
> > > > we copy. We don't perform anything related to those other parameters
> > > > like say two_phase phase which can change that property. However, we
> > > > do acquire the slot, advance the slot (as per recent discussion [1]),
> > > > and release it. Since these operations can impact inactive_since, it
> > > > seems to me that inactive_since is not the same as other parameters.
> > > > It can have a different value than the primary. Why would anyone want
> > > > to know the value of inactive_since from primary after the standby is
> > > > promoted?
> > >
> > > I think it can be useful "before" it is promoted and in case the primary is down.
> > >
> >
> > It is not clear to me what is user going to do by checking the
> > inactivity time for slots when the corresponding server is down.
>
> Say a failover needs to be done, then it could be useful to know for which
> slots the activity needs to be resumed (thinking about external logical decoding
> plugin, not about pub/sub here). If one see an inactive slot (since long "enough")
> then he can start to reasonate about what to do with it.
>
> > I thought the idea was to check such slots and see if they need to be
> > dropped or enabled again to avoid excessive disk usage, etc.
>
> Yeah that's the case but it does not mean inactive_since can't be useful in other
> ways.
>
> Also, say the slot has been invalidated on the primary (due to inactivity timeout),
> primary is down and there is a failover. By keeping the inactive_since from
> the primary, one could know when the inactivity that lead to the timeout started.
>

So, this means at promotion, we won't set the current_time for
inactive_since which is not what the currently proposed patch is
doing. Moreover, doing the invalidation on promoted standby based on
inactive_since of the primary node sounds debatable because the
inactive_timeout could be different on the new node (promoted
standby).

> Again, more concerned about external logical decoding plugin than pub/sub here.
>
> > > I agree that tracking the activity time of a synced slot can be useful, why
> > > not creating a dedicated field for that purpose (and keep inactive_since a
> > > perfect "copy" of the primary)?
> > >
> >
> > We can have a separate field for this but not sure if it is worth it.
>
> OTOH I'm not sure that erasing this information from the primary is useful. I
> think that 2 fields would be the best option and would be less subject of
> misinterpretation.
>
> > > > Now, the other concern is that calling GetCurrentTimestamp()
> > > > could be costly when the values for the slot are not going to be
> > > > updated but if that happens we can optimize such that before acquiring
> > > > the slot we can have some minimal pre-checks to ensure whether we need
> > > > to update the slot or not.
> > >
> > > Right, but for a very active slot it is likely that we call GetCurrentTimestamp()
> > > during almost each sync cycle.
> > >
> >
> > True, but if we have to save a slot to disk each time to persist the
> > changes (for an active slot) then probably GetCurrentTimestamp()
> > shouldn't be costly enough to matter.
>
> Right, persisting the changes to disk would be even more costly.
>

The point I was making is that currently after copying the
remote_node's values, we always persist the slots to disk, so the cost
of current_time shouldn't be much. Now, if the values won't change
then probably there is some cost but in most cases (active slots), the
values will always change. Also, if all the slots are inactive then we
will slow down the speed of sync. We also need to consider if we want
to copy the value of inactive_since from the primary and if that is
the only value changed then shall we persist the slot or not?

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: Improve eviction algorithm in ReorderBuffer