Re: Introduce XID age and inactive timeout based replication slot invalidation

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Introduce XID age and inactive timeout based replication slot invalidation
Дата
Msg-id CAA4eK1KhMr5gEvzAutxkS--n6y16-mrMxF1fiq-d-RSo_URE5w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Introduce XID age and inactive timeout based replication slot invalidation  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Ответы Re: Introduce XID age and inactive timeout based replication slot invalidation  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
Список pgsql-hackers
On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> On Thu, Mar 14, 2024 at 12:27:26PM +0530, Amit Kapila wrote:
> > On Wed, Mar 13, 2024 at 10:16 PM Bharath Rupireddy
> > <bharath.rupireddyforpostgres@gmail.com> wrote:
> > >
> > > On Wed, Mar 13, 2024 at 11:13 AM shveta malik <shveta.malik@gmail.com> wrote:
> > > >
> > > > > Thanks. v8-0001 is how it looks. Please see the v8 patch set with this change.
> > > >
> > > > JFYI, the patch does not apply to the head. There is a conflict in
> > > > multiple files.
> > >
> > > Thanks for looking into this. I noticed that the v8 patches needed
> > > rebase. Before I go do anything with the patches, I'm trying to gain
> > > consensus on the design. Following is the summary of design choices
> > > we've discussed so far:
> > > 1) conflict_reason vs invalidation_reason.
> > > 2) When to compute the XID age?
> > >
> >
> > I feel we should focus on two things (a) one is to introduce a new
> > column invalidation_reason, and (b) let's try to first complete
> > invalidation due to timeout. We can look into XID stuff if time
> > permits, remember, we don't have ample time left.
>
> Agree. While it makes sense to invalidate slots for wal removal in
> CreateCheckPoint() (because this is the place where wal is removed), I 'm not
> sure this is the right place for the 2 new cases.
>
> Let's focus on the timeout one as proposed above (as probably the simplest one):
> as this one is purely related to time and activity what about to invalidate them
> when?:
>
> - their usage resume
> - in pg_get_replication_slots()
>
> The idea is to invalidate the slot when one resumes activity on it or wants to
> get information about it (and among other things wants to know if the slot is
> valid or not).
>

Trying to invalidate at those two places makes sense to me but we
still need to cover the cases where it takes very long to resume the
slot activity and the dangling slot cases where the activity is never
resumed. How about apart from the above two places, trying to
invalidate in CheckPointReplicationSlots() where we are traversing all
the slots? This could prevent invalid slots from being marked as
dirty.

BTW, how will the user use 'inactive_count' to know whether a
replication slot is becoming inactive too frequently? The patch just
keeps incrementing this counter, one will never know in the last 'n'
minutes, how many times the slot became inactive unless there is some
monitoring tool that keeps capturing this counter from time to time
and calculates the frequency in some way. Even, if this is useful, it
is not clear to me whether we need to store 'inactive_count' in the
slot's persistent data. I understand it could be a metric required by
the user but wouldn't it be better to track this via
pg_stat_replication_slots such that we don't need to store this in
slot's persist data? If this understanding is correct, I would say
let's remove 'inactive_count' as well from the main patch and discuss
it separately.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrei Lepikhov
Дата:
Сообщение: Re: POC, WIP: OR-clause support for indexes
Следующее
От: Richard Guo
Дата:
Сообщение: Re: Support run-time partition pruning for hash join