Re: Introduce XID age and inactive timeout based replication slot invalidation
От | Bertrand Drouvot |
---|---|
Тема | Re: Introduce XID age and inactive timeout based replication slot invalidation |
Дата | |
Msg-id | ZfldthofFLdPibEM@ip-10-97-1-34.eu-west-3.compute.internal обсуждение исходный текст |
Ответ на | Re: Introduce XID age and inactive timeout based replication slot invalidation (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: Introduce XID age and inactive timeout based replication slot invalidation
|
Список | pgsql-hackers |
Hi, On Tue, Mar 19, 2024 at 10:56:25AM +0530, Amit Kapila wrote: > On Mon, Mar 18, 2024 at 8:19 PM Bertrand Drouvot > <bertranddrouvot.pg@gmail.com> wrote: > > Agree. While it makes sense to invalidate slots for wal removal in > > CreateCheckPoint() (because this is the place where wal is removed), I 'm not > > sure this is the right place for the 2 new cases. > > > > Let's focus on the timeout one as proposed above (as probably the simplest one): > > as this one is purely related to time and activity what about to invalidate them > > when?: > > > > - their usage resume > > - in pg_get_replication_slots() > > > > The idea is to invalidate the slot when one resumes activity on it or wants to > > get information about it (and among other things wants to know if the slot is > > valid or not). > > > > Trying to invalidate at those two places makes sense to me but we > still need to cover the cases where it takes very long to resume the > slot activity and the dangling slot cases where the activity is never > resumed. I understand it's better to have the slot reflecting its real status internally but it is a real issue if that's not the case until the activity on it is resumed? (just asking, not saying we should not) > How about apart from the above two places, trying to > invalidate in CheckPointReplicationSlots() where we are traversing all > the slots? I think that's a good place but there is still a window of time (that could also be "large" depending of the activity and the checkpoint frequency) during which the slot is not known as invalid internally. But yeah, at leat we know that we'll mark it as invalid at some point... BTW: if (am_walsender) { + if (slot->data.persistency == RS_PERSISTENT) + { + SpinLockAcquire(&slot->mutex); + slot->data.last_inactive_at = GetCurrentTimestamp(); + slot->data.inactive_count++; + SpinLockRelease(&slot->mutex); I'm also feeling the same concern as Shveta mentioned in [1]: that a "normal" backend using pg_logical_slot_get_changes() or friends would not set the last_inactive_at. [1]: https://www.postgresql.org/message-id/CAJpy0uD64X%3D2ENmbHaRiWTKeQawr-rbGoy_GdhQQLVXzUSKTMg%40mail.gmail.com Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
В списке pgsql-hackers по дате отправления: