Re: Introduce XID age and inactive timeout based replication slot invalidation

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: Introduce XID age and inactive timeout based replication slot invalidation
Дата
Msg-id CAA4eK1LLj+eaMN-K8oeOjfG+UuzTY=L5PXbcMJURZbFm+_aJSA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Introduce XID age and inactive timeout based replication slot invalidation  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: Introduce XID age and inactive timeout based replication slot invalidation
Список pgsql-hackers
On Sun, Mar 24, 2024 at 3:05 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> I've attached the v18 patch set here. I've also addressed earlier
> review comments from Amit, Ajin Cherian. Note that I've added new
> invalidation mechanism tests in a separate TAP test file just because
> I don't want to clutter or bloat any of the existing files and spread
> tests for physical slots and logical slots into separate existing TAP
> files.
>

Review comments on v18_0002 and v18_0005
=======================================
1.
 ReplicationSlotCreate(const char *name, bool db_specific,
    ReplicationSlotPersistency persistency,
-   bool two_phase, bool failover, bool synced)
+   bool two_phase, bool failover, bool synced,
+   int inactive_timeout)
 {
  ReplicationSlot *slot = NULL;
  int i;
@@ -345,6 +348,18 @@ ReplicationSlotCreate(const char *name, bool db_specific,
  errmsg("cannot enable failover for a temporary replication slot"));
  }

+ if (inactive_timeout > 0)
+ {
+ /*
+ * Do not allow users to set inactive_timeout for temporary slots,
+ * because temporary slots will not be saved to the disk.
+ */
+ if (persistency == RS_TEMPORARY)
+ ereport(ERROR,
+ errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot set inactive_timeout for a temporary replication slot"));
+ }

We have decided to update inactive_since for temporary slots. So,
unless there is some reason, we should allow inactive_timeout to also
be set for temporary slots.

2.
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1024,6 +1024,7 @@ CREATE VIEW pg_replication_slots AS
             L.safe_wal_size,
             L.two_phase,
             L.last_inactive_time,
+            L.inactive_timeout,

Shall we keep inactive_timeout before
last_inactive_time/inactive_since? I don't have any strong reason to
propose that way apart from that the former is provided by the user.

3.
@@ -287,6 +288,13 @@ pg_get_replication_slots(PG_FUNCTION_ARGS)
  slot_contents = *slot;
  SpinLockRelease(&slot->mutex);

+ /*
+ * Here's an opportunity to invalidate inactive replication slots
+ * based on timeout, so let's do it.
+ */
+ if (InvalidateReplicationSlotForInactiveTimeout(slot, false, true, true))
+ invalidated = true;

I don't think we should try to invalidate the slots in
pg_get_replication_slots. This function's purpose is to get the
current information on slots and has no intention to perform any work
for slots. Any error due to invalidation won't be what the user would
be expecting here.

4.
+static bool
+InvalidateSlotForInactiveTimeout(ReplicationSlot *slot,
+ bool need_control_lock,
+ bool need_mutex)
{
...
...
+ if (need_control_lock)
+ LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);
+
+ Assert(LWLockHeldByMeInMode(ReplicationSlotControlLock, LW_SHARED));
+
+ /*
+ * Check if the slot needs to be invalidated due to inactive_timeout. We
+ * do this with the spinlock held to avoid race conditions -- for example
+ * the restart_lsn could move forward, or the slot could be dropped.
+ */
+ if (need_mutex)
+ SpinLockAcquire(&slot->mutex);
...

I find this combination of parameters a bit strange. Because, say if
need_mutex is false and need_control_lock is true then that means this
function will acquire LWlock after acquiring spinlock which is
unacceptable. Now, this may not happen in practice as the callers
won't pass such a combination but still, this functionality should be
improved.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bertrand Drouvot
Дата:
Сообщение: Re: Introduce XID age and inactive timeout based replication slot invalidation
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Add new error_action COPY ON_ERROR "log"