Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
Дата
Msg-id 20211215.120157.422744426665374116.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Ответы Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список pgsql-hackers
At Tue, 14 Dec 2021 19:04:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > > I think the PIDs are log-worthy for sure, but it's not clear to me that
> > > it is desirable to write them to the persistent state file.  In case of
> > > crashes, the log should serve just fine to aid root cause investigation
> > > -- in fact even better than the persistent file, where the data would be
> > > lost as soon as the next client acquires that slot.
> >
> > Thanks. +1 to log a message at LOG level whenever a replication slot
> > becomes active (gets assigned a valid pid to active_pid) and becomes
> > inactive(gets assigned 0 to active_pid). Having said that, isn't it
> > also helpful if we write a bool (1 byte character) whenever the slot
> > becomes active and inactive to the disk?
> 
> Here's the patch that adds a LOG message whenever a replication slot
> becomes active and inactive. These logs will be extremely useful on
> production servers to debug and analyze inactive replication slot
> issues.
> 
> Thoughts?

If I create a replication slot, I saw the following lines in server log.

[22054:client backend] LOG:  replication slot "s1" becomes active
[22054:client backend] DETAIL:  The process with PID 22054 acquired it.
[22054:client backend] STATEMENT:  select pg_drop_replication_slot('s1');
[22054:client backend] LOG:  replication slot "s1" becomes inactive
[22054:client backend] DETAIL:  The process with PID 22054 released it.
[22054:client backend] STATEMENT:  select pg_drop_replication_slot('s1');

They are apparently too much as if they were DEBUG3 lines.  The
process PID shown is of the process the slot operations took place so
I think it conveys no information.  The STATEMENT lines are also noisy
for non-ERROR emssages.  Couldn't we hide that line?

That is, how about making the log lines as simple as the follows?

[17156:walsender]  LOG:  acquired replication slot "s1"
[17156:walsender]  LOG:  released replication slot "s1"

I think in the first place we don't need this log lines at slot
creation since it is actually not acquirement nor releasing of a slot.


It behaves in a strange way when executing pg_basebackup.

[22864:walsender] LOG:  replication slot "pg_basebackup_22864" becomes active
[22864:walsender] DETAIL:  The process with PID 22864 acquired it.
[22864:walsender] STATEMENT:  CREATE_REPLICATION_SLOT "pg_basebackup_22864" TEMPORARY PHYSICAL ( RESERVE_WAL)
[22864:walsender] LOG:  replication slot "pg_basebackup_22864" becomes active
[22864:walsender] DETAIL:  The process with PID 22864 acquired it.
[22864:walsender] STATEMENT:  START_REPLICATION SLOT "pg_basebackup_22864" 0/6000000 TIMELINE 1
[22864:walsender] LOG:  replication slot "pg_basebackup_22864" becomes inactive
[22864:walsender] DETAIL:  The process with PID 22864 released it.

The slot is acquired twice then released once.  It is becuase the
patch doesn't emit "becomes inactive" line when releasing a temporary
slot. However, I'd rather think we don't need the first 'become
active' line like the previous example.


@@ -658,6 +690,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot)
     slot->active_pid = 0;
     slot->in_use = false;
     LWLockRelease(ReplicationSlotControlLock);
+
+    if (pid > 0)
+        ereport(LOG,
+                (errmsg("replication slot \"%s\" becomes inactive",
+                        NameStr(slot->data.name)),
+                 errdetail("The process with PID %d released it.", pid)));
+

This is wrong.  I see a "become inactive" message if I droped an
"inactive" replication slot.  The reason the inactive slot looks as if
it were acquired is it is temporarily aquired as a preparing step of
dropping.


Even assuming that the log lines are simplified to this extent, I
still see it a bit strange that the "becomes acitve (or acruied)"
message shows alone without having a message like "replication
connection accepted".  But that would be another issue even if it is
true.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: parallel vacuum comments
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: Confused comment about drop replica identity index