Re: more descriptive message for process termination due to max_slot_wal_keep_size

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: more descriptive message for process termination due to max_slot_wal_keep_size
Дата
Msg-id 20211224.134214.1646603596386019768.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: more descriptive message for process termination due to max_slot_wal_keep_size  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Ответы Re: more descriptive message for process termination due to max_slot_wal_keep_size  (Masahiko Sawada <sawada.mshk@gmail.com>)
Список pgsql-hackers
At Thu, 23 Dec 2021 18:08:08 +0530, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote in 
> On Wed, Dec 15, 2021 at 9:42 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > > LOG:  invalidating slot "s1"
> > > DETAIL:  The slot's restart_lsn 0/10000D68 is behind the limit 0/11000000 defined by max_slot_wal_keep_size.
> >
> > The second line could be changed like the following or anything other.
> >
> > > DETAIL:  The slot's restart_lsn 0/10000D68 got behind the limit 0/11000000 determined by max_slot_wal_keep_size.
> > .....
> >
> 
> The second version looks better as it gives more details. I am fine
> with either of the above wordings.
> 
> I would prefer everything in the same message though since
> "invalidating slot ..." is too short a LOG message. Not everybody
> enabled details always.

Mmm. Right. I have gone too much to the same way with the
process-termination message.

I rearranged the meesages as follows in the attached version. (at master)

> LOG:  terminating process %d to release replication slot \"%s\" because its restart_lsn %X/%X exceeds
max_slot_wal_keep_size
> DETAIL:  The slot got behind the limit %X/%X determined by max_slot_wal_keep_size.

> LOG:  invalidating slot \"%s\" because its restart_LSN %X/%X exceeds max_slot_wal_keep_size
c> DETAIL:  The slot got behind the limit %X/%X determined by max_slot_wal_keep_size.

The messages is actually incomplete even in 13 so I think the change
to the errmsg() message of the first message is worth back-patching.

- v3-0001-Make-a-message-on-process-termination-more-dscrip.patch

  Changes only the first main message and it can be back-patched to 14. 

- v3-0001-Make-a-message-on-process-termination-more-dscrip_13.patch

  The same to the above but for 13, which doesn't have LSN_FORMAT_ARGS.

- v3-0002-Add-detailed-information-to-slot-invalidation-mes.patch

  Attaches the DETAIL line shown above to both messages, only for the
  master.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 4bfbb58801e9d2a0e0ab8320dd95bc850a0a397b Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 24 Dec 2021 12:52:07 +0900
Subject: [PATCH v3 1/2] Make a message on process termination more dscriptive

The message at process termination due to slot limit doesn't provide
the reason. In the major scenario the message is followed by another
message about slot invalidatation, which shows the detail for the
termination.  However the second message is missing if the slot is
temporary one.

Augment the first message with the reason same as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie <alex@altmetric.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..0ceac3a54d 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1253,8 +1253,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             if (last_signaled_pid != active_pid)
             {
                 ereport(LOG,
-                        (errmsg("terminating process %d to release replication slot \"%s\"",
-                                active_pid, NameStr(slotname))));
+                        (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn
%X/%Xexceeds max_slot_wal_keep_size",
 
+                                active_pid, NameStr(slotname),
+                                LSN_FORMAT_ARGS(restart_lsn))));
 
                 (void) kill(active_pid, SIGTERM);
                 last_signaled_pid = active_pid;
-- 
2.27.0

From 444b3bb64ee907280606696e0cb90f6b022c9cd6 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 24 Dec 2021 13:23:54 +0900
Subject: [PATCH v3] Make a message on process termination more dscriptive

The message at process termination due to slot limit doesn't provide
the reason. In the major scenario the message is followed by another
message about slot invalidatation, which shows the detail for the
termination.  However the second message is missing if the slot is
temporary one.

Augment the first message with the reason same as the second message.

Backpatch through to 13 where the message was introduced.

Reported-by: Alex Enachioaie <alex@altmetric.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
Backpatch-through: 13
---
 src/backend/replication/slot.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 02047ea920..15b8934ae2 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1228,8 +1228,10 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             if (last_signaled_pid != active_pid)
             {
                 ereport(LOG,
-                        (errmsg("terminating process %d to release replication slot \"%s\"",
-                                active_pid, NameStr(slotname))));
+                        (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn
%X/%Xexceeds max_slot_wal_keep_size",
 
+                                active_pid, NameStr(slotname),
+                                (uint32) (restart_lsn >> 32),
+                                (uint32) restart_lsn)));
 
                 (void) kill(active_pid, SIGTERM);
                 last_signaled_pid = active_pid;
-- 
2.27.0

From 4d37e4e6f2af557779d970f5fe4dba98c937ed49 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 24 Dec 2021 12:58:25 +0900
Subject: [PATCH v3 2/2] Add detailed information to slot-invalidation messages

The messages says just "exceeds max_slot_wal_keep_size" but doesn't
tell the concrete LSN of the limit. That information helps operators'
understanding on the issue.

Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 0ceac3a54d..2b394fe0d7 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1255,7 +1255,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
                 ereport(LOG,
                         (errmsg("terminating process %d to release replication slot \"%s\" because its restart_lsn
%X/%Xexceeds max_slot_wal_keep_size",
 
                                 active_pid, NameStr(slotname),
-                                LSN_FORMAT_ARGS(restart_lsn))));
+                                LSN_FORMAT_ARGS(restart_lsn)),
+                         errdetail("The slot got behind the limit %X/%X determined by max_slot_wal_keep_size.",
+                                   LSN_FORMAT_ARGS(oldestLSN))));
 
                 (void) kill(active_pid, SIGTERM);
                 last_signaled_pid = active_pid;
@@ -1294,7 +1296,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             ereport(LOG,
                     (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
                             NameStr(slotname),
-                            LSN_FORMAT_ARGS(restart_lsn))));
+                            LSN_FORMAT_ARGS(restart_lsn)),
+                     errdetail("The slot got behind the limit %X/%X determined by max_slot_wal_keep_size.",
+                               LSN_FORMAT_ARGS(oldestLSN))));
 
             /* done with this slot for now */
             break;
-- 
2.27.0


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Shinya Kato
Дата:
Сообщение: Re: Emit a warning if the extension's GUC is set incorrectly
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: Allow escape in application_name