Re: more descriptive message for process termination due to max_slot_wal_keep_size

Поиск
Список
Период
Сортировка
От Kyotaro Horiguchi
Тема Re: more descriptive message for process termination due to max_slot_wal_keep_size
Дата
Msg-id 20211215.131218.807653512113171629.horikyota.ntt@gmail.com
обсуждение исходный текст
Ответ на Re: more descriptive message for process termination due to max_slot_wal_keep_size  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Ответы Re: more descriptive message for process termination due to max_slot_wal_keep_size  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Список pgsql-hackers
At Tue, 14 Dec 2021 19:31:21 +0530, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote in 
> On Tue, Dec 14, 2021 at 9:35 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > > [17605] LOG:  terminating process 17614 to release replication slot "s1"
> > + [17605] DETAIL:  The slot's restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size.
> > > [17614] FATAL:  terminating connection due to administrator command
> > > [17605] LOG:  invalidating slot "s1" because its restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size
> >
> > Somewhat the second and fourth lines look inconsistent each other but
> > that wouldn't be such a problem.  I don't think we want to concatenate
> > the two lines together as the result is a bit too long.
> >
> > > LOG:  terminating process 17614 to release replication slot "s1" because it's restart_lsn 0/2C0000A0 exceeds
max_slot_wal_keep_size.
> >
> > What do you think about this?
> 
> Agree. I think we should also specify the restart_lsn value which
> would be within max_slot_wal_keep_size for better understanding.

Thanks!  It seems to me the main message of the "invalidating" log has
no room for further detail.  So I split the reason out to DETAILS line
the same way with the "terminating" message in the attached second
patch. (It is separated from the first patch just for review) I
believe someone can make the DETAIL message simpler or more natural.

The attached patch set emits the following message.

> LOG:  invalidating slot "s1"
> DETAIL:  The slot's restart_lsn 0/10000D68 is behind the limit 0/11000000 defined by max_slot_wal_keep_size.

The second line could be changed like the following or anything other.

> DETAIL:  The slot's restart_lsn 0/10000D68 got behind the limit 0/11000000 determined by max_slot_wal_keep_size.
.....

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 6909d854a0d48f2504cbb6dfbc7eb571fd32bcd8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 14 Dec 2021 10:50:00 +0900
Subject: [PATCH v2 1/2] Make an error message about process termination more
 descriptive

If checkpointer kills a process due to a temporary replication slot
exceeding max_slot_wal_keep_size, the messages fails to describe the
cause of the termination.  It is finally shown for persistent slots in
a subsequent message, but that message doesn't show for temporary
slots.  It's better to attach an explainaion to the termination
message directly.

Reported-by: Alex Enachioaie <alex@altmetric.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..cba9a29113 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1254,7 +1254,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             {
                 ereport(LOG,
                         (errmsg("terminating process %d to release replication slot \"%s\"",
-                                active_pid, NameStr(slotname))));
+                                active_pid, NameStr(slotname)),
+                         errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.",
LSN_FORMAT_ARGS(restart_lsn))));
 
                 (void) kill(active_pid, SIGTERM);
                 last_signaled_pid = active_pid;
-- 
2.27.0

From 265a84fa6a20ee7c5178ace41096df1a4b00465c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Wed, 15 Dec 2021 13:08:20 +0900
Subject: [PATCH v2 2/2] Make the message further detailed

---
 src/backend/replication/slot.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index cba9a29113..695151e484 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1255,7 +1255,9 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
                 ereport(LOG,
                         (errmsg("terminating process %d to release replication slot \"%s\"",
                                 active_pid, NameStr(slotname)),
-                         errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.",
LSN_FORMAT_ARGS(restart_lsn))));
+                         errdetail("The slot's restart_lsn %X/%X is behind the limit %X/%X defined by
max_slot_wal_keep_size.",
+                                   LSN_FORMAT_ARGS(restart_lsn),
+                                   LSN_FORMAT_ARGS(oldestLSN))));
 
                 (void) kill(active_pid, SIGTERM);
                 last_signaled_pid = active_pid;
@@ -1292,9 +1294,11 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             ReplicationSlotRelease();
 
             ereport(LOG,
-                    (errmsg("invalidating slot \"%s\" because its restart_lsn %X/%X exceeds max_slot_wal_keep_size",
-                            NameStr(slotname),
-                            LSN_FORMAT_ARGS(restart_lsn))));
+                    (errmsg("invalidating slot \"%s\"",
+                            NameStr(slotname)),
+                     errdetail("The slot's restart_lsn %X/%X is behind the limit %X/%X defined by
max_slot_wal_keep_size.",
+                            LSN_FORMAT_ARGS(restart_lsn),
+                            LSN_FORMAT_ARGS(oldestLSN))));
 
             /* done with this slot for now */
             break;
-- 
2.27.0


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Skipping logical replication transactions on subscriber side