more descriptive message for process termination due to max_slot_wal_keep_size

From: Kyotaro Horiguchi
Subject: more descriptive message for process termination due to max_slot_wal_keep_size
Msg-id: 20211214.130456.2233153190058148084.horikyota.ntt@gmail.com
Responses: Re: more descriptive message for process termination due to max_slot_wal_keep_size  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Re: more descriptive message for process termination due to max_slot_wal_keep_size  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List: pgsql-hackers
Hello.

As complained in pgsql-bugs [1], when a process is terminated due to
max_slot_wal_keep_size, the related messages don't mention the root
cause for *the termination*.  Note that the third message does not
show for temporary replication slots.

[pid=a] LOG:  terminating process x to release replication slot "s"
[pid=x] FATAL:  terminating connection due to administrator command
[pid=a] LOG:  invalidating slot "s" because its restart_lsn X/X exceeds max_slot_wal_keep_size

The attached patch adds a DETAIL line to the first message.

> [17605] LOG:  terminating process 17614 to release replication slot "s1"
+ [17605] DETAIL:  The slot's restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size.
> [17614] FATAL:  terminating connection due to administrator command
> [17605] LOG:  invalidating slot "s1" because its restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size

The second and fourth lines look somewhat inconsistent with each other,
but that shouldn't be much of a problem.  I don't think we want to
concatenate the two lines, as the result would be a bit too long:

> LOG:  terminating process 17614 to release replication slot "s1" because its restart_lsn 0/2C0000A0 exceeds max_slot_wal_keep_size.

What do you think about this?

[1] https://www.postgresql.org/message-id/20211214.101137.379073733372253470.horikyota.ntt%40gmail.com

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From b0c27dc80aff37ef984592b79f1dd20d052299fa Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Tue, 14 Dec 2021 10:50:00 +0900
Subject: [PATCH] Make an error message about process termination more
 descriptive

If checkpointer kills a process due to a temporary replication slot
exceeding max_slot_wal_keep_size, the messages fail to describe the
cause of the termination.  This is because the message that describes
the reason, which is emitted for persistent slots, does not show for
temporary slots.  Add a DETAIL line to the message common to all types
of slot to describe the cause.

Reported-by: Alex Enachioaie <alex@altmetric.com>
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..cba9a29113 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1254,7 +1254,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
             {
                 ereport(LOG,
                         (errmsg("terminating process %d to release replication slot \"%s\"",
-                                active_pid, NameStr(slotname))));
+                                active_pid, NameStr(slotname)),
+                         errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn))));
 
                 (void) kill(active_pid, SIGTERM);
                 last_signaled_pid = active_pid;
-- 
2.27.0

