Re: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached

From Alex Enachioaie
Subject Re: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached
Msg-id FGS14R.K3L8GOS0U6ZQ1@altmetric.com
In response to Re: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: BUG #17327: Postgres server does not correctly emit error for max_slot_wal_keep_size being breached  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-bugs
Hello Kyotaro,

Understood, that makes sense re: invalidation and what I assumed might be happening.

I'm happy to leave the method of resolution up to you. The main point for me is that when a
replication process is terminated because its underlying temporary replication slot has reached max_slot_wal_keep_size,
we log a specific message that tells the user the cause of the termination rather than leaving it ambiguous.

Thank you

Kind regards

Alex E
Senior Site Reliability Engineer
Altmetric

On Mon, Dec 13 2021 at 14:44:42 +0900, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Fri, 10 Dec 2021 15:46:11 +0000, Alex Enachioaie <alex@altmetric.com> wrote in
> > So, essentially the server-side log emitted when a temporary replication slot breaches the max_slot_wal_keep_size limit is only:
> >
> > 2021-12-03 16:21:54 UTC [29724-2647] LOG: terminating process 42601 to release replication slot "pg_basebackup_42601"
> >
> > whereas for a persistent replication slot we get an additional line that clearly states _why_ the replication process was terminated:
> >
> > 2021-12-03 00:57:16 UTC [29724-2645] LOG: terminating process 3899 to release replication slot "backup"
> > 2021-12-03 00:57:16 UTC [29724-2646] LOG: invalidating slot "backup" because its restart_lsn 47198/1E000000 exceeds max_slot_wal_keep_size
> >
> > I'm not sure if this means that a temporary slot does not get invalidated at all (I've not looked at the code), or simply that we don't emit a log message when it does because the slot would be discarded anyway, but such a message would be very useful for diagnostic purposes imo.
>
> The "invalidating slot" message is emitted when the slot needs to be invalidated, that is, when the slot persists after the user process is terminated. Thus the message cannot be seen for temporary slots, since they are removed at process termination and no longer exist after that.
>
> At Wed, 08 Dec 2021 11:23:35 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> > The core issue here then in our opinion is that Postgres server should log an error when the max_slot_wal_keep_size limit is reached for temporary replication slots as well as for permanent ones, as otherwise users/administrators are presented only with nondescript connection termination errors which do not point to the actual cause of the problem.
>
> If you mean the "invalidating slot" message by "an error", that wouldn't happen, since invalidation doesn't actually happen. Or, we could change the message like this. Does this make sense to you?
>
> LOG: terminating process 42601 to release temporary replication slot "pg_basebackup_42601"
> DETAIL: The slot will be dropped by the process termination.
>
> LOG: terminating process 3899 to release persistent replication slot "backup"
> ...
> LOG: invalidating slot "backup" because its restart_lsn 47198/1E000000 exceeds max_slot_wal_keep_size
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
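
Until the server messages change, the distinction described above can be approximated by scanning the server log. The following is a minimal Python sketch, not part of PostgreSQL: it assumes the log lines have the shape quoted in this thread, and that a "terminating process" line with no matching "invalidating slot" line points at a temporary slot that was dropped rather than invalidated. The function name, the verdict strings, and SAMPLE_LOG are illustrative only.

```python
import re

# Patterns modelled on the log lines quoted in this thread.
# \s+ tolerates one or more spaces after "LOG:".
TERMINATE_RE = re.compile(
    r'LOG:\s+terminating process (\d+) to release replication slot "([^"]+)"'
)
INVALIDATE_RE = re.compile(
    r'LOG:\s+invalidating slot "([^"]+)" because its restart_lsn \S+ '
    r'exceeds max_slot_wal_keep_size'
)


def classify_terminations(log_lines):
    """Map each terminated slot name to a verdict string (hypothetical helper).

    "invalidated": an "invalidating slot" line was also logged for the slot.
    "no_invalidation_logged": only the termination line was seen, which,
    per the explanation in this thread, is what a temporary slot produces.
    """
    lines = list(log_lines)

    # First pass: collect every slot that has an explicit invalidation line.
    invalidated = set()
    for line in lines:
        match = INVALIDATE_RE.search(line)
        if match:
            invalidated.add(match.group(1))

    # Second pass: classify every slot whose holder process was terminated.
    verdicts = {}
    for line in lines:
        match = TERMINATE_RE.search(line)
        if match:
            slot = match.group(2)
            if slot in invalidated:
                verdicts[slot] = "invalidated"
            else:
                verdicts[slot] = "no_invalidation_logged"
    return verdicts


# The three log lines quoted earlier in this thread.
SAMPLE_LOG = [
    '2021-12-03 16:21:54 UTC [29724-2647] LOG: terminating process 42601 '
    'to release replication slot "pg_basebackup_42601"',
    '2021-12-03 00:57:16 UTC [29724-2645] LOG: terminating process 3899 '
    'to release replication slot "backup"',
    '2021-12-03 00:57:16 UTC [29724-2646] LOG: invalidating slot "backup" '
    'because its restart_lsn 47198/1E000000 exceeds max_slot_wal_keep_size',
]

if __name__ == "__main__":
    for slot, verdict in classify_terminations(SAMPLE_LOG).items():
        print(slot, verdict)
```

This is only a heuristic: it relies on the slot name appearing in both messages and cannot tell apart a temporary slot from a log that was truncated before the invalidation line.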
