Re: Timeout failure in 019_replslot_limit.pl

From: Michael Paquier
Subject: Re: Timeout failure in 019_replslot_limit.pl
Message-ID: YTa0NubICXtdm0Nz@paquier.xyz
In reply to: Re: Timeout failure in 019_replslot_limit.pl  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Timeout failure in 019_replslot_limit.pl  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List: pgsql-hackers
On Mon, Sep 06, 2021 at 12:03:32PM -0400, Tom Lane wrote:
> I scraped the buildfarm logs looking for similar failures, and didn't
> find any.  (019_replslot_limit.pl hasn't failed at all in the farm
> since the last fix it received, in late July.)

The interesting bits are in 019_replslot_limit_primary3.log.  In a
failed run, I can see that we immediately get a process termination,
as follows:
2021-09-07 07:52:53.402 JST [22890] LOG:  terminating process 23082 to release replication slot "rep3"
2021-09-07 07:52:53.442 JST [23082] standby_3 FATAL:  terminating connection due to administrator command
2021-09-07 07:52:53.442 JST [23082] standby_3 STATEMENT:  START_REPLICATION SLOT "rep3" 0/700000 TIMELINE 1
2021-09-07 07:52:53.452 JST [23133] 019_replslot_limit.pl LOG:  statement: SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep3'

In a successful run, the pattern is different:
2021-09-07 09:27:39.832 JST [57114] standby_3 FATAL:  terminating connection due to administrator command
2021-09-07 09:27:39.832 JST [57114] standby_3 STATEMENT:  START_REPLICATION SLOT "rep3" 0/700000 TIMELINE 1
2021-09-07 09:27:39.832 JST [57092] LOG:  invalidating slot "rep3" because its restart_lsn 0/7000D8 exceeds max_slot_wal_keep_size
2021-09-07 09:27:39.833 JST [57092] LOG:  checkpoint complete: wrote 19 buffers (14.8%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.025 s, sync=0.001 s, total=0.030 s; sync files=0, longest=0.000 s, average=0.000 s; distance=1024 kB, estimate=1024 kB
2021-09-07 09:27:39.833 JST [57092] LOG:  checkpoints are occurring too frequently (0 seconds apart)
2021-09-07 09:27:39.833 JST [57092] HINT:  Consider increasing the configuration parameter "max_wal_size".
2021-09-07 09:27:39.851 JST [57126] 019_replslot_limit.pl LOG:  statement: SELECT wal_status FROM pg_replication_slots WHERE slot_name = 'rep3'

In the failed run, the slot invalidation is forgotten because we don't
complete a checkpoint that does the work it should do, no?  In the
successful run, a checkpoint completes before we query
pg_replication_slots, and the buildfarm shows the same thing.
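
To make the difference between the two patterns easier to check against other logs, here is a rough Python sketch that records which slot-related messages show up before the first wal_status query.  The names classify_run, LINE_RE, FAILED_EXCERPT and OK_EXCERPT are made up for illustration, the excerpts are trimmed copies of the log lines quoted above, and the regex assumes the "timestamp timezone [pid]" line prefix seen in those lines.

```python
import re

# Assumed line prefix, as in the excerpts: "YYYY-MM-DD HH:MM:SS.mmm TZ [pid] rest".
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) (\S+) \[(\d+)\] (.*)$"
)


def classify_run(log_text, slot="rep3"):
    """Report which slot-related events precede the first wal_status query."""
    seen = {"terminated": False, "invalidated": False, "checkpoint_done": False}
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw)
        if not m:
            continue  # continuation of a wrapped message, no prefix
        msg = m.group(4)
        if "SELECT wal_status FROM pg_replication_slots" in msg:
            break  # the test has started polling; stop here
        if f'to release replication slot "{slot}"' in msg:
            seen["terminated"] = True
        elif f'invalidating slot "{slot}"' in msg:
            seen["invalidated"] = True
        elif "checkpoint complete" in msg:
            seen["checkpoint_done"] = True
    return seen


# Trimmed copies of the excerpts quoted in this message.
FAILED_EXCERPT = '''\
2021-09-07 07:52:53.402 JST [22890] LOG:  terminating process 23082 to release replication slot "rep3"
2021-09-07 07:52:53.442 JST [23082] standby_3 FATAL:  terminating connection due to administrator command
2021-09-07 07:52:53.452 JST [23133] 019_replslot_limit.pl LOG:  statement: SELECT wal_status FROM pg_replication_slots
'''

OK_EXCERPT = '''\
2021-09-07 09:27:39.832 JST [57114] standby_3 FATAL:  terminating connection due to administrator command
2021-09-07 09:27:39.832 JST [57092] LOG:  invalidating slot "rep3" because its restart_lsn 0/7000D8 exceeds
2021-09-07 09:27:39.833 JST [57092] LOG:  checkpoint complete: wrote
2021-09-07 09:27:39.851 JST [57126] 019_replslot_limit.pl LOG:  statement: SELECT wal_status FROM pg_replication_slots
'''

if __name__ == "__main__":
    print("failed run:    ", classify_run(FAILED_EXCERPT))
    print("successful run:", classify_run(OK_EXCERPT))
```

This only looks at message ordering in the log; it says nothing about why the checkpoint did not complete in the failed case.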

> I wonder if Michael's setup had any unusual settings.

The way I use configure and build options has caught bugs with code
ordering in the past, but this one looks like just a timing issue with
the test itself.  I can only reproduce it on macOS Big Sur 11.5.2, and
I got fresh logs this morning from a new failure, which are attached.
--
Michael

