Re: Strange failure on mamba

From: Tom Lane
Subject: Re: Strange failure on mamba
Msg-id: 2051761.1668722889@sss.pgh.pa.us
In reply to: Strange failure on mamba (Thomas Munro <thomas.munro@gmail.com>)
Replies: Re: Strange failure on mamba
List: pgsql-hackers
Thomas Munro <thomas.munro@gmail.com> writes:
> I wonder why the walreceiver didn't start in
> 008_min_recovery_point_node_3.log here:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2022-11-16%2023%3A13%3A38

mamba has been showing intermittent failures in various replication
tests since day one.  My guess is that it's slow enough to be
particularly subject to the signal-handler race conditions that we
know exist in walreceivers and elsewhere.  (Now, it wasn't any faster
in its previous incarnation as a macOS critter.  But maybe modern
NetBSD has different scheduler behavior than ancient macOS and that
contributes somehow.  Or maybe there's some other NetBSD weirdness
in here.)

I've tried to reproduce manually, without much success :-(

Like many of its other failures, there's a suggestive postmaster
log entry at the very end:

2022-11-16 19:45:53.851 EST [2036:4] LOG:  received immediate shutdown request
2022-11-16 19:45:58.873 EST [2036:5] LOG:  issuing SIGKILL to recalcitrant children
2022-11-16 19:45:58.881 EST [2036:6] LOG:  database system is shut down

So some postmaster child is stuck somewhere where it's not responding
to SIGQUIT.  While it's not unreasonable to guess that that's a
walreceiver, there's no hard evidence of it here.  I've been wondering
if it'd be worth patching the postmaster so that it's a bit more verbose
about which children it had to SIGKILL.  I've also wondered about
changing the SIGKILL to SIGABRT in hopes of reaping a core file that
could be investigated.
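Here is a minimal POSIX C sketch of those two ideas. It is not PostgreSQL's actual postmaster code, and everything in it is hypothetical: the `Child` struct, `spawn_child`, `shutdown_children`, and the two demo children. One demo child keeps SIGQUIT blocked to stand in for a stuck process. After a SIGQUIT grace period, the sketch logs each straggler's PID and kind before escalating. It takes the escalation signal as a parameter, so SIGABRT can be swapped in for SIGKILL.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for the postmaster's per-child bookkeeping. */
typedef struct {
    pid_t pid;
    const char *kind;   /* illustrative label, e.g. "walreceiver" */
    int reaped;         /* nonzero once waitpid() has collected it */
    int status;         /* wait status from waitpid() */
} Child;

/* Demo handler: a responsive child just exits when it gets SIGQUIT. */
static void demo_quickdie(int sig) {
    (void) sig;
    _exit(2);
}

/* Fork a demo child. A "stuck" child keeps SIGQUIT blocked forever,
   simulating a process that never responds to it. */
static pid_t spawn_child(int stuck) {
    sigset_t quit, old;
    sigemptyset(&quit);
    sigaddset(&quit, SIGQUIT);
    sigprocmask(SIG_BLOCK, &quit, &old);   /* no delivery before setup */
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        exit(1);
    }
    if (pid == 0) {
        if (!stuck) {
            struct sigaction sa;
            sa.sa_handler = demo_quickdie;
            sigemptyset(&sa.sa_mask);
            sa.sa_flags = 0;
            sigaction(SIGQUIT, &sa, NULL);
            sigprocmask(SIG_UNBLOCK, &quit, NULL);  /* pending SIGQUIT fires */
        }
        for (;;)
            pause();
    }
    sigprocmask(SIG_SETMASK, &old, NULL);  /* restore the parent's mask */
    return pid;
}

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Send SIGQUIT to every child and poll for up to timeout_sec. Then name
   each straggler in the log and send it escalate_sig. Passing SIGABRT
   instead of SIGKILL lets the kernel write a core file if core dumps are
   enabled. Returns the number of children that needed escalation. */
static int shutdown_children(Child *kids, int n, double timeout_sec,
                             int escalate_sig) {
    int remaining = n;
    for (int i = 0; i < n; i++)
        kill(kids[i].pid, SIGQUIT);

    double deadline = now_sec() + timeout_sec;
    while (remaining > 0 && now_sec() < deadline) {
        for (int i = 0; i < n; i++) {
            if (!kids[i].reaped &&
                waitpid(kids[i].pid, &kids[i].status, WNOHANG) == kids[i].pid) {
                kids[i].reaped = 1;
                remaining--;
            }
        }
        struct timespec pause_ts = {0, 10 * 1000 * 1000};   /* 10 ms */
        nanosleep(&pause_ts, NULL);
    }

    int escalated = 0;
    for (int i = 0; i < n; i++) {
        if (kids[i].reaped)
            continue;
        /* The extra verbosity: say which child was recalcitrant. */
        fprintf(stderr,
                "LOG:  issuing signal %d to recalcitrant child \"%s\" (PID %ld)\n",
                escalate_sig, kids[i].kind, (long) kids[i].pid);
        kill(kids[i].pid, escalate_sig);
        waitpid(kids[i].pid, &kids[i].status, 0);
        kids[i].reaped = 1;
        escalated++;
    }
    assert(escalated == remaining);
    return escalated;
}
```

Running `shutdown_children(kids, 2, 1.0, SIGABRT)` on one responsive and one stuck demo child should log a single line naming the stuck one. Whether a core file actually appears depends on the core-size resource limit and the system's core-dump settings. The log excerpt above shows about five seconds between the immediate-shutdown request and the SIGKILL. The demo call uses a 1-second timeout only to keep it quick.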

            regards, tom lane