Re: What is happening on buildfarm member crake?

From: Tom Lane
Subject: Re: What is happening on buildfarm member crake?
Msg-id: 10013.1390687480@sss.pgh.pa.us
In reply to: Re: What is happening on buildfarm member crake?  (Andrew Dunstan <andrew@dunslane.net>)
Responses: Re: What is happening on buildfarm member crake?  (Andrew Dunstan <andrew@dunslane.net>)
           Re: What is happening on buildfarm member crake?  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Andrew Dunstan <andrew@dunslane.net> writes:
> On 01/19/2014 08:22 PM, Robert Haas wrote:
>> Hmm, that looks an awful lot like the SIGUSR1 signal handler is
>> getting called after we've already completed shmem_exit.  And indeed
>> that seems like the sort of thing that would result in dying horribly
>> in just this way.  The obvious fix seems to be to check
>> proc_exit_inprogress before doing anything that might touch shared
>> memory, but there are a lot of other SIGUSR1 handlers that don't do
>> that either.  However, in those cases, the likely cause of a SIGUSR1
>> would be a sinval catchup interrupt or a recovery conflict, which
>> aren't likely to be so far delayed that they arrive after we've
>> already disconnected from shared memory.  But the dynamic background
>> workers stuff adds a new possible cause of SIGUSR1: the postmaster
>> letting us know that a child has started or died.  And that could
>> happen even after we've detached shared memory.

> Is anything happening about this? We're still getting quite a few of 
> these: 
> <http://www.pgbuildfarm.org/cgi-bin/show_failures.pl?max_days=3&member=crake>

Yeah.  If Robert's diagnosis is correct, and it sounds pretty plausible,
then this is really just one instance of a bug that's probably pretty
widespread in our signal handlers.  Somebody needs to go through 'em
all and look for touches of shared memory.
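A minimal, self-contained sketch of the guard Robert describes. This is not PostgreSQL code: a heap allocation stands in for the shared-memory segment, and `exit_inprogress`, `attach_shared()`, `detach_shared()` and `late_signals` are hypothetical names playing the roles of `proc_exit_inprogress` and the shmem attach/detach steps. The flag goes up before the segment is torn down, so a SIGUSR1 that arrives late (say, a background-worker start/stop notice from the postmaster) is counted and dropped instead of dereferencing freed memory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a shared-memory segment (plain heap memory here). */
typedef struct SharedState {
    volatile sig_atomic_t bgworker_events;
} SharedState;

static SharedState *volatile shared = NULL;

/* Plays the role of proc_exit_inprogress; a local flag, not the real global. */
static volatile sig_atomic_t exit_inprogress = 0;

/* Signals that arrived after exit began and were deliberately ignored. */
static volatile sig_atomic_t late_signals = 0;

static void sigusr1_handler(int signo)
{
    (void) signo;
    if (exit_inprogress) {
        /* Too late: shared state may already be detached, so touch nothing. */
        late_signals++;
        return;
    }
    shared->bgworker_events++;
}

/* Attach first, install the handler second, so the handler never sees NULL. */
static int attach_shared(void)
{
    struct sigaction sa;

    shared = calloc(1, sizeof(SharedState));
    if (shared == NULL)
        return -1;
    exit_inprogress = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Raise the flag before tearing anything down, then release the segment. */
static void detach_shared(void)
{
    SharedState *old;

    assert(shared != NULL);
    exit_inprogress = 1;
    old = shared;
    shared = NULL;
    free(old);
}
```

Call `attach_shared()`, then `detach_shared()`; a `raise(SIGUSR1)` between the two bumps `bgworker_events`, while one after the detach only bumps `late_signals`. This is the blanket "stop responding once the flag is up" approach, which is exactly what may be too coarse for handlers that must keep working during shutdown.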

I'm not sure if we can just disable signal response the moment the
proc_exit_inprogress flag goes up, though.  In some cases such as lock
handling, it's likely that we need that functionality to keep working
for some part of the shutdown process.  We might end up having to disable
individual signal handlers at appropriate places.

Ick.
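Disabling individual handlers at a particular shutdown step, rather than all signal response at once, could look roughly like the sketch below. All names are hypothetical, and SIGUSR2 merely stands in for whatever signal must keep working through shutdown (such as the lock handling mentioned above); it is not a claim about which signals PostgreSQL uses for what. Just before the detach step, only SIGUSR1's disposition is switched to `SIG_IGN`; the other handler stays live.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Counts deliveries of a handler that touches shared memory (e.g. bgworker notices). */
static volatile sig_atomic_t usr1_handled = 0;
/* Counts deliveries of a handler that must survive into shutdown. */
static volatile sig_atomic_t usr2_handled = 0;

static void handle_usr1(int signo) { (void) signo; usr1_handled++; }
static void handle_usr2(int signo) { (void) signo; usr2_handled++; }

/* Install one disposition (a function, SIG_IGN or SIG_DFL) for one signal. */
static int set_handler(int signo, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

static int install_handlers(void)
{
    if (set_handler(SIGUSR1, handle_usr1) != 0)
        return -1;
    return set_handler(SIGUSR2, handle_usr2);
}

/* Hypothetical shutdown step: silence only the handler that touches shared memory. */
static int before_shmem_detach(void)
{
    return set_handler(SIGUSR1, SIG_IGN);
}
```

`SIG_IGN` discards SIGUSR1 outright, which suits a segment that is going away for good; blocking it with `sigprocmask()` would instead leave it pending and deliver it to the original handler if it were ever unblocked.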
        regards, tom lane


