Immediate shutdown and system(3)

We use SIGQUIT to request an immediate shutdown. Upon receiving
SIGQUIT, the postmaster in turn kills all of its child processes with
SIGQUIT and exits.

This is a problem when child processes use system(3) to call other 
programs. We use system(3) in two places: to execute archive_command and 
restore_command. Fujii Masao identified this with pg_standby back in 
November:

http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d3cb@mail.gmail.com
and it was recently discussed here:
http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa3a1@mail.gmail.com

I'm starting a new thread to bring this to the attention of those who 
haven't been following the hot standby stuff. pg_standby has a 
particular problem because it traps SIGQUIT to mean "end recovery, 
promote standby to master", which it shouldn't do IMHO. But ignoring 
that for a moment, the problem is generic.

SIGQUIT by default dumps core. That's not what we want to happen on 
immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit 
immediately instead, but external commands will dump core. system(3)
ignores SIGQUIT in the calling process while the command runs, so we
can't trap it in the parent process; it is always relayed to the child.
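
To make the default behaviour concrete, here is a minimal sketch in
Python (not PostgreSQL code; `signal_that_killed` is a hypothetical
helper invented for this illustration). It starts an external command
that installs no signal handlers, sends it a signal, and reports which
signal terminated it. Whether a core file is actually written also
depends on the core size limit in effect, which the sketch does not
check.

```python
import signal
import subprocess


def signal_that_killed(argv, sig):
    """Start argv, send it sig, and return the terminating signal number.

    Returns None if the command exited normally instead of being killed.
    """
    proc = subprocess.Popen(argv)
    # Popen returns only after the child has exec'd, so the signal hits
    # the external command itself with its default dispositions.
    proc.send_signal(sig)
    rc = proc.wait()
    # subprocess reports "killed by signal N" as a return code of -N.
    return -rc if rc < 0 else None
```

Calling `signal_that_killed(["sleep", "5"], signal.SIGQUIT)` shows that
a plain external command does not survive SIGQUIT, unlike our own
processes, which trap it.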

There are a few options for fixing that:

1. Implement a custom version of system(3) using fork+exec that lets us 
trap SIGQUIT and send e.g. SIGTERM or SIGINT to the child instead. It 
might be a bit tricky to get this right in a portable way; Windows would 
certainly need a completely separate implementation.

2. Use a signal other than SIGQUIT for immediate shutdown of child 
processes. We can't change the signal sent to postmaster for 
backwards-compatibility reasons, but the signal sent by postmaster to 
child processes we could change. We've already used all signals in 
normal backends, but perhaps we could rearrange them.

3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child 
processes that use system(3): the archiver process and the startup 
process. Neither of them uses SIGINT currently. SIGINT is ignored by 
system(3), like SIGQUIT, but the default action is to terminate the 
process rather than core dump. Unfortunately pg_standby traps SIGINT too 
to mean "promote to master", but we could change it to use SIGUSR1 
instead for that purpose. If someone has a script that uses "killall 
-INT pg_standby" to promote a standby server to master, it would need to 
be changed. Looking at the manual page of pg_standby, however, it seems 
that the kill method of triggering a promotion isn't documented, so with 
a notice in the release notes we could do that.

I'm leaning towards option 3, but I wonder if anyone sees a better solution.
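
For reference, the idea behind option 1 can be sketched roughly as
follows. This is an illustration in Python rather than the C code a
real patch would need, and `system_forward_quit` is a hypothetical name.
It assumes a POSIX system with /bin/sh, and it assumes SIGQUIT reaches
only the parent; if the signal is delivered to the whole process group,
the child would additionally have to ignore SIGQUIT before the exec.
The window between fork and installing the handler is also left
unprotected here.

```python
import os
import signal


def system_forward_quit(command, forward_sig=signal.SIGTERM):
    """Run command via /bin/sh, forwarding SIGQUIT as forward_sig.

    Returns the raw wait status of the child, as os.waitpid() does.
    """
    pid = os.fork()
    if pid == 0:
        # Child: run the command through the shell, as system(3) does.
        try:
            os.execv("/bin/sh", ["sh", "-c", command])
        finally:
            os._exit(127)  # only reached if the exec failed

    # Parent: on SIGQUIT, pass a gentler signal on to the child.
    def on_quit(signum, frame):
        try:
            os.kill(pid, forward_sig)
        except ProcessLookupError:
            pass  # child is already gone

    old_handler = signal.signal(signal.SIGQUIT, on_quit)
    try:
        # waitpid is retried automatically after the handler has run.
        _, status = os.waitpid(pid, 0)
    finally:
        signal.signal(signal.SIGQUIT, old_handler)
    return status
```

With this, a SIGQUIT sent to the parent while `exec sleep 5` is running
ends the child with SIGTERM, so `os.WTERMSIG(status)` reports SIGTERM
rather than SIGQUIT. Unlike system(3), the sketch does not touch SIGINT
or SIGCHLD handling, and it does nothing for Windows.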

This is all for CVS HEAD. In back-branches, I think we should just 
remove the signal handler for SIGQUIT from pg_standby and leave it at 
that. If you perform an immediate shutdown, you can get a core dump from 
archive_command or restore_command, but that's a minor inconvenience.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

