RE: SIGQUIT on archiver child processes maybe not such a hot idea?

From: Tsunakawa, Takayuki
Subject: RE: SIGQUIT on archiver child processes maybe not such a hot idea?
Msg-id: 0A3221C70F24FB45833433255569204D1FD0B676@G01JPEXMBYT05
In response to: SIGQUIT on archiver child processes maybe not such a hot idea?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: SIGQUIT on archiver child processes maybe not such a hot idea?  (Michael Paquier <michael@paquier.xyz>)
List: pgsql-hackers
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> After investigation, the mechanism that's causing that is that the
> src/test/recovery/t/010_logical_decoding_timelines.pl test shuts
> down its replica server with a mode-immediate stop, which causes
> that postmaster to shut down all its children with SIGQUIT, and
> in particular that signal propagates to a "cp" command that the
> archiver process is executing.  The "cp" is unsurprisingly running
> with default SIGQUIT handling, which per the signal man page
> includes dumping core.

We experienced this years ago: an archive command dumped core in the data directory.  Relatedly, the example of
using cp in the PostgreSQL manual is misleading, because cp doesn't reliably persist the WAL archive file.
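
To illustrate the persistence concern, here is a minimal sketch of a copy that flushes the file before it appears under its final name. It is not the command we use, and durable_copy is a hypothetical name. It assumes GNU coreutils, where `sync FILE` flushes a single file; elsewhere it falls back to a system-wide `sync`.

```shell
#!/bin/sh
# Sketch only: copy to a temporary name, flush it, then rename into place.
# durable_copy is a hypothetical helper, not part of PostgreSQL.
durable_copy() {
    src=$1
    dst=$2
    tmp="$dst.tmp.$$"

    # Never overwrite an existing archive file.
    [ ! -e "$dst" ] || return 1

    cp "$src" "$tmp" || { rm -f "$tmp"; return 1; }
    sync "$tmp" 2>/dev/null || sync               # flush the file contents
    mv "$tmp" "$dst" || { rm -f "$tmp"; return 1; }
    sync "$(dirname "$dst")" 2>/dev/null || sync  # flush the directory entry
}
```

A plain cp returns once the data is in the OS cache, so a crash shortly afterwards can lose an archive file that the server already considers archived. The rename step keeps a partially written file from ever appearing under the final name.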
 


> This makes me wonder whether we shouldn't be using some other signal
> to shut down archiver subprocesses.  It's not real cool if we're
> spewing cores all over the place.  Admittedly, production servers
> are likely running with "ulimit -c 0" on most modern platforms,
> so this might not be a huge problem in the field; but accumulation
> of core files could be a problem anywhere that's configured to allow
> server core dumps.

We enable core dumps in production, just in case, to help with investigation.


> Ideally, perhaps, we'd be using SIGINT not SIGQUIT to shut down
> non-Postgres child processes.  But redesigning the system's signal
> handling to make that possible seems like a bit of a mess.
> 
> Thoughts?

We use a shell script, plus a command that the script calls.  That is:

archive_command = 'call some_shell_script.sh ...'

[some_shell_script.sh]
ulimit -c 0
trap SIGQUIT to just exit on the receipt of the signal
call some_command to copy file

some_command also catches SIGQUIT and just exits.  It copies and syncs the file.
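
The outline above could look roughly like the following. This is a sketch under assumptions, not our actual script: archive_one and COPY_CMD are hypothetical names, COPY_CMD stands in for some_command (it defaults to cp only so the sketch runs), and `ulimit -c` is assumed to be supported by the shell, which is true of bash and dash.

```shell
#!/bin/sh
# Hypothetical wrapper in the spirit of some_shell_script.sh.
# Intended use (paths are placeholders):
#   archive_command = '/path/to/wrapper.sh "%p" "/path/to/archive/%f"'

ulimit -c 0          # this shell and its children will not write core files
trap 'exit 1' QUIT   # on SIGQUIT, exit non-zero instead of dumping core

# Stand-in for some_command; the real one is expected to copy and sync.
COPY_CMD=${COPY_CMD:-cp}

archive_one() {
    "$COPY_CMD" "$1" "$2"
}

# Run only when invoked with a source and a destination path.
if [ "$#" -eq 2 ]; then
    archive_one "$1" "$2"
fi
```

The core size limit is inherited by child processes, so even a child that keeps the default SIGQUIT disposition dies without leaving a core file. The shell runs the trap only after the foreground child has finished, which is why the copy command needs its own handling as well.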

I proposed something along these lines in the thread below, but I couldn't respond to Peter's review comments due to
other tasks.  Does anyone think it's worth resuming this?
 

https://www.postgresql.org/message-id/7E37040CF3804EA5B018D7A022822984@maumau


Regards
Takayuki Tsunakawa





