Re: SIGQUIT on archiver child processes maybe not such a hot idea?

From Michael Paquier
Subject Re: SIGQUIT on archiver child processes maybe not such a hot idea?
Msg-id 20190902065134.GE1841@paquier.xyz
In response to RE: SIGQUIT on archiver child processes maybe not such a hot idea?  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Responses Re: SIGQUIT on archiver child processes maybe not such a hot idea?  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: SIGQUIT on archiver child processes maybe not such a hot idea?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Mon, Sep 02, 2019 at 12:27:09AM +0000, Tsunakawa, Takayuki wrote:
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>> After investigation, the mechanism that's causing that is that the
>> src/test/recovery/t/010_logical_decoding_timelines.pl test shuts
>> down its replica server with a mode-immediate stop, which causes
>> that postmaster to shut down all its children with SIGQUIT, and
>> in particular that signal propagates to a "cp" command that the
>> archiver process is executing.  The "cp" is unsurprisingly running
>> with default SIGQUIT handling, which per the signal man page
>> includes dumping core.
>
> We've experienced this (core dump in the data directory by an
> archive command) years ago.  Related to this, the example of using
> cp in the PostgreSQL manual is misleading, because cp doesn't
> reliably persist the WAL archive file.

The previous discussions about having pg_copy are still where they were
a couple of years ago, as we did not agree on which semantics it should
have.  If we could move forward with that and update the documentation
from its insanity that would be great and...  The signal handling is
something else we could customize in a more favorable way with the
archiver.  Anyway, switching to something other than SIGQUIT to stop
the archiver will not prevent other tools from generating core dumps
with that other signal.

> We enable the core dump in production to help the investigation just in case.

So do I in some of the stuff I work on.

> some_command also catches SIGQUIT and just exits.  It copies and syncs the file.
>
> I proposed something along these lines as below, but I couldn't respond to Peter's review comments due to other tasks.
> Does anyone think it's worth resuming this?
>
> https://www.postgresql.org/message-id/7E37040CF3804EA5B018D7A022822984@maumau

And I was looking for this thread a couple of lines ago :)
Thanks.
--
Michael
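
For illustration only, here is a minimal sketch of the kind of archive
helper described in the quoted text: it copies the file, syncs it, and
catches SIGQUIT so that it just exits instead of dumping core.  This is
not the proposed patch nor any pg_copy design; the names archive_file,
quit_without_core and main are hypothetical, and refusing to overwrite
an existing target is an assumption on my side.

```python
import os
import shutil
import signal


def quit_without_core(signum, frame):
    # Replace the default SIGQUIT action (terminate and dump core) with an
    # immediate non-zero exit, so the caller sees a failed archive attempt.
    os._exit(1)


def archive_file(src, dst):
    """Copy src to dst, then fsync the file and its parent directory."""
    if os.path.exists(dst):
        # Assumption: an already-archived file must never be overwritten.
        raise FileExistsError(dst)
    tmp = dst + ".tmp"
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # file contents reach stable storage
    os.rename(tmp, dst)
    # fsync the directory as well so that the rename itself is durable.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def main(argv):
    # Hypothetical entry point: argv is [program, source, destination].
    if len(argv) != 3:
        return 2
    signal.signal(signal.SIGQUIT, quit_without_core)
    archive_file(argv[1], argv[2])
    return 0
```

A wrapper script would end with sys.exit(main(sys.argv)) and be called
from archive_command with %p and %f as its two arguments.  If SIGQUIT
arrives mid-copy, this sketch leaves the ".tmp" file behind; cleaning it
up is left out here.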
