Re: SIGQUIT on archiver child processes maybe not such a hot idea?

From Kyotaro Horiguchi
Subject Re: SIGQUIT on archiver child processes maybe not such a hot idea?
Date
Msg-id 20190902.183153.120412900.horikyota.ntt@gmail.com
In response to Re: SIGQUIT on archiver child processes maybe not such a hot idea?  (Michael Paquier <michael@paquier.xyz>)
Responses RE: SIGQUIT on archiver child processes maybe not such a hot idea?  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
List pgsql-hackers
At Mon, 2 Sep 2019 15:51:34 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190902065134.GE1841@paquier.xyz>
> On Mon, Sep 02, 2019 at 12:27:09AM +0000, Tsunakawa, Takayuki wrote:
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> >> After investigation, the mechanism that's causing that is that the
> >> src/test/recovery/t/010_logical_decoding_timelines.pl test shuts
> >> down its replica server with a mode-immediate stop, which causes
> >> that postmaster to shut down all its children with SIGQUIT, and
> >> in particular that signal propagates to a "cp" command that the
> >> archiver process is executing.  The "cp" is unsurprisingly running
> >> with default SIGQUIT handling, which per the signal man page
> >> includes dumping core.
> > 
> > We've experienced this (core dump in the data directory by an
> > archive command) years ago.  Related to this, the example of using
> > cp in the PostgreSQL manual is misleading, because cp doesn't
> > reliably persist the WAL archive file.
> 
> The previous talks about having pg_copy are still where they were a
> couple of years ago as we did not agree on which semantics it should
> have.  If we could move forward with that and update the documentation
> from its insanity that would be great and...  The signal handling is
> something else we could customize in a more favorable way with the
> archiver.  Anyway, switching from something else than SIGQUIT to stop
> the archiver will not prevent any other tools from generating core
> dumps with this other signal.

Since we allow operators to use an arbitrary command as
archive_command, providing a replacement with non-standard signal
handling for one specific command doesn't seem like a general
solution to me. Couldn't we have pg_system() (a tentative name),
which intercepts SIGQUIT and then sends SIGINT to its children? It
might need to resend SIGQUIT after some interval, though.
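
To make the idea concrete, here is a rough sketch of what such a
wrapper might look like. Everything in it is an assumption for
illustration only: the name pg_system_sketch, the grace_seconds
parameter, the 10 ms polling loop, and the choice to put the command
in its own process group so that only the wrapper sees the original
SIGQUIT. It is not a patch and has not been tried against the
archiver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Set by the handler when the caller itself receives SIGQUIT. */
static volatile sig_atomic_t got_sigquit = 0;

static void
handle_sigquit(int signo)
{
	(void) signo;
	got_sigquit = 1;
}

/*
 * Hypothetical pg_system(): run "command" through /bin/sh like system(3),
 * but turn a SIGQUIT received by the caller into SIGINT for the command,
 * and resend SIGQUIT if the command is still alive after grace_seconds.
 * Returns the wait status of the shell, or -1 on failure.
 */
int
pg_system_sketch(const char *command, unsigned int grace_seconds)
{
	struct sigaction sa, old_sa;
	struct timespec poll_interval = {0, 10 * 1000 * 1000};	/* 10 ms */
	time_t		deadline = 0;
	int			stage = 0;		/* 0: nothing sent, 1: SIGINT, 2: SIGQUIT */
	int			status = -1;
	pid_t		pid;

	assert(command != NULL);

	got_sigquit = 0;
	sa.sa_handler = handle_sigquit;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGQUIT, &sa, &old_sa) != 0)
		return -1;

	pid = fork();
	if (pid < 0)
	{
		sigaction(SIGQUIT, &old_sa, NULL);
		return -1;
	}
	if (pid == 0)
	{
		/* Child: move to a new process group, then run the shell. */
		setpgid(0, 0);
		execl("/bin/sh", "sh", "-c", command, (char *) NULL);
		_exit(127);
	}
	/* Also set the group from the parent to close the startup race. */
	setpgid(pid, pid);

	for (;;)
	{
		pid_t		r = waitpid(pid, &status, WNOHANG);

		if (r == pid)
			break;				/* command finished */
		if (r < 0 && errno != EINTR)
		{
			status = -1;
			break;
		}
		if (got_sigquit && stage == 0)
		{
			kill(-pid, SIGINT);	/* polite request to the whole group */
			deadline = time(NULL) + (time_t) grace_seconds;
			stage = 1;
		}
		else if (stage == 1 && time(NULL) >= deadline)
		{
			kill(-pid, SIGQUIT);	/* escalate after the grace period */
			stage = 2;
		}
		nanosleep(&poll_interval, NULL);
	}

	sigaction(SIGQUIT, &old_sa, NULL);
	return status;
}
```

A caller would use it the same way as system(3), for example
pg_system_sketch("some_command", 5), and inspect the result with
WIFEXITED() and WIFSIGNALED(). Note that a command which ignores or
mishandles SIGINT would still get SIGQUIT in the end, so this only
narrows the window for core dumps rather than closing it.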

> > We enable the core dump in production to help the investigation just in case.
> 
> So do I in some of the stuff I work on.
> 
> > some_command also catches SIGQUIT just exit.  It copies and syncs the file.
> > 
> > I proposed something in this line as below, but I couldn't respond to
> > Peter's review comments due to other tasks.  Does anyone think it's
> > worth resuming this?
> > 
> > https://www.postgresql.org/message-id/7E37040CF3804EA5B018D7A022822984@maumau
> 
> And I was looking for this thread a couple of lines ago :)
> Thanks.
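
For reference, the "copies and syncs the file" part quoted above could
look roughly like the following. This is only a sketch under my own
assumptions: copy_and_sync_sketch and fsync_parent_dir are made-up
names, the 64 kB buffer and 0600 mode are arbitrary, and it does not
retry on EINTR. It is not the code from the linked proposal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* fsync the directory containing "path" so the new entry is durable. */
static int
fsync_parent_dir(const char *path)
{
	char	   *copy = strdup(path);	/* dirname() may modify its argument */
	int			fd;
	int			rc;

	if (copy == NULL)
		return -1;
	fd = open(dirname(copy), O_RDONLY);
	free(copy);
	if (fd < 0)
		return -1;
	rc = fsync(fd);
	close(fd);
	return rc;
}

/*
 * Hypothetical helper: copy src to dst, refusing to overwrite an existing
 * dst, and flush both the file and its directory before reporting success.
 * Returns 0 on success, -1 on any failure.
 */
int
copy_and_sync_sketch(const char *src, const char *dst)
{
	static char buf[65536];
	int			in;
	int			out;
	int			rc = -1;

	assert(src != NULL && dst != NULL);

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (out < 0)
	{
		close(in);
		return -1;
	}

	for (;;)
	{
		ssize_t		nread = read(in, buf, sizeof(buf));
		ssize_t		off = 0;

		if (nread == 0)
		{
			rc = 0;				/* reached end of file */
			break;
		}
		if (nread < 0)
			break;
		while (off < nread)
		{
			ssize_t		nw = write(out, buf + off, (size_t) (nread - off));

			if (nw < 0)
				break;
			off += nw;
		}
		if (off < nread)
			break;				/* write failed part-way */
	}

	/* Plain cp stops before this point: nothing forces the data to disk. */
	if (rc == 0 && fsync(out) != 0)
		rc = -1;
	if (close(out) != 0)
		rc = -1;
	close(in);
	if (rc == 0 && fsync_parent_dir(dst) != 0)
		rc = -1;
	return rc;
}
```

On failure a partial dst may be left behind, so a real command would
probably want to write to a temporary name and rename it afterwards.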

# Is there any way to view a whole thread in the archive?
# I'm kind of reluctant to wander among messages like a rat in
# a maze :p

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


