Re: SIGQUIT on archiver child processes maybe not such a hot idea?

From Thomas Munro
Subject Re: SIGQUIT on archiver child processes maybe not such a hot idea?
Msg-id CA+hUKGLBEt0hs=-sKdcBSneyCvyKzYOmwMA=OA7yH1-OQ9AMFw@mail.gmail.com
In reply to RE: SIGQUIT on archiver child processes maybe not such a hot idea?  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
List pgsql-hackers
On Tue, Sep 3, 2019 at 2:43 PM Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: Kyotaro Horiguchi [mailto:horikyota.ntt@gmail.com]
> > Since we are allowing OPs to use arbitrary command as
> > archive_command, providing a replacement with non-standard signal
> > handling for a specific command doesn't seem a general solution
> > to me. Couldn't we have pg_system(a tentative name), which
> > intercepts SIGQUIT then sends SIGINT to children? Might be need
> > to resend SIGQUIT after some interval, though..
>
> The same idea that you referred to as pg_system occurred to me, too.  But
> I wondered if the archiver process can get the pid of its child (shell?
> archive_command?), while keeping the capabilities of system() (= the
> shell).  Even if we fork() and then system(), doesn't the OS send SIGQUIT
> to any descendents of the archiver when postmaster sends SIGQUIT to the
> child process group?

So, to recap what's happening here, we have a tree of processes like this:

postmaster
-> archiver
   -> sh
      -> cp [user-supplied archiving command]

The archiver is a process group leader, because it called setsid().
The postmaster's signal_child() does kill(pid, ...) and also
kill(-pid, ...), so the kernel sends SIGQUIT to archiver (twice), sh
and cp.  As for what they do with the signal, it depends on timing:

1.  The archiver normally exits immediately in pgarch_exit(), but
while system() is running, SIGQUIT and SIGINT are ignored (see POSIX).
2.  sh normally uses SIGINT to break out of loops etc, but while it's
waiting for a subprocess, it also ignores SIGQUIT and SIGINT (see
POSIX).
3.  cp inherits the default disposition and (unless it handles it
specially) dumps core.
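
To make the three cases above concrete, here is a minimal sketch in C that
imitates the process tree without involving PostgreSQL at all.  The
function name quit_reaches_grandchild() and the pipe used for
synchronization are invented for this illustration.  The sh level is left
out, the "archiver" simply ignores SIGQUIT the way system() would arrange
while a command runs, and the core size limit is set to zero in the
grandchild so that the experiment leaves no core file behind.

```c
#include <assert.h>
#include <signal.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns 1 if SIGQUIT sent to the "archiver" and its
 * process group kills the grandchild while the archiver survives, 0 if
 * not, and -1 if the setup fails.
 */
int
quit_reaches_grandchild(void)
{
	int			ready[2];
	int			status;
	char		c;
	pid_t		archiver;

	if (pipe(ready) < 0)
		return -1;
	archiver = fork();
	if (archiver < 0)
		return -1;

	if (archiver == 0)
	{
		pid_t		cmd;

		close(ready[0]);
		setsid();					/* become a process group leader */
		signal(SIGQUIT, SIG_IGN);	/* as system() does in its caller */

		cmd = fork();
		if (cmd < 0)
			_exit(2);
		if (cmd == 0)
		{
			struct rlimit no_core = {0, 0};

			setrlimit(RLIMIT_CORE, &no_core);	/* no core file, please */
			signal(SIGQUIT, SIG_DFL);	/* the command sees the default */
			if (write(ready[1], "x", 1) != 1)
				_exit(2);
			for (;;)
				pause();			/* stand-in for a long-running cp */
		}
		close(ready[1]);
		if (waitpid(cmd, &status, 0) != cmd)
			_exit(2);
		/* exit 0 only if the command was killed by SIGQUIT */
		_exit(WIFSIGNALED(status) && WTERMSIG(status) == SIGQUIT ? 0 : 1);
	}

	/* "postmaster": wait until the tree exists, then signal as described */
	close(ready[1]);
	if (read(ready[0], &c, 1) != 1)
	{
		kill(archiver, SIGKILL);
		waitpid(archiver, &status, 0);
		close(ready[0]);
		return -1;
	}
	close(ready[0]);
	kill(archiver, SIGQUIT);		/* the process itself */
	kill(-archiver, SIGQUIT);		/* and its whole process group */

	if (waitpid(archiver, &status, 0) != archiver)
		return -1;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The archiver stand-in outlives both signals because it ignores SIGQUIT,
while the grandchild, which has the default disposition, is terminated by
the signal sent to the group.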

I think the general idea here is that interactive shells and similar
things want to ignore signals from users typing ^C (SIGINT) or ^\
(SIGQUIT) so they can affect just the thing that's actually running
at this moment, not the tree of processes waiting.

Yeah, I guess we could have our own pg_system() function that does
roughly what system() does, namely fork(), then execl() in the child
and waitpid() in the parent, but the child could begin a new process
group with setsid() before running execl() (so that it no longer gets
SIGQUIT when the postmaster signals the archiver), and the parent
could record pg_system_child_pid when forking, and install a QUIT
handler that does kill(-pg_system_child_pid, SIGTERM), as well as
setting a flag that will cause its main loop to exit (but not before
it has run waitpid()).  With some carefully placed blocks and unblocks
and ignores, to avoid races.

That all sounds like a lot of work though, and it might be easier to
just make an exception and use SIGTERM to shut down the archiver, as I
think Tom was suggesting.  Unfortunately we have the same problem
elsewhere, where we use popen().  I just wrote a C program that does
just "sleep(60)", ran it with COPY FROM PROGRAM, then sent SIGQUIT to
the postmaster, and got a dumped core.

--
Thomas Munro
https://enterprisedb.com


