Thread: A bit of PG archeology uncovers an interesting Linux/Unix factoid

A bit of PG archeology uncovers an interesting Linux/Unix factoid

From: Greg Stark
Date:
For reasons, I was trying to compile older versions of Postgres and
ran into a strange behaviour where system() worked normally but then
returned -1 with errno set to ECHILD. And surprisingly it looks like
we've seen this behaviour in the past, but on Solaris:

commit 07d4d36aae79cf2ac365e381ed3e7ce62dcfa783
Author: Tatsuo Ishii <ishii@postgresql.org>
Date:   Thu May 25 06:53:43 2000 +0000
    On solaris, createdb/dropdb fails because of strange behavior of system().
    (it returns error with errno ECHILD upon successful completion of commands).
    This fix ignores an error from system() if errno == ECHILD.
 

It looks like Linux now behaves similarly; in fact there's a Red Hat
notice about this causing similar headaches in Oracle:
https://access.redhat.com/solutions/37218

So just in case anyone else wants to use system() in Postgres or
indeed any other Unix application that twiddles with the SIGCHLD
handler, this is something to beware of. It's not entirely clear to me
that the mention of SA_NOCLDWAIT is the only way to get this
behaviour; at least one Stack Overflow answer implied that just setting
SIG_IGN was enough.

-- 
greg
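
The symptom and the style of fix described in the 2000 commit can be sketched roughly as follows. This is a minimal illustration, not the actual PostgreSQL code: both function names are hypothetical, and treating ECHILD as "exit status 0" is an assumption, since the real status is lost once the child has been reaped automatically.

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not PostgreSQL code): run cmd via system() and treat
 * "-1 with errno == ECHILD" as success, in the spirit of the 2000 commit.
 * The real exit status is unavailable in that case, so 0 is only a guess.
 */
int system_tolerating_echild(const char *cmd)
{
    errno = 0;
    int rc = system(cmd);
    if (rc == -1 && errno == ECHILD)
        return 0;               /* command ran, but its status was discarded */
    return rc;
}

/*
 * Hypothetical helper: reproduce the symptom by ignoring SIGCHLD and then
 * calling plain system().  Returns the errno seen when system() reports -1,
 * or 0 if system() returned a normal status.
 */
int errno_from_system_with_sigchld_ignored(const char *cmd)
{
    void (*old)(int) = signal(SIGCHLD, SIG_IGN);
    errno = 0;
    int rc = system(cmd);
    int err = (rc == -1) ? errno : 0;
    signal(SIGCHLD, old);       /* put the previous disposition back */
    return err;
}
```

On systems that behave as described in this thread, errno_from_system_with_sigchld_ignored("exit 0") should report ECHILD even though the command itself ran to completion.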



Re: A bit of PG archeology uncovers an interesting Linux/Unix factoid

From: Chapman Flack
Date:
On 02/15/16 13:42, Greg Stark wrote:

>     (it returns error with errno ECHILD upon successful completion of commands).
>     This fix ignores an error from system() if errno == ECHILD.
> 
> It looks like Linux now behaves similarly,

It seems to be official, in the Single Unix Specification:
http://pubs.opengroup.org/onlinepubs/7908799/xsh/sigaction.html

    SA_NOCLDWAIT
        If set, and sig equals SIGCHLD, child processes of the calling
        processes will not be transformed into zombie processes when they
        terminate. If the calling process subsequently waits for its
        children, and the process has no unwaited for children that were
        transformed into zombie processes, it will block until all of its
        children terminate, and wait(), wait3(), waitid() and waitpid() will
        fail and set errno to [ECHILD]. Otherwise, terminating child
        processes will be transformed into zombie processes, unless SIGCHLD
        is set to SIG_IGN.
 

> So just in case anyone else wants to use system() in Postgres or
> indeed any other Unix application that twiddles with the SIGCHLD
> handler, this is something to beware of. It's not entirely clear to me
> that the mention of SA_NOCLDWAIT is the only way to get this
> behaviour; at least one Stack Overflow answer implied that just setting
> SIG_IGN was enough.

Yup:

  • If a process sets the action for the SIGCHLD signal to SIG_IGN, the
    behaviour is unspecified, except as specified below. If the action
    for the SIGCHLD signal is set to SIG_IGN, child processes of the
    calling processes will not be transformed into zombie processes when
    they terminate. If the calling process subsequently waits for its
    children, and the process has no unwaited for children that were
    transformed into zombie processes, it will block until all of its
    children terminate, and wait(), wait3(), waitid() and waitpid() will
    fail and set errno to [ECHILD].
 

-Chap



Re: A bit of PG archeology uncovers an interesting Linux/Unix factoid

From: Greg Stark
Date:
On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack <chap@anastigmatix.net> wrote:
> If the calling process subsequently waits for its
>     children, and the process has no unwaited for children that were
>     transformed into zombie processes, it will block until all of its
>     children terminate, and wait(), wait3(), waitid() and waitpid() will
>     fail and set errno to [ECHILD].

Sure, but I don't see anything saying system() should be expected to
not handle this situation. At least there's nothing in the system.3
man page that says it should be expected to always return an error if
SIGCHLD is ignored.

And actually, looking at that documentation, it's not clear to me why
it's the case. I would have expected system() to call waitpid()
immediately after the fork, and unless the subprocess was very quick that
should be sufficient to get the exit code. One might even imagine
having system() intentionally use some kind of interlock to ensure that
the parent has called waitpid() before the child execs the shell.

-- 
greg



Re: A bit of PG archeology uncovers an interesting Linux/Unix factoid

From: Chapman Flack
Date:
On 02/15/16 20:03, Greg Stark wrote:
> On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack <chap@anastigmatix.net> wrote:
>> If the calling process subsequently waits for its
>>     children, and the process has no unwaited for children that were
>>     transformed into zombie processes, it will block until all of its
>>     children terminate, and wait(), wait3(), waitid() and waitpid() will
>>     fail and set errno to [ECHILD].

> And actually, looking at that documentation, it's not clear to me why
> it's the case. I would have expected system() to call waitpid()
> immediately after the fork, and unless the subprocess was very quick that
> should be sufficient to get the exit code. One might even imagine
> having system() intentionally use some kind of interlock to ensure that
> the parent has called waitpid() before the child execs the shell.

Doesn't the wording suggest that even if the parent is fast enough
to call waitpid before the child exits, waitpid will only block until
the child terminates and then say ECHILD anyway?

I wouldn't be surprised if they specified it that way to avoid creating
a race condition where you would *sometimes* think it was doing what you
wanted.

Agreed that system(3) doesn't clearly reflect that in the
"status ... is no longer available" description it gives for ECHILD.

-Chap