Discussion: A bit of PG archeology uncovers an interesting Linux/Unix factoid
For reasons, I was trying to compile older versions of Postgres and ran into a strange behaviour where system() ran the command normally but then returned -1 with errno set to ECHILD. Surprisingly, it looks like we've seen this behaviour in the past, but on Solaris:

commit 07d4d36aae79cf2ac365e381ed3e7ce62dcfa783
Author: Tatsuo Ishii <ishii@postgresql.org>
Date:   Thu May 25 06:53:43 2000 +0000

    On solaris, createdb/dropdb fails because of strange behavior of system().
    (it returns error with errno ECHILD upon successful completion of commands).
    This fix ignores an error from system() if errno == ECHILD.

It looks like Linux now behaves similarly; in fact there's a Red Hat notice about this causing similar headaches in Oracle:

https://access.redhat.com/solutions/37218

So just in case anyone else wants to use system() in Postgres, or indeed in any other Unix application that twiddles with the SIGCHLD handler, this is something to beware of. It's not entirely clear to me that setting SA_NOCLDWAIT, as mentioned, is the only way to get this behaviour; at least one Stack Overflow answer implied that just setting SIG_IGN was enough.

-- greg
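For illustration, the approach described in that commit message (ignore the failure from system() when errno is ECHILD) might look roughly like the sketch below. This is not the code from the commit; the function name system_tolerating_echild is hypothetical, and the sketch assumes a caller that only cares whether the command ran, because the child's real exit status is not recoverable in the ECHILD case.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper, not taken from the PostgreSQL source.
 *
 * Runs cmd through system(). If system() fails with ECHILD, which is
 * what happens when SIGCHLD is ignored (or SA_NOCLDWAIT is set) and
 * the child's status therefore cannot be collected, report success
 * instead. The child's real exit status is lost in that case, so a
 * command that failed is indistinguishable from one that succeeded.
 */
int system_tolerating_echild(const char *cmd)
{
    int rc = system(cmd);

    if (rc == -1 && errno == ECHILD)
        return 0;   /* status unavailable; assume the command ran */

    return rc;      /* otherwise pass through system()'s result */
}
```

A caller would use it as a drop-in replacement, for example system_tolerating_echild("exit 0"). The obvious trade-off is that real failures are hidden whenever the ECHILD case is hit, so this only suits callers that can verify the outcome some other way.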
On 02/15/16 13:42, Greg Stark wrote:

> (it returns error with errno ECHILD upon successful completion of commands).
> This fix ignores an error from system() if errno == ECHILD.
>
> It looks like Linux now behaves similarly,

It seems to be official, in the Single Unix Specification:

http://pubs.opengroup.org/onlinepubs/7908799/xsh/sigaction.html

  SA_NOCLDWAIT
    If set, and sig equals SIGCHLD, child processes of the calling
    processes will not be transformed into zombie processes when they
    terminate. If the calling process subsequently waits for its
    children, and the process has no unwaited for children that were
    transformed into zombie processes, it will block until all of its
    children terminate, and wait(), wait3(), waitid() and waitpid() will
    fail and set errno to [ECHILD]. Otherwise, terminating child
    processes will be transformed into zombie processes, unless SIGCHLD
    is set to SIG_IGN.

> So just in case anyone else wants to use system() in Postgres or
> indeed any other Unix application that twiddles with the SIGCHLD
> handler this is something to beware of. It's not entirely clear to me
> that the mention of SA_NOCLDWAIT is the only way to get this
> behaviour, at least one stackoverflow answer implied just setting
> SIG_IGN was enough.

Yup:

  • If a process sets the action for the SIGCHLD signal to SIG_IGN, the
    behaviour is unspecified, except as specified below. If the action
    for the SIGCHLD signal is set to SIG_IGN, child processes of the
    calling processes will not be transformed into zombie processes when
    they terminate. If the calling process subsequently waits for its
    children, and the process has no unwaited for children that were
    transformed into zombie processes, it will block until all of its
    children terminate, and wait(), wait3(), waitid() and waitpid() will
    fail and set errno to [ECHILD].

-Chap
On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack <chap@anastigmatix.net> wrote:

> If the calling process subsequently waits for its
> children, and the process has no unwaited for children that were
> transformed into zombie processes, it will block until all of its
> children terminate, and wait(), wait3(), waitid() and waitpid() will
> fail and set errno to [ECHILD].

Sure, but I don't see anything saying that system() should be expected not to handle this situation. At least there's nothing in the system(3) man page that says it should be expected to always return an error if SIGCHLD is ignored.

And actually, looking at that documentation, it's not clear to me why that's the case. I would have expected system() to call waitpid() immediately after the fork, and unless the subprocess was very quick that should be sufficient to get the exit code. One might even imagine system() intentionally having some kind of interlock to ensure that the parent has called waitpid() before the child execs the shell.

-- greg
On 02/15/16 20:03, Greg Stark wrote:

> On Tue, Feb 16, 2016 at 12:51 AM, Chapman Flack <chap@anastigmatix.net> wrote:
>> If the calling process subsequently waits for its
>> children, and the process has no unwaited for children that were
>> transformed into zombie processes, it will block until all of its
>> children terminate, and wait(), wait3(), waitid() and waitpid() will
>> fail and set errno to [ECHILD].
>
> And actually looking at that documentation it's not clear to me why
> it's the case. I would have expected system to immediately call
> waitpid after the fork and unless the subprocess was very quick that
> should be sufficient to get the exit code. One might even imagine
> having system intentionally have some kind of interlock to ensure that
> the parent has called waitpid before the child execs the shell.

Doesn't the wording suggest that even if the parent is fast enough to call waitpid() before the child exits, waitpid() will only block until the child terminates and then report ECHILD anyway?

I wouldn't be surprised if they specified it that way to avoid creating a race condition where you would *sometimes* think it was doing what you wanted.

Agreed that system(3) doesn't clearly reflect that in the "status ... is no longer available" description it gives for ECHILD.

-Chap
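The reading above can be checked with a small experiment. The following is a minimal sketch, not anything from the PostgreSQL tree: the helper name waitpid_errno_with_sigchld_ignored and its delay parameter are made up for illustration. It ignores SIGCHLD, forks a child that optionally sleeps before exiting, and reports what waitpid() does. On a system that follows the quoted wording, the expectation is ECHILD both when the child exits at once and when the delay gives the parent time to block in waitpid() first.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper for experimentation.
 *
 * Sets SIGCHLD to SIG_IGN, forks a child that sleeps child_delay_ms
 * milliseconds and exits with status 0, then waits for it.
 * Returns 0 if waitpid() collected the child's status, the errno set
 * by waitpid() if it failed, or -1 if sigaction() or fork() failed.
 * The previous SIGCHLD disposition is restored before returning.
 */
int waitpid_errno_with_sigchld_ignored(long child_delay_ms)
{
    struct sigaction ign, old;
    struct timespec delay;
    int status = 0;
    int result = 0;
    pid_t pid;

    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    ign.sa_flags = 0;
    if (sigaction(SIGCHLD, &ign, &old) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        result = -1;
    } else if (pid == 0) {
        /* Child: optionally linger so the parent is likely already
         * blocked in waitpid() by the time we terminate. */
        delay.tv_sec = child_delay_ms / 1000;
        delay.tv_nsec = (child_delay_ms % 1000) * 1000000L;
        nanosleep(&delay, NULL);
        _exit(0);
    } else if (waitpid(pid, &status, 0) < 0) {
        /* Parent: waitpid() failed; remember why. */
        result = errno;
    }

    sigaction(SIGCHLD, &old, NULL);
    return result;
}
```

Calling it once with a delay of 0 and once with a delay of, say, 200 ms compares the "child was very quick" case with the "parent got to waitpid() first" case. The delay only makes the second ordering likely; it does not guarantee it.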