On Jul 4, 2011, at 17:53, Heikki Linnakangas wrote:
>> Under Linux, select() may report a socket file descriptor as "ready for
>> reading", while nevertheless a subsequent read blocks. This could for
>> example happen when data has arrived but upon examination has wrong
>> checksum and is discarded. There may be other circumstances in which a
>> file descriptor is spuriously reported as ready. Thus it may be safer
>> to use O_NONBLOCK on sockets that should not block.
>
> So in theory, on Linux WaitLatch might sometimes incorrectly return
> WL_POSTMASTER_DEATH. None of the callers check for the
> WL_POSTMASTER_DEATH return code; they call PostmasterIsAlive() before
> assuming the postmaster has died, so that won't affect correctness at
> the moment. I doubt that scenario can even happen in our case, select()
> on a pipe that is never written to. But maybe we should add an assertion
> to WaitLatch to assert that if select() reports that the postmaster pipe
> has been closed, PostmasterIsAlive() also returns false.
The correct solution would be to read() from the pipe after select()
returns, and only return WL_POSTMASTER_DEATH if the read doesn't return
EAGAIN. To prevent that read() from blocking if the read event was indeed
spurious, O_NONBLOCK must be set on the pipe, but the patch does that already.
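
To make that concrete, here is a rough, standalone sketch of the
select()-then-read() check. It is not code from the patch: the helper
names set_nonblocking() and pipe_writer_is_gone() are made up for
illustration, and it assumes a pipe that is never written to, so a
read() that doesn't fail with EAGAIN means the write end is gone.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode; 0 on success. */
static int
set_nonblocking(int fd)
{
	int			flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: poll the read end of a pipe and report whether the
 * write end really has been closed.  The fd must already be non-blocking,
 * otherwise the read() below could block on a spurious wakeup.
 */
static bool
pipe_writer_is_gone(int read_fd)
{
	fd_set		rfds;
	struct timeval tv = {0, 0};	/* zero timeout: just poll */
	char		c;
	ssize_t		rc;

	FD_ZERO(&rfds);
	FD_SET(read_fd, &rfds);

	/* Not reported readable (or select() failed): no sign of death. */
	if (select(read_fd + 1, &rfds, NULL, NULL, &tv) <= 0)
		return false;
	if (!FD_ISSET(read_fd, &rfds))
		return false;

	/* select() claims readiness; confirm it with an actual read(). */
	do
	{
		rc = read(read_fd, &c, 1);
	} while (rc < 0 && errno == EINTR);

	/* EAGAIN means the readiness report was spurious. */
	if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return false;

	/* Anything else (normally rc == 0, i.e. EOF) counts as death. */
	return true;
}
```

With the write end still open this returns false; once the last write
end is closed, read() sees EOF and it returns true.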
Btw, with the death-watch / life-sign / whatever infrastructure in place,
shouldn't PostmasterIsAlive() be using that instead of getppid() / kill(0)?
best regards,
Florian Pflug