Discussion: PATCH: Keep one postmaster monitoring pipe per process

PATCH: Keep one postmaster monitoring pipe per process

From:
Marco Pfatschbacher
Date:
Hi,

the current implementation of PostmasterIsAlive() uses a pipe to
monitor the existence of the postmaster process.
One end of the pipe is held open in the postmaster, while the other end is
inherited by all the auxiliary and background processes when they fork.
This leads to multiple processes calling select(2), poll(2) and read(2)
on the same end of the pipe.
While this is technically perfectly ok, it has the unfortunate side
effect that it triggers an inefficient behaviour[0] in the select/poll
implementation on some operating systems[1]:
The kernel can only keep track of one pid per select address and
thus has no other choice than to wakeup(9) every process that
is waiting on select/poll.
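
For illustration, a minimal standalone sketch (not PostgreSQL code) of
the pattern that runs into this:

#include <stdlib.h>
#include <unistd.h>
#include <sys/select.h>

int
main(void)
{
    int     fds[2];

    if (pipe(fds) < 0)
        exit(1);
    for (int i = 0; i < 100; i++)
    {
        if (fork() == 0)
        {
            fd_set  rfds;

            /* child: watch the read end inherited from the parent */
            close(fds[1]);
            FD_ZERO(&rfds);
            FD_SET(fds[0], &rfds);

            /*
             * All children sleep in select() on the same fd.  On the
             * affected kernels only one pid fits into the per-fd wakeup
             * slot, so the kernel falls back to waking every process
             * sleeping in select/poll, even unrelated ones.
             */
            select(fds[0] + 1, &rfds, NULL, NULL, NULL);
            _exit(0);
        }
    }
    close(fds[0]);
    pause();                    /* parent keeps the write end open */
    return 0;
}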

In our case the system had to wake up ~3000 idle ssh processes
every time PostgreSQL called PostmasterIsAlive.
The WalReceiver triggered this at a rate of ~400 calls per second,
with the result that the system performed very badly,
being mostly busy scheduling idle processes.


Attached patch avoids the select contention by using a
separate pipe for each auxiliary and background process.
Since the postmaster has three different ways to create
new processes, the patch got a bit more complicated than
I anticipated :)

The auxiliary processes, pgstat, pgarch and the autovacuum launcher
each get a preallocated pipe. The pipes are held in:
  postmaster_alive_fds_own[NUM_AUXPROCTYPES];
  postmaster_alive_fds_watch[NUM_AUXPROCTYPES];

Just before we fork a new process we set postmaster_alive_fd
for each process type:
  postmaster_alive_fd = postmaster_alive_fds_watch[type];

Since there can be multiple backend processes, BackendStartup()
allocates a pipe on demand and keeps the reference in the Backend
structure. The pipe is closed when the backend terminates.
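
A rough sketch of the allocation, with a hypothetical helper name (the
real code is in the attached patch):

static void
InitAuxiliaryDeathPipe(int type)
{
    int         fds[2];

    if (pipe(fds) < 0)
        elog(FATAL, "could not create postmaster death pipe: %m");

    /* The postmaster keeps the write end; the child watches the read end. */
    postmaster_alive_fds_own[type] = fds[1];
    postmaster_alive_fds_watch[type] = fds[0];
}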

The patch was developed and tested under OpenBSD using the REL9_4_STABLE
branch.  I've merged it to current, compile tested and ran make check
on Ubuntu 14.04.

   Marco

[0]
 http://man.openbsd.org/OpenBSD-5.9/man2/select.2?manpath=OpenBSD-5.9

 BUGS
 [...]
 "Internally to the kernel, select() and pselect() work poorly if multiple
 processes wait on the same file descriptor. Given that, it is rather
 surprising to see that many daemons are written that way."

[1]
 At least OpenBSD and NetBSD are affected, FreeBSD rewrote
 their select implementation in 8.0.

Attachments

Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Andres Freund
Date:
Hi,

On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited by all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.

Yikes, that's a pretty absurd implementation.

Does openbsd's kqueue implementation have the same issue? There's
http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com

> Attached patch avoids the select contention by using a
> separate pipe for each auxiliary and background process.

I'm quite unenthusiastic about forcing that many additional file
descriptors onto the postmaster...

I'm not quite sure I understand why this is an issue here - there shouldn't
ever be events on this fd, so why is the kernel waking up all processes?
It'd kinda make sense that it'd wake up all processes if there's one
waiting, but ... ?


>  BUGS
>  [...]
>  "Internally to the kernel, select() and pselect() work poorly if multiple
>  processes wait on the same file descriptor. Given that, it is rather
>  surprising to see that many daemons are written that way."

Gee. Maybe it's more surprising that that issue isn't being addressed?


Regards,

Andres



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Fri, Sep 16, 2016 at 1:57 AM, Marco Pfatschbacher
<Marco_Pfatschbacher@genua.de> wrote:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited by all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.
>
> [...]
>
>  BUGS
>  [...]
>  "Internally to the kernel, select() and pselect() work poorly if multiple
>  processes wait on the same file descriptor. Given that, it is rather
>  surprising to see that many daemons are written that way."
>
> [1]
>  At least OpenBSD and NetBSD are affected, FreeBSD rewrote
>  their select implementation in 8.0.

Very interesting.  Perhaps that is why NetBSD shows a speedup with the
kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
kqueue patch to perform better on large FreeBSD systems, it would also
be a solution to this problem.

[1]
https://www.postgresql.org/message-id/flat/CAEepm%3D2i78TOJeV4O0-0meiihiRfVQ29ur7%3DMBHxsUKaPSWeAg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Tom Lane
Date:
Marco Pfatschbacher <Marco_Pfatschbacher@genua.de> writes:
> the current implementation of PostmasterIsAlive() uses a pipe to
> monitor the existence of the postmaster process.
> One end of the pipe is held open in the postmaster, while the other end is
> inherited by all the auxiliary and background processes when they fork.
> This leads to multiple processes calling select(2), poll(2) and read(2)
> on the same end of the pipe.
> While this is technically perfectly ok, it has the unfortunate side
> effect that it triggers an inefficient behaviour[0] in the select/poll
> implementation on some operating systems[1]:
> The kernel can only keep track of one pid per select address and
> thus has no other choice than to wakeup(9) every process that
> is waiting on select/poll.

That seems like a rather bad kernel bug.

> In our case the system had to wake up ~3000 idle ssh processes
> every time PostgreSQL called PostmasterIsAlive.

Uh, these are processes not even connected to the pipe in question?
That's *really* a kernel bug.

> Attached patch avoids the select contention by using a
> separate pipe for each auxiliary and background process.

I think this would likely move the performance problems somewhere else.
In particular, it would mean that every postmaster child would inherit
pipes leading to all the older children.  We could close 'em again
I guess, but we have previously found that having to do things that
way is a rather serious performance drag --- see the problems we had
with POSIX named semaphores, here for instance:
https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
I really don't want the postmaster to be holding any per-child kernel
resources.

It'd certainly be nice if we could find another solution besides
the pipe-based one, but I don't think "more pipes" is the answer.
There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG)
when available; do the BSDen have anything like that?
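
(For reference, the Linux approach looks roughly like this, with a
getppid() check to close the race where the postmaster exited before
the call:)

#include <sys/prctl.h>

/* in the child, right after fork() */
if (prctl(PR_SET_PDEATHSIG, SIGTERM) < 0)
    elog(FATAL, "could not request parent death signal: %m");
if (getppid() != PostmasterPid)
    proc_exit(1);               /* postmaster is already gone */
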
        regards, tom lane



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Very interesting.  Perhaps that is why NetBSD shows a speedup with the
> kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
> kqueue patch to perform better on large FreeBSD systems, it would also
> be a solution to this problem.

I just noticed that kqueue appears to offer a solution to this problem,
ie one of the things you can wait for is exit of another process (named
by PID, looks like).  If that's portable to all kqueue platforms, then
integrating a substitute for the postmaster death pipe might push that
patch over the hump to being a net win.
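
(Schematically, watching a pid with kqueue would look something like
this; a sketch, not taken from the patch:)

#include <sys/types.h>
#include <sys/event.h>
#include <sys/time.h>

int             kq = kqueue();
struct kevent   ev;

EV_SET(&ev, (uintptr_t) PostmasterPid, EVFILT_PROC, EV_ADD, NOTE_EXIT, 0, NULL);
if (kevent(kq, &ev, 1, NULL, 0, NULL) < 0)
    elog(FATAL, "could not register postmaster exit event: %m");
/* a later kevent() wait reports NOTE_EXIT when that pid exits */
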
        regards, tom lane



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Marco Pfatschbacher
Date:
On Thu, Sep 15, 2016 at 04:24:07PM -0400, Tom Lane wrote:
> Marco Pfatschbacher <Marco_Pfatschbacher@genua.de> writes:
> > the current implementation of PostmasterIsAlive() uses a pipe to
> > monitor the existence of the postmaster process.
> > One end of the pipe is held open in the postmaster, while the other end is
> > inherited by all the auxiliary and background processes when they fork.
> > This leads to multiple processes calling select(2), poll(2) and read(2)
> > on the same end of the pipe.
> > While this is technically perfectly ok, it has the unfortunate side
> > effect that it triggers an inefficient behaviour[0] in the select/poll
> > implementation on some operating systems[1]:
> > The kernel can only keep track of one pid per select address and
> > thus has no other choice than to wakeup(9) every process that
> > is waiting on select/poll.
> 
> That seems like a rather bad kernel bug.

It's a limitation that has been there since at least 4.4BSD.
But yeah, I wish things were better.

> > In our case the system had to wake up ~3000 idle ssh processes
> > every time PostgreSQL called PostmasterIsAlive.
> 
> Uh, these are processes not even connected to the pipe in question?
> That's *really* a kernel bug.

The kernel does not know whether they were selecting on that pipe,
because the only slot to keep that mapping has already been taken.
It's not that bad of a performance hit if the system doesn't
have many processes.

> > Attached patch avoids the select contention by using a
> > separate pipe for each auxiliary and background process.
> 
> I think this would likely move the performance problems somewhere else.
> In particular, it would mean that every postmaster child would inherit
> pipes leading to all the older children. 

I kept them at a minimum. (See ClosePostmasterPorts)

>  We could close 'em again
> I guess, but we have previously found that having to do things that
> way is a rather serious performance drag --- see the problems we had

I think closing a few files doesn't hurt too much, but I see your point.

> with POSIX named semaphores, here for instance:
> https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com
> I really don't want the postmaster to be holding any per-child kernel
> resources.
> 
> It'd certainly be nice if we could find another solution besides
> the pipe-based one, but I don't think "more pipes" is the answer.
> There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG)
> when available; do the BSDen have anything like that?

Not that I know of.
But since the WalReceiver process seemed to be the one calling
PostmasterIsAlive way more often than the rest, maybe we could limit
the performance hit by not calling it on every received wal chunk?

Cheers,  Marco



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Marco Pfatschbacher
Date:
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > Very interesting.  Perhaps that is why NetBSD shows a speedup with the
> > kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
> > kqueue patch to perform better on large FreeBSD systems, it would also
> > be a solution to this problem.
> 
> I just noticed that kqueue appears to offer a solution to this problem,
> ie one of the things you can wait for is exit of another process (named
> by PID, looks like).  If that's portable to all kqueue platforms, then
> integrating a substitute for the postmaster death pipe might push that
> patch over the hump to being a net win.
That sounds plausible.
I could give this a try after I get back from my vacation :)
  Marco



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Marco Pfatschbacher
Date:
On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote:
> > the current implementation of PostmasterIsAlive() uses a pipe to
> > monitor the existence of the postmaster process.
> > One end of the pipe is held open in the postmaster, while the other end is
> > inherited by all the auxiliary and background processes when they fork.
> > This leads to multiple processes calling select(2), poll(2) and read(2)
> > on the same end of the pipe.
> > While this is technically perfectly ok, it has the unfortunate side
> > effect that it triggers an inefficient behaviour[0] in the select/poll
> > implementation on some operating systems[1]:
> > The kernel can only keep track of one pid per select address and
> > thus has no other choice than to wakeup(9) every process that
> > is waiting on select/poll.
> 
> Yikes, that's a pretty absurd implementation.

Not when you take into account that it was written over 20 years ago ;)

> Does openbsd's kqueue implementation have the same issue? There's
> http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com

Looks ok, but your mileage may vary. I've seen strange subtle bugs
using kqueue...

struct kevent {
    __uintptr_t     ident;          /* identifier for this event */
    short           filter;         /* filter for event */
    u_short         flags;
    u_int           fflags;
    quad_t          data;
    void           *udata;          /* opaque user data identifier */
};

> > Attached patch avoids the select contention by using a
> > separate pipe for each auxiliary and background process.
> 
> I'm quite unenthusiastic about forcing that many additional file
> descriptors onto the postmaster...

In our use case it's about 30.

> I'm not quite sure I understand why this is an issue here - there shouldn't
> ever be events on this fd, so why is the kernel waking up all processes?
> It'd kinda make sense that it'd wake up all processes if there's one
> waiting, but ... ?

Every read is an event, and that's what PostmasterIsAlive does. 
  Marco



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Andres Freund
Date:
Hi,

On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote:
> On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
> > Yikes, that's a pretty absurd implementation.
>
> Not when you take into account that it was written over 20 years ago ;)

Well, that doesn't mean it can't be fixed ;)

> > I'm not quite sure I understand why this is an issue here - there shouldn't
> > ever be events on this fd, so why is the kernel waking up all processes?
> > It'd kinda make sense that it'd wake up all processes if there's one
> > waiting, but ... ?
>
> Every read is an event, and that's what PostmasterIsAlive does.

But in most places we only do a PostmasterIsAlive if WaitLatch returns
WL_POSTMASTER_DEATH.  The only walreceiver related place that doesn't is
WalRcvWaitForStartPosition(). If that's indeed the cause of your issues 
this quite possibly could be fixed by doing the
    if (!PostmasterIsAlive())
        exit(1);
check not unconditionally, but only after the WaitLatch at the end of
the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
That'll be a minor behaviour change for the WALRCV_RESTARTING case, but that
seems fine, we'll just loop once more outside (after a quick glance at
least).
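
Schematically (against the current WaitLatch() signature, names
simplified):

rc = WaitLatch(MyLatch,
               WL_LATCH_SET | WL_POSTMASTER_DEATH,
               -1L,
               WAIT_EVENT_WAL_RECEIVER_WAIT_START);
if (rc & WL_POSTMASTER_DEATH)
    exit(1);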

Andres



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Robert Haas
Date:
On Fri, Sep 16, 2016 at 2:23 PM, Andres Freund <andres@anarazel.de> wrote:
>> Every read is an event, and that's what PostmasterIsAlive does.
>
> But in most places we only do a PostmasterIsAlive if WaitLatch returns
> WL_POSTMASTER_DEATH.  The only walreceiver related place that doesn't is
> WalRcvWaitForStartPosition(). If that's indeed the cause of your issues
> this quite possibly could be fixed by doing the
>                 if (!PostmasterIsAlive())
>                         exit(1);
> check not unconditionally, but only after the WaitLatch at the end of
> the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
> That'll be a minor behaviour change for the WALRCV_RESTARTING case, but that
> seems fine, we'll just loop once more outside (after a quick glance at
> least).

At least some of the latch implementations already check
PostmasterIsAlive() internally to avoid returning spurious events; and
secure_read() at least assumes that the WL_POSTMASTER_DEATH return is
reliable and doesn't need a double-check.

So we can probably just remove the check altogether and instead bail
out if it returns WL_POSTMASTER_DEATH.  That probably saves a system
call per loop iteration even on platforms where the kernel doesn't
exhibit any surprising behavior.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Sat, Sep 17, 2016 at 6:23 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote:
>> On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote:
>> > Yikes, that's a pretty absurd implementation.
>>
>> Not when you take into account that it was written over 20 years ago ;)
>
> Well, that doesn't mean it can't be fixed ;)
>
>> > I'm not quite sure I understand why this is an issue here - there shouldn't
>> > ever be events on this fd, so why is the kernel waking up all processes?
>> > It'd kinda make sense that it'd wake up all processes if there's one
>> > waiting, but ... ?
>>
>> Every read is an event, and that's what PostmasterIsAlive does.
>
> But in most places we only do a PostmasterIsAlive if WaitLatch returns
> WL_POSTMASTER_DEATH.  The only walreceiver related place that doesn't is
> WalRcvWaitForStartPosition(). If that's indeed the cause of your issues
> this quite possibly could be fixed by doing the
>                 if (!PostmasterIsAlive())
>                         exit(1);
> check not unconditionally, but only after the WaitLatch at the end of
> the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()?
> That'll be a minor behaviour change for the WALRCV_RESTARTING case, but that
> seems fine, we'll just loop once more outside (after a quick glance at
> least).

Yeah, I wondered why that was different than the pattern established
elsewhere when I was hacking on replication code.  There are actually
several places where we call PostmasterIsAlive() unconditionally in a
loop that waits for WL_POSTMASTER_DEATH but ignores the return code:
in pgarch.c, syncrep.c, walsender.c and walreceiver.c.  Should we just
change them all to check the return code and exit/break/ereport/etc as
appropriate?  That would match the code from autovacuum.c,
checkpointer.c, pgstat.c, be-secure.c and bgworker.c.  Something like
the attached.

The code in basebackup.c also waits for WL_POSTMASTER_DEATH but
doesn't check for it in the return value *or* call
PostmasterIsAlive().  I'm not sure what to make of that.  I didn't
test it but it looks like maybe it would continue running after
postmaster death but not honour the throttling rate limit because
WaitLatch would keep returning immediately.

--
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: PATCH: Keep one postmaster monitoring pipe per process

From:
Andres Freund
Date:
On 2016-09-20 11:07:03 +1200, Thomas Munro wrote:
> Yeah, I wondered why that was different than the pattern established
> elsewhere when I was hacking on replication code.  There are actually
> several places where we call PostmasterIsAlive() unconditionally in a
> loop that waits for WL_POSTMASTER_DEATH but ignores the return code:

Note that just changing this implies a behavioural change in at least
some of those: If the loop is busy with work, the WaitLatch might never
be reached.  I think that might be ok in most of those, but it does
require examination.

- Andres



Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Marco Pfatschbacher
Date:
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > Very interesting.  Perhaps that is why NetBSD shows a speedup with the
> > kqueue patch[1] but FreeBSD doesn't.  I guess that if I could get the
> > kqueue patch to perform better on large FreeBSD systems, it would also
> > be a solution to this problem.
> 
> I just noticed that kqueue appears to offer a solution to this problem,
> ie one of the things you can wait for is exit of another process (named
> by PID, looks like).  If that's portable to all kqueue platforms, then
> integrating a substitute for the postmaster death pipe might push that
> patch over the hump to being a net win.
> 
>             regards, tom lane

Hi,

sorry for the long silence.
I forgot about this after being on vacation.

kqueue does indeed support EVFILT_PROC like you said.
But using this effectively would need a separate monitoring process,
because we cannot use poll(2) and kevent(2) together within the same loop.

But how about this:
We just use getppid to see whether the process id of our parent
has changed to 1 (init).
Just like calling read on the pipe, that's just one system call.

I did not bother to weed out the postmaster_alive_fds yet, but
this change works fine for me.

Regards,
   Marco

diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index 85db6b21f8..a0596c61fd 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -272,21 +272,16 @@ bool
 PostmasterIsAlive(void)
 {
 #ifndef WIN32
-    char        c;
-    ssize_t        rc;
+    pid_t        ppid;
 
-    rc = read(postmaster_alive_fds[POSTMASTER_FD_WATCH], &c, 1);
-    if (rc < 0)
+    ppid = getppid();
+    if (ppid == 1)
     {
-        if (errno == EAGAIN || errno == EWOULDBLOCK)
-            return true;
-        else
-            elog(FATAL, "read on postmaster death monitoring pipe failed: %m");
+        elog(FATAL, "postmaster died: our ppid changed to init");
+        return false;
     }
-    else if (rc > 0)
-        elog(FATAL, "unexpected data in postmaster death monitoring pipe");
 
-    return false;
+    return true;
 #else                            /* WIN32 */
     return (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT);
 #endif                            /* WIN32 */


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Tue, Dec 5, 2017 at 5:55 AM, Marco Pfatschbacher
<Marco_Pfatschbacher@genua.de> wrote:
> On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote:
>> I just noticed that kqueue appears to offer a solution to this problem,
>> ie one of the things you can wait for is exit of another process (named
>> by PID, looks like).  If that's portable to all kqueue platforms, then
>> integrating a substitute for the postmaster death pipe might push that
>> patch over the hump to being a net win.
>
> kqueue does indeed support EVFILT_PROC like you said.
> But using this effectively would need a separate monitoring process,
> because we cannot use poll(2) and kevent(2) together within the same loop.

FWIW the kqueue patch I posted uses EVFILT_PROC as Tom suggested and
it works fine according to my testing (FreeBSD and macOS).  With that
patch, there is no poll, just kqueue everywhere.

> But how about this:
> We just use getppid to see whether the process id of our parent
> has changed to 1 (init).
> Just like calling read on the pipe, that's just one system call.

I'm still not sure why we have to make any system calls at all.  Do we
not trust WaitLatch() to tell us about WL_POSTMASTER_DEATH reliably,
or is it that we don't trust it to tell us about that event when there
are other events happening due to priorities, or...?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Tue, Sep 20, 2016 at 11:26 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-09-20 11:07:03 +1200, Thomas Munro wrote:
>> Yeah, I wondered why that was different than the pattern established
>> elsewhere when I was hacking on replication code.  There are actually
>> several places where we call PostmasterIsAlive() unconditionally in a
>> loop that waits for WL_POSTMASTER_DEATH but ignores the return code:
>
> Note that just changing this implies a behavioural change in at least
> some of those: If the loop is busy with work, the WaitLatch might never
> be reached.  I think that might be ok in most of those, but it does
> require examination.

Rebased, but I don't actually like this patch any more.  Over in
another thread[1] I proposed that we should just make exit(1) the
default behaviour built into latch.c for those cases that don't want
to do something special (eg SyncRepWaitForLSN()), so we don't finish
up duplicating the ugly exit(1) code everywhere (or worse, forgetting
to).  Tom Lane seemed to think that was an idea worth pursuing.

I think what we need for PG12 is a patch that does that, and also
introduces a reused WaitEventSet object in several of these places.
Then eg SyncRepWaitForLSN() won't be interacting with contended kernel
objects every time (assuming an implementation like epoll or
eventually kqueue, completion ports etc is available).
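
A sketch of what that reuse could look like with the existing
WaitEventSet API (names illustrative):

/* set up once and keep for the life of the process */
WaitEventSet *wes = CreateWaitEventSet(TopMemoryContext, 2);

AddWaitEventToSet(wes, WL_LATCH_SET, PGINVALID_SOCKET, MyLatch, NULL);
AddWaitEventToSet(wes, WL_POSTMASTER_DEATH, PGINVALID_SOCKET, NULL, NULL);

/* ... then in each wait ... */
WaitEvent   event;

(void) WaitEventSetWait(wes, -1L, &event, 1, WAIT_EVENT_SYNC_REP);
if (event.events & WL_POSTMASTER_DEATH)
    exit(1);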

Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
(ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
approach where available as suggested by Andres[2] or fall back to
polling a reusable WaitEventSet (timeout = 0), then there'd be no more
calls to PostmasterIsAlive() outside latch.c.

[1] https://www.postgresql.org/message-id/CAEepm%3D2LqHzizbe7muD7-2yHUbTOoF7Q%2BqkSD5Q41kuhttRTwA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Andres Freund
Date:
Hi,

On 2018-04-11 11:57:20 +1200, Thomas Munro wrote:
> Rebased, but I don't actually like this patch any more.  Over in
> another thread[1] I proposed that we should just make exit(1) the
> default behaviour built into latch.c for those cases that don't want
> to do something special (eg SyncRepWaitForLSN()), so we don't finish
> up duplicating the ugly exit(1) code everywhere (or worse, forgetting
> to).  Tom Lane seemed to think that was an idea worth pursuing.
> 
> I think what we need for PG12 is a patch that does that, and also
> introduces a reused WaitEventSet object in several of these places.
> Then eg SyncRepWaitForLSN() won't be interacting with contended kernel
> objects every time (assuming an implementation like epoll or
> eventually kqueue, completion ports etc is available).
> 
> Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
> (ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
> approach where available as suggested by Andres[2] or fall back to
> polling a reusable WaitEventSet (timeout = 0), then there'd be no more
> calls to PostmasterIsAlive() outside latch.c.

I'm still unconvinced by this. There's good reasons why code might be
busy-looping without checking the latch, and we shouldn't force code to
add more latch checks if unnecessary. Resetting the latch frequently can
actually increase the amount of context switches considerably.

- Andres


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Wed, Apr 11, 2018 at 12:03 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-04-11 11:57:20 +1200, Thomas Munro wrote:
>> Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts()
>> (ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based
>> approach where available as suggested by Andres[2] or fall back to
>> polling a reusable WaitEventSet (timeout = 0), then there'd be no more
>> calls to PostmasterIsAlive() outside latch.c.
>
> I'm still unconvinced by this. There's good reasons why code might be
> busy-looping without checking the latch, and we shouldn't force code to
> add more latch checks if unnecessary. Resetting the latch frequently can
> actually increase the amount of context switches considerably.

What latch?  There wouldn't be a latch in a WaitEventSet that is used
only to detect postmaster death.

I arrived at this idea via the realisation that the closest thing to
prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
please-tell-my-kqueue-if-this-process-dies.  It so happens that my
kqueue patch already uses that instead of the pipe to detect
postmaster death.  The only problem is that you have to ask it, by
calling it kevent().  In a busy loop like those two, where there is no
other kind of waiting going on, you could do that by calling kevent()
with timeout = 0 to check the queue.

You could probably figure out a way to hide the
prctl(PR_SET_PDEATHSIG)-based approach inside the WaitEventSet code,
with a fast path that doesn't make any system calls if the only event
registered is postmaster death (you can just check the global variable
set by your signal handler).  But I guess you wouldn't like the extra
function call so I guess you'd prefer to check the global variable
directly in the busy loop, in builds that have
prctl(PR_SET_PDEATHSIG).

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Andres Freund
Date:
Hi,

On 2018-04-11 12:17:14 +1200, Thomas Munro wrote:
> I arrived at this idea via the realisation that the closest thing to
> prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
> please-tell-my-kqueue-if-this-process-dies.  It so happens that my
> kqueue patch already uses that instead of the pipe to detect
> postmaster death.  The only problem is that you have to ask it, by
> calling kevent().  In a busy loop like those two, where there is no
> other kind of waiting going on, you could do that by calling kevent()
> with timeout = 0 to check the queue.

Which is not cheap.


> You could probably figure out a way to hide the
> prctl(PR_SET_PDEATHSIG)-based approach inside the WaitEventSet code,
> with a fast path that doesn't make any system calls if the only event
> registered is postmaster death (you can just check the global variable
> set by your signal handler).  But I guess you wouldn't like the extra
> function call so I guess you'd prefer to check the global variable
> directly in the busy loop, in builds that have
> prctl(PR_SET_PDEATHSIG).

Yea, I'd really want this to be a inline function of the style

static inline bool
PostmasterIsAlive(void)
{
    if (!postmaster_possibly_dead)
        return true;
    return PostmasterIsAliveInternal();
}
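
(where postmaster_possibly_dead would be set by a handler for a signal
armed via prctl(PR_SET_PDEATHSIG) or similar -- a sketch:)

static volatile sig_atomic_t postmaster_possibly_dead = false;

static void
postmaster_death_handler(SIGNAL_ARGS)
{
    postmaster_possibly_dead = true;
}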

I do not like the WES alternative because a special cased "just
postmaster death" WES seems to have absolutely no advantage over a
faster PostmasterIsAlive().  And using WES seems to make a cheap check
like the above significantly more complicated.

Greetings,

Andres Freund


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Wed, Apr 11, 2018 at 12:26 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-04-11 12:17:14 +1200, Thomas Munro wrote:
>> I arrived at this idea via the realisation that the closest thing to
>> prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
>> please-tell-my-kqueue-if-this-process-dies.  It so happens that my
>> kqueue patch already uses that instead of the pipe to detect
>> postmaster death.  The only problem is that you have to ask it, by
>> calling kevent().  In a busy loop like those two, where there is no
>> other kind of waiting going on, you could do that by calling kevent()
>> with timeout = 0 to check the queue.
>
> Which is not cheap.

Certainly, but I am reacting to reports via you of performance
problems coming from read(heavily_contended_fd) by saying: although we
can't avoid a syscall like Linux can in this case, we *can* avoid the
reportedly contended pipe mutex completely by reading only our own
process's kqueue.  Combined with Heikki's proposal not to check every
single time through busy loops to reduce the syscall frequency, that
might be the optimal solution until some hypothetical
prctl(PR_SET_PDEATHSIG)-feature-match patch arrives.  Just a thought.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Wed, Apr 11, 2018 at 12:47 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, Apr 11, 2018 at 12:26 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2018-04-11 12:17:14 +1200, Thomas Munro wrote:
>>> I arrived at this idea via the realisation that the closest thing to
>>> prctl(PR_SET_PDEATHSIG) on BSD-family systems today is
>>> please-tell-my-kqueue-if-this-process-dies.  It so happens that my
>>> kqueue patch already uses that instead of the pipe to detect
>>> postmaster death.  The only problem is that you have to ask it, by
>>> calling kevent().  In a busy loop like those two, where there is no
>>> other kind of waiting going on, you could do that by calling kevent()
>>> with timeout = 0 to check the queue.
>>
>> Which is not cheap.

After bouncing that idea around with a fellow pgsql-hacker off-list, I
take that idea back.  It's a lot simpler and as cheap if not cheaper
to check getppid() == 1 or similar every Nth time through the busy
loop.
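
Something like this (the interval is arbitrary):

static uint32 check_counter = 0;

/* ... in the busy loop ... */
if ((++check_counter % 1024) == 0 && getppid() == 1)
    proc_exit(1);               /* reparented to init: postmaster died */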

I heard another interesting idea -- you can use F_SETOWN + O_ASYNC to
ask for SIGIO to be delivered to you when the pipe is closed.
Unfortunately that'd require a different pipe for each backend so
wouldn't work for us.
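
(Roughly, with death_pipe_fd standing in for the watched descriptor:)

#include <fcntl.h>

/* deliver SIGIO to this process when the descriptor changes state */
fcntl(death_pipe_fd, F_SETOWN, getpid());
fcntl(death_pipe_fd, F_SETFL, fcntl(death_pipe_fd, F_GETFL) | O_ASYNC);

/*
 * SIGIO now fires when the pipe becomes readable, e.g. at EOF when the
 * postmaster closes its end -- but F_SETOWN takes a single pid, which
 * is why each backend would need its own pipe.
 */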

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
Hi,

I'd like to disentangle two related topics.  For "I want
PostmasterIsAlive() to go faster using signals on platforms that can
support that", please see over here:


https://www.postgresql.org/message-id/flat/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi#7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi

For "I want to fix all the code that ignores the WL_POSTMASTER_DEATH
event and then calls PostmasterIsAlive() every time through its loop,
or fails to detect postmaster death at all", this is the thread
(unless someone sees a reason to reentangle them).

Here's a draft patch that does that.  One contentious question is:
should you have to opt *in* to auto-exit-on-postmaster death?  Andres
opined that you should.  I actually think it's not so bad if you don't
have to do that, and instead have to opt out.  I think of it as a kind
of 'process cancellation point' or a quiet PANIC that you can opt out
of.  It's nice to remove the old boilerplate code without having to
add a new boilerplate event that you have to remember every time.  Any
other opinions?

I'm not sure if the exit(1) vs proc_exit(1) distinction is important.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Wed, Apr 18, 2018 at 6:55 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Here's a draft patch that does that.

Here's a better one (the previous version could read past the end of
the occurred_events array).

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Heikki Linnakangas
Date:
On 18/04/18 09:55, Thomas Munro wrote:
> Here's a draft patch that does that.  One contentious question is:
> should you have to opt *in* to auto-exit-on-postmaster death?  Andres
> opined that you should.  I actually think it's not so bad if you don't
> have to do that, and instead have to opt out.  I think of it as a kind
> of 'process cancellation point' or a quiet PANIC that you can opt out
> of.  It's nice to remove the old boilerplate code without having to
> add a new boilerplate event that you have to remember every time.  Any
> other opinions?

Hmm. Exiting on postmaster death by default does feel a bit too magical
to me. But as your patch points out, there are currently no places where
you *don't* want to exit on postmaster death; some callers just prefer
to handle it themselves. And I like having a default or something to
make sure that all call sites in the future will also exit quickly.

I'd suggest that we add a new WL_EXIT_ON_POSTMASTER_DEATH flag, making
it opt-in. But add an assertion in WaitLatchOrSocket:

Assert((wakeEvents & (WL_EXIT_POSTMASTER_DEATH | WL_POSTMASTER_DEATH)) != 0);

That ensures that all callers either use WL_EXIT_ON_POSTMASTER_DEATH or
WL_POSTMASTER_DEATH to handle it in the caller. Having to specify
WL_EXIT_ON_POSTMASTER_DEATH reminds you that the call might exit(), so
if that's not what you want, you need to do something else. But the
assertion makes sure that all callers deal with postmaster death in some
way.

- Heikki


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Sat, Jul 14, 2018 at 5:34 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 18/04/18 09:55, Thomas Munro wrote:
>> Here's a draft patch that does that.  One contentious question is:
>> should you have to opt *in* to auto-exit-on-postmaster death?  Andres
>> opined that you should.  I actually think it's not so bad if you don't
>> have to do that, and instead have to opt out.  I think of it as a kind
>> of 'process cancellation point' or a quiet PANIC that you can opt out
>> of.  It's nice to remove the old boilerplate code without having to
>> add a new boilerplate event that you have to remember every time.  Any
>> other opinions?
>
> Hmm. Exiting on postmaster death by default does feel a bit too magical to
> me. But as your patch points out, there are currently no places where you
> *don't* want to exit on postmaster death; some callers just prefer to handle
> it themselves. And I like having a default or something to make sure that
> all call sites in the future will also exit quickly.
>
> I'd suggest that we add a new WL_EXIT_ON_POSTMASTER_DEATH flag, making it
> opt-in.

Ok, it's now 2 against 1.  So here's a version that uses an explicit
WL_EXIT_ON_PM_DEATH value.  I like that name because it's shorter and
more visually distinctive (harder to confuse with
WL_POSTMASTER_DEATH).

> But add an assertion in WaitLatchOrSocket:
>
> Assert ((wakeEvents & (WL_EXIT_POSTMASTER_DEATH | WL_POSTMASTER_DEATH)) !=
> 0);

Ok.  Done for the WaitLatchXXX() interface.  The WaitEventSet
interface (rarely used directly) is less amenable to that change.

Here are some of the places I had to add WL_EXIT_ON_PM_DEATH:
gather_readnext(), shm_mq_send_bytes(), shm_mq_receive_bytes(),
shm_mq_wait_internal(), ProcSleep(), ProcWaitForSignal(), pg_sleep(),
pgfdw_get_result().

Was it intentional that any of those places don't currently exit on
postmaster vaporisation?

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Kyotaro HORIGUCHI
Date:
Hello.

At Wed, 18 Jul 2018 14:02:47 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=3t57KBxb90CTqnDZiSTnTq3jwxZUc0zaFDiaOkGQqNxA@mail.gmail.com>
> On Sat, Jul 14, 2018 at 5:34 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> > On 18/04/18 09:55, Thomas Munro wrote:
> >> Here's a draft patch that does that.  One contentious question is:
> >> should you have to opt *in* to auto-exit-on-postmaster death?  Andres
> >> opined that you should.  I actually think it's not so bad if you don't
> >> have to do that, and instead have to opt out.  I think of it as a kind
> >> of 'process cancellation point' or a quiet PANIC that you can opt out
> >> of.  It's nice to remove the old boilerplate code without having to
> >> add a new boilerplate event that you have to remember every time.  Any
> >> other opinions?
> >
> > Hmm. Exiting on postmaster death by default does feel a bit too magical to
> > me. But as your patch points out, there are currently no places where you
> > *don't* want to exit on postmaster death; some callers just prefer to handle
> > it themselves. And I like having a default or something to make sure that
> > all call sites in the future will also exit quickly.
> >
> > I'd suggest that we add a new WL_EXIT_ON_POSTMASTER_DEATH flag, making it
> > opt-in.
> 
> Ok, it's now 2 against 1.  So here's a version that uses an explicit
> WL_EXIT_ON_PM_DEATH value.  I like that name because it's shorter and
> more visually distinctive (harder to confuse with
> WL_POSTMASTER_DEATH).
> 
> > But add an assertion in WaitLatchOrSocket:
> >
> > Assert ((wakeEvents & (WL_EXIT_POSTMASTER_DEATH | WL_POSTMASTER_DEATH)) !=
> > 0);
> 
> Ok.  Done for the WaitLatchXXX() interface.  The WaitEventSet
> interface (rarely used directly) is less amenable to that change.
> 
> Here are some of the places I had to add WL_EXIT_ON_PM_DEATH:
> gather_readnext(), shm_mq_send_bytes(), shm_mq_receive_bytes(),
> shm_mq_wait_internal(), ProcSleep(), ProcWaitForSignal(), pg_sleep(),
> pgfdw_get_result().
> 
> Was it intentional that any of those places don't currently exit on
> postmaster vaporisation?

I think that backends are supposed to complete a running query even
if the postmaster dies meanwhile, and currently that seems to be
true. pgfdw_get_result seems to be following that policy. Perhaps
it's the same for all of the functions listed above.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Wed, Jul 18, 2018 at 8:30 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Wed, 18 Jul 2018 14:02:47 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=3t57KBxb90CTqnDZiSTnTq3jwxZUc0zaFDiaOkGQqNxA@mail.gmail.com>
>> Here are some of the places I had to add WL_EXIT_ON_PM_DEATH:
>> gather_readnext(), shm_mq_send_bytes(), shm_mq_receive_bytes(),
>> shm_mq_wait_internal(), ProcSleep(), ProcWaitForSignal(), pg_sleep(),
>> pgfdw_get_result().
>>
>> Was it intentional that any of those places don't currently exit on
>> postmaster vaporisation?
>
> I think that backends are supposed to complete a running query even
> if the postmaster dies meanwhile, and currently that seems to be
> true. pgfdw_get_result seems to be following that policy. Perhaps
> it's the same for all of the functions listed above.

Hmm.  Why wait any longer?  The cluster is broken.  Is there some
correctness reason to defer shutdown in any of these places?

While looking for earlier discussion that might explain why some
places ignored it, I came across this email from Andres, saying the
same thing.

https://www.postgresql.org/message-id/20160321093534.inkduxvpirs5n44j%40alap3.anarazel.de

He mentioned that syslogger.c is a special case.  In my patch I added
WL_EXIT_ON_PM_DEATH to SysLoggerMain()'s WaitLatch*() calls, because I
have to or the new assertion fails.  Hmm, yeah, that's not great
because it might discard the last words of other backends.  So here is
a new version that treats syslogger.c specially and may have other
performance benefits.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Kyotaro HORIGUCHI
Date:
At Thu, 19 Jul 2018 16:58:30 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=1R__sW-CTByAi+cFUXX8tJu8MTNdeQGZ4L-x5gZ4mxNA@mail.gmail.com>
> On Wed, Jul 18, 2018 at 8:30 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > At Wed, 18 Jul 2018 14:02:47 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=3t57KBxb90CTqnDZiSTnTq3jwxZUc0zaFDiaOkGQqNxA@mail.gmail.com>
> >> Here are some of the places I had to add WL_EXIT_ON_PM_DEATH:
> >> gather_readnext(), shm_mq_send_bytes(), shm_mq_receive_bytes(),
> >> shm_mq_wait_internal(), ProcSleep(), ProcWaitForSignal(), pg_sleep(),
> >> pgfdw_get_result().
> >>
> >> Was it intentional that any of those places don't currently exit on
> >> postmaster vaporisation?
> >
> > I think that backends are supposed to complete a running query even
> > if the postmaster dies meanwhile, and currently that seems to be
> > true. pgfdw_get_result seems to be following that policy. Perhaps
> > it's the same for all of the functions listed above.
> 
> Hmm.  Why wait any longer?  The cluster is broken.  Is there some
> correctness reason to defer shutdown in any of these places?

Well, please don't get me wrong. I don't object to backends exiting
on postmaster death; rather, I'm for it. Maybe my idea came from
something like this (or perhaps a newer one), in the sense of
explaining why it works the way it does now.

https://www.postgresql.org/message-id/21877.1294946166@sss.pgh.pa.us

> While looking for earlier discussion that might explain why some
> places ignored it, I came across this email from Andres, saying the
> same thing.
> 
> https://www.postgresql.org/message-id/20160321093534.inkduxvpirs5n44j%40alap3.anarazel.de
> 
> He mentioned that syslogger.c is a special case.  In my patch I added
> WL_EXIT_ON_PM_DEATH to SysLoggerMain()'s WaitLatch*() calls, because I
> have to or the new assertion fails.  Hmm, yeah, that's not great
> because it might discard the last words of other backends.  So here is
> a new version that treats syslogger.c specially and may have other
> performance benefits.

Yeah. That seems good. Couldn't we reuse a prepared WaitEventSet in
other places? For example, PgstatCollectorMain has the same
characteristics: WaitLatchOrSocket is used with fixed
parameters, waiting on a socket which receives data frequently.

# Is it intentional that the patch doesn't touch pgstat.c?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Thu, Jul 19, 2018 at 10:30 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Hmm.  Why wait any longer?  The cluster is broken.  Is there some
>> correctness reason to defer shutdown in any of these places?
>
> Well, please don't get me wrong. I don't object to backends exiting
> on postmaster death; rather, I'm for it. Maybe my idea came from
> something like this (or perhaps a newer one), in the sense of
> explaining why it works the way it does now.
>
> https://www.postgresql.org/message-id/21877.1294946166@sss.pgh.pa.us

Thanks, that was interesting.  It's from *before* the pipe solution
was implemented though so I'm not sure it's relevant: the
exit-as-soon-as-possible camp won that battle, we just missed a few
places.

>> He mentioned that syslogger.c is a special case.  In my patch I added
>> WL_EXIT_ON_PM_DEATH to SysLoggerMain()'s WaitLatch*() calls, because I
>> have to or the new assertion fails.  Hmm, yeah, that's not great
>> because it might discard the last words of other backends.  So here is
>> a new version that treats syslogger.c specially and may have other
>> performance benefits.
>
> Yeah. That seems good. Couldn't we reuse a prepared WaitEventSet in
> other places? For example, PgstatCollectorMain has the same
> characteristics: WaitLatchOrSocket is used with fixed
> parameters, waiting on a socket which receives data frequently.

+1, but I'm considering that to be a separate project, or I'll never
get this patch committed.  It may be possible to have a small number
of them reused in many places, and it may be possible for
WaitLatchXXX() to reuse them automatically (so we don't have to change
every call site).

> # Is it intentional that the patch doesn't touch pgstat.c?

Yes.  pgstat.c still uses WL_POSTMASTER_DEATH because it does
something special: it calls pgstat_write_statsfiles() before it exits.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Thu, Jul 19, 2018 at 11:51 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Jul 19, 2018 at 10:30 PM, Kyotaro HORIGUCHI
> > Yeah. That seems good. Couldn't we reuse a prepared WaitEventSet in
> > other places? For example, PgstatCollectorMain has the same
> > characteristics: WaitLatchOrSocket is used with fixed
> > parameters, waiting on a socket which receives data frequently.
>
> +1, but I'm considering that to be a separate project, or I'll never
> get this patch committed.  It may be possible to have a small number
> of them reused in many places, and it may be possible for
> WaitLatchXXX() to reuse them automatically (so we don't have to change
> every call site).
>
> > # Is it intentional that the patch doesn't touch pgstat.c?
>
> Yes.  pgstat.c still uses WL_POSTMASTER_DEATH because it does
> something special: it calls pgstat_write_statsfiles() before it exits.

Rebased.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Kyotaro HORIGUCHI
Date:
At Sun, 2 Sep 2018 07:04:19 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=0=PkSXQ5oNU8BhY1DEzxw_EspsU14D_zAPnj+fBAjxFQ@mail.gmail.com>
> > > # Is it intentional that the patch doesn't touch pgstat.c?
> >
> > Yes.  pgstat.c still uses WL_POSTMASTER_DEATH because it does
> > something special: it calls pgstat_write_statsfiles() before it exits.

Mmm. Exactly..

> Rebased.

Thank you for the new version.

===
In syslogger.c, cur_flags is set but no longer used.

===
In latch.c,

- The parentheses around the symbols don't seem to be needed.
|           (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
|           (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);

- The following assertion seems to contradict the comment.
|    /* Postmaster-managed callers must handle postmaster death somehow. */
|    Assert(!IsUnderPostmaster ||
|           (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
|           (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);

  (Maybe Assert(IsUnderPost && (wakeEv & (WL_EXI | WL_PO)) != 0);)

  And don't we need a description about this restriction in the
  function comment?

- I think it may be better if the following had parentheses
  around the '&' expressions.

|    if (wakeEvents & WL_POSTMASTER_DEATH && IsUnderPostmaster)
|    if (wakeEvents & WL_EXIT_ON_PM_DEATH && IsUnderPostmaster)


===
All the caller sites of WaitLatch, WaitLatchOrSocket and
WaitEventSetWait are covered by this patch and all of them look
fine.


bgworker.c, pgstat.c, be-secure-openssl.c, be-secure.c:
  Not modified on purpose. WL_EXIT_POSTMASTER_DEATH is not appropriate
  to use there.

pgarch.c, syncrep.c, walsender.c:
  Removed redundant check of postmaster death.

syslogger.c:
  Left alone, as it intentionally doesn't exit at postmaster death.
  Uses a reusable wait event set.

walreceiver.c:
  Adds a new bail-out point in WalRcvWaitForStartPosition, and it
  seems reasonable.

shm_mq.c:
  Adds a postmaster-death bail-out in shm_mq_send/receive_bytes and
  shm_mq_wait_internal.  It looks fine.

postgres_fdw/connection.c:
  Adds a postmaster-death bail-out while getting a result. This seems
  to be a bug fix; I'm fine with it being included in this patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Kyotaro HORIGUCHI
Date:
Ugrrr! PLEASE ignore this! It's not wrong at all.

On Thu, Sep 6, 2018 at 18:58, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
- The following assertion seems to contradict the comment.
|    /* Postmaster-managed callers must handle postmaster death somehow. */
|    Assert(!IsUnderPostmaster ||
|           (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
|           (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
 

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Thomas Munro
Date:
On Thu, Sep 6, 2018 at 9:57 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> In syslogger.c, cur_flags is set but no longer used.

Right.  Fixed.

> ===
> In latch.c,
>
> - The parentheses around the symbols don't seem to be needed.
> |           (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
> |           (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);

Fixed.

>   And don't we need a description about this restriction in the
>   function comment?

Ok, added.

> - I think it may be better if the following had parentheses
>   around the '&' expressions.
>
> |    if (wakeEvents & WL_POSTMASTER_DEATH && IsUnderPostmaster)
> |    if (wakeEvents & WL_EXIT_ON_PM_DEATH && IsUnderPostmaster)

Fixed.  Also I was a little inconsistent about whether to use implicit
or explicit comparison with 0, and decided to go for the former.

> All the caller sites of WaitLatch, WaitLatchOrSocket and
> WaitEventSetWait are covered by this patch and all of them look
> fine.

Thanks for the review!

While rebasing, I also changed WL_EXIT_ON_PM_DEATH's means of exit
from exit(1) to proc_exit(1).  In master we're quite inconsistent
about that as you can see from the lines removed by this patch, but
the comments for proc_exit() seem to insist that it is the one true
way out.  Other than an elog(DEBUG3) message, the main difference
between proc_exit() and direct exit() seems to be the PROFILE_PID_DIR
stuff that changes directory on the way out the door for gprof users.
It looks like per-pid gmon files could have been achieved by setting
environment variables[1][2], but I guess there is some value in doing
it the same way on every platform.

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=gmon/gmon.c;hb=HEAD#l354
[2] https://github.com/freebsd/freebsd/blob/master/lib/libc/gmon/gmon.c#L154

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Kyotaro HORIGUCHI
Date:
Thank you for the fix.

At Tue, 23 Oct 2018 17:26:37 +1300, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=3BxOHw63Anwyx=pyy7Ju8cW6T0ZXNeKQS=WZp80hv_rg@mail.gmail.com>
> On Thu, Sep 6, 2018 at 9:57 PM Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> >   And don't we need a description about this restriction in the
> >   function comment?
> 
> Ok, added.

Thank you. It looks good.

> While rebasing, I also changed WL_EXIT_ON_PM_DEATH's means of exit
> from exit(1) to proc_exit(1).  In master we're quite inconsistent
> about that as you can see from the lines removed by this patch, but
> the comments for proc_exit() seem to insist that it is the one true
> way out.  Other than an elog(DEBUG3) message, the main difference
> between proc_exit() and direct exit() seems to be the PROFILE_PID_DIR
> stuff that changes directory on the way out the door for gprof users.

It looks exactly like that for me, too.

> It looks like per-pid gmon files could have been achieved by setting
> environment variables[1][2], but I guess there is some value in doing
> it the same way on every platform.
> 
> [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=gmon/gmon.c;hb=HEAD#l354
> [2] https://github.com/freebsd/freebsd/blob/master/lib/libc/gmon/gmon.c#L154

Agreed. Anyway they don't allow the tweak for avworker.

It seems fine to me. I'm going to mark this Ready for Committer
after a few days of waiting for anyone who has more
comments/opinions.

https://commitfest.postgresql.org/20/1618/

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From:
Kyotaro HORIGUCHI
Date:
At Fri, 26 Oct 2018 14:13:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20181026.141351.09076928.horiguchi.kyotaro@lab.ntt.co.jp>
> Thank you for the fix.
> 
> At Tue, 23 Oct 2018 17:26:37 +1300, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=3BxOHw63Anwyx=pyy7Ju8cW6T0ZXNeKQS=WZp80hv_rg@mail.gmail.com>
> > On Thu, Sep 6, 2018 at 9:57 PM Kyotaro HORIGUCHI
> > <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > >   And don't we need a description about this restriction in the
> > >   function comment?
> > 
> > Ok, added.
> 
> Thank you. It looks good.
> 
> > While rebasing, I also changed WL_EXIT_ON_PM_DEATH's means of exit
> > from exit(1) to proc_exit(1).  In master we're quite inconsistent
> > about that as you can see from the lines removed by this patch, but
> > the comments for proc_exit() seem to insist that it is the one true
> > way out.  Other than an elog(DEBUG3) message, the main difference
> > between proc_exit() and direct exit() seems to be the PROFILE_PID_DIR
> > stuff that changes directory on the way out the door for gprof users.
> 
> It looks exactly that way to me, too.
> 
> > It looks like per-pid gmon files could have been achieved by setting
> > environment variables[1][2], but I guess there is some value in doing
> > it the same way on every platform.
> > 
> > [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=gmon/gmon.c;hb=HEAD#l354
> > [2] https://github.com/freebsd/freebsd/blob/master/lib/libc/gmon/gmon.c#L154
> 
> Agreed. In any case, they don't allow that tweak for the autovacuum workers.
> 
> It seems fine to me. I'm going to mark this Ready for Committer
> after waiting a few days for anyone who has more comments or
> opinions.
> 
> https://commitfest.postgresql.org/20/1618/

Having seen no objections or further comments, I have marked this as
"Ready for Committer".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From
Tom Lane
Date:
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> [ 0001-Add-WL_EXIT_ON_PM_DEATH-pseudo-event-v4.patch ]

I took a quick look through this.  I have no objection to the idea of
letting the latch infrastructure do the proc_exit(1), but I'm wondering
why this is in the thread that it's in.  Is there any remaining connection
to the original complaint about BSDen not coping well with lots of
processes waiting on the same pipe?

A couple of minor code gripes:

-        rc = WaitLatch(MyLatch,
-                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
-                       (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
-                       WAIT_EVENT_AUTOVACUUM_MAIN);
+        WaitLatch(MyLatch,
+                  WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+                  (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+                  WAIT_EVENT_AUTOVACUUM_MAIN);

I'd advise making the code in places like this look like

        (void) WaitLatch(MyLatch, ...);

Otherwise, you are likely to draw complaints about "return value is
sometimes ignored" from Coverity and other static analyzers.  The
(void) cast makes it explicit that you're intentionally ignoring
the result value in this one place.

@@ -1021,6 +1051,19 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 
     pgstat_report_wait_end();
 
+    /*
+     * Exit immediately if the postmaster died and the caller asked for
+     * automatic exit in that case.
+     */
+    if (set->exit_on_postmaster_death)
+    {
+        for (i = 0; i < returned_events; ++i)
+        {
+            if (occurred_events[i].events == WL_POSTMASTER_DEATH)
+                proc_exit(1);
+        }
+    }
+
     return returned_events;
 }
 
Since exit_on_postmaster_death = true is going to be the normal case,
it seems a bit unfortunate that we have to incur this looping every time
we go through WaitEventSetWait.  Can't we put the handling of this in some
spot where it only gets executed when we've detected WL_POSTMASTER_DEATH,
like right after the PostmasterIsAliveInternal calls?

            if (!PostmasterIsAliveInternal())
            {
+               if (set->exit_on_postmaster_death)
+                   proc_exit(1);
                occurred_events->fd = PGINVALID_SOCKET;
                occurred_events->events = WL_POSTMASTER_DEATH;
                occurred_events++;
                returned_events++;
            }

            regards, tom lane


Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From
Thomas Munro
Date:
On Sat, Nov 17, 2018 at 5:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > [ 0001-Add-WL_EXIT_ON_PM_DEATH-pseudo-event-v4.patch ]
>
> I took a quick look through this.  I have no objection to the idea of
> letting the latch infrastructure do the proc_exit(1), but I'm wondering
> why this is in the thread that it's in.  Is there any remaining connection
> to the original complaint about BSDen not coping well with lots of
> processes waiting on the same pipe?

It turned into a general clean-up of inconsistent postmaster death
handling after the one-pipe-per-backend proposal was rejected.
Perhaps I should have started a new thread, but it does in fact
address most cases of processes calling PostmasterIsAlive()
frequently, which was part of the original complaint.  That falls into
the last-mentioned category from this paragraph of the commit message:

  Repair all code that was previously ignoring postmaster death completely,
  or requesting the event but ignoring it, or requesting the event but then
  doing an unconditional PostmasterIsAlive() call every time through its
  event loop (which is an expensive syscall on platforms for which we don't
  have USE_POSTMASTER_DEATH_SIGNAL support).
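
As a hedged sketch of that last category (naptime_ms is a made-up
variable here, but the flags and functions are the real ones), the
clean-up amounts to this kind of change:

    int         rc;

    /* Before: request the event, then check for it by hand. */
    rc = WaitLatch(MyLatch,
                   WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                   naptime_ms, WAIT_EVENT_AUTOVACUUM_MAIN);
    if (rc & WL_POSTMASTER_DEATH)
        proc_exit(1);

    /* After: let the latch code perform the proc_exit(1) itself. */
    (void) WaitLatch(MyLatch,
                     WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
                     naptime_ms, WAIT_EVENT_AUTOVACUUM_MAIN);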

Other potential improvements that could be made on both sides:

* We should probably be smarter about how often we call
PostmasterIsAlive() in the recovery loop, which is the only remaining
case like that (because it's a busy loop, given enough WAL to chew on,
so there is no waiting primitive that would report pipe readiness);
see the sketch after this list.

* It would be nice if more kernels supported parent death signals.

* Based on this report, those kernels really should sort out their
wakeup logic for duplicated pipe fds.  I wonder whether other
multi-process applications have the same problem.  Our logger pipe and
the proposed checkpointer fsync pipe might be affected too, but I'm
not sure.
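
Coming back to the first point, here is a minimal sketch of the
rate-limiting idea (the interval and variable names are invented;
PostmasterIsAlive() and proc_exit() are the real functions):

    #define PM_ALIVE_CHECK_INTERVAL 1024    /* records between checks */

    static uint32 records_since_check = 0;

    /* in recovery's per-record busy loop */
    if (++records_since_check >= PM_ALIVE_CHECK_INTERVAL)
    {
        records_since_check = 0;
        if (!PostmasterIsAlive())
            proc_exit(1);
    }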

> I'd advise making the code in places like this look like
>
>                 (void) WaitLatch(MyLatch, ...);
>
> Otherwise, you are likely to draw complaints about "return value is
> sometimes ignored" from Coverity and other static analyzers.  The
> (void) cast makes it explicit that you're intentionally ignoring
> the result value in this one place.

Done.

> Since exit_on_postmaster_death = true is going to be the normal case,
> it seems a bit unfortunate that we have to incur this looping every time
> we go through WaitEventSetWait.  Can't we put the handling of this in some
> spot where it only gets executed when we've detected WL_POSTMASTER_DEATH,
> like right after the PostmasterIsAliveInternal calls?
>
>             if (!PostmasterIsAliveInternal())
>             {
> +               if (set->exit_on_postmaster_death)
> +                   proc_exit(1);
>                 occurred_events->fd = PGINVALID_SOCKET;
>                 occurred_events->events = WL_POSTMASTER_DEATH;
>                 occurred_events++;
>                 returned_events++;
>             }

I was probably trying to avoid repeating the code in each of the 3
implementations of that function (poll, epoll, win32).  But there is
already other similar duplication there, and I agree that your way is
better.  Done that way.

Thanks for the review.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachments

Re: [HACKERS] PATCH: Keep one postmaster monitoring pipe per process

From
Thomas Munro
Date:
On Sat, Nov 17, 2018 at 2:27 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Thanks for the review.

Pushed.

-- 
Thomas Munro
http://www.enterprisedb.com