Discussion: PATCH: Keep one postmaster monitoring pipe per process
Hi,

the current implementation of PostmasterIsAlive() uses a pipe to monitor the existence of the postmaster process. One end of the pipe is held open in the postmaster, while the other end is inherited by all the auxiliary and background processes when they fork. This leads to multiple processes calling select(2), poll(2) and read(2) on the same end of the pipe.

While this is technically perfectly ok, it has the unfortunate side effect that it triggers an inefficient behaviour[0] in the select/poll implementation on some operating systems[1]: the kernel can only keep track of one pid per select address and thus has no other choice than to wakeup(9) every process that is waiting on select/poll. In our case the system had to wake up ~3000 idle ssh processes every time PostgreSQL called PostmasterIsAlive. The WalReceiver process triggered this at a rate of ~400 calls per second, with the result that the system performed very badly, being mostly busy scheduling idle processes.

The attached patch avoids the select contention by using a separate pipe for each auxiliary and background process. Since the postmaster has three different ways to create new processes, the patch got a bit more complicated than I anticipated :)

The auxiliary processes, pgstat, pgarch and the autovacuum launcher each get a preallocated pipe. The pipes are held in:

    postmaster_alive_fds_own[NUM_AUXPROCTYPES];
    postmaster_alive_fds_watch[NUM_AUXPROCTYPES];

Just before we fork a new process we set postmaster_alive_fd for each process type:

    postmaster_alive_fd = postmaster_alive_fds_watch[type];

Since there can be multiple backend processes, BackendStartup() allocates a pipe on demand and keeps the reference in the Backend structure. The pipe is closed when the backend terminates.

The patch was developed and tested under OpenBSD using the REL9_4_STABLE branch. I've merged it to current, compile-tested it, and ran make check on Ubuntu 14.04.
Marco

[0] http://man.openbsd.org/OpenBSD-5.9/man2/select.2?manpath=OpenBSD-5.9

    BUGS
    [...]
    "Internally to the kernel, select() and pselect() work poorly if multiple
    processes wait on the same file descriptor.  Given that, it is rather
    surprising to see that many daemons are written that way."

[1] At least OpenBSD and NetBSD are affected; FreeBSD rewrote their select implementation in 8.0.
Attachments
Hi, On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote: > the current implementation of PostmasterIsAlive() uses a pipe to > monitor the existence of the postmaster process. > One end of the pipe is held open in the postmaster, while the other end is > inherited to all the auxiliary and background processes when they fork. > This leads to multiple processes calling select(2), poll(2) and read(2) > on the same end of the pipe. > While this is technically perfectly ok, it has the unfortunate side > effect that it triggers an inefficient behaviour[0] in the select/poll > implementation on some operating systems[1]: > The kernel can only keep track of one pid per select address and > thus has no other choice than to wakeup(9) every process that > is waiting on select/poll.

Yikes, that's a pretty absurd implementation. Does openbsd's kqueue implementation have the same issue? There's http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com

> Attached patch avoids the select contention by using a > separate pipe for each auxiliary and background process.

I'm quite unenthusiastic about forcing that many additional file descriptors onto the postmaster... I'm not quite sure I understand why this is an issue here - there shouldn't ever be events on this fd, so why is the kernel waking up all processes? It'd kinda make sense it'd wake up all processes if there's one waiting, but ... ?

> BUGS > [...] > "Internally to the kernel, select() and pselect() work poorly if multiple > processes wait on the same file descriptor. Given that, it is rather > surprising to see that many daemons are written that way."

Gee. Maybe it's more surprising that that issue isn't being addressed?

Regards, Andres
On Fri, Sep 16, 2016 at 1:57 AM, Marco Pfatschbacher <Marco_Pfatschbacher@genua.de> wrote: > the current implementation of PostmasterIsAlive() uses a pipe to > monitor the existence of the postmaster process. > One end of the pipe is held open in the postmaster, while the other end is > inherited to all the auxiliary and background processes when they fork. > This leads to multiple processes calling select(2), poll(2) and read(2) > on the same end of the pipe. > While this is technically perfectly ok, it has the unfortunate side > effect that it triggers an inefficient behaviour[0] in the select/poll > implementation on some operating systems[1]: > The kernel can only keep track of one pid per select address and > thus has no other choice than to wakeup(9) every process that > is waiting on select/poll. > > [...] > > BUGS > [...] > "Internally to the kernel, select() and pselect() work poorly if multiple > processes wait on the same file descriptor. Given that, it is rather > surprising to see that many daemons are written that way." > > [1] > At least OpenBSD and NetBSD are affected, FreeBSD rewrote > their select implementation in 8.0. Very interesting. Perhaps that is why NetBSD shows a speedup with the kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the kqueue patch to perform better on large FreeBSD systems, it would also be a solution to this problem. [1] https://www.postgresql.org/message-id/flat/CAEepm%3D2i78TOJeV4O0-0meiihiRfVQ29ur7%3DMBHxsUKaPSWeAg%40mail.gmail.com -- Thomas Munro http://www.enterprisedb.com
Marco Pfatschbacher <Marco_Pfatschbacher@genua.de> writes: > the current implementation of PostmasterIsAlive() uses a pipe to > monitor the existence of the postmaster process. > One end of the pipe is held open in the postmaster, while the other end is > inherited to all the auxiliary and background processes when they fork. > This leads to multiple processes calling select(2), poll(2) and read(2) > on the same end of the pipe. > While this is technically perfectly ok, it has the unfortunate side > effect that it triggers an inefficient behaviour[0] in the select/poll > implementation on some operating systems[1]: > The kernel can only keep track of one pid per select address and > thus has no other choice than to wakeup(9) every process that > is waiting on select/poll. That seems like a rather bad kernel bug. > In our case the system had to wakeup ~3000 idle ssh processes > every time postgresql did call PostmasterIsAlive. Uh, these are processes not even connected to the pipe in question? That's *really* a kernel bug. > Attached patch avoids the select contention by using a > separate pipe for each auxiliary and background process. I think this would likely move the performance problems somewhere else. In particular, it would mean that every postmaster child would inherit pipes leading to all the older children. We could close 'em again I guess, but we have previously found that having to do things that way is a rather serious performance drag --- see the problems we had with POSIX named semaphores, here for instance: https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com I really don't want the postmaster to be holding any per-child kernel resources. It'd certainly be nice if we could find another solution besides the pipe-based one, but I don't think "more pipes" is the answer. There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG) when available; do the BSDen have anything like that? regards, tom lane
Thomas Munro <thomas.munro@enterprisedb.com> writes: > Very interesting. Perhaps that is why NetBSD shows a speedup with the > kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the > kqueue patch to perform better on large FreeBSD systems, it would also > be a solution to this problem. I just noticed that kqueue appears to offer a solution to this problem, ie one of the things you can wait for is exit of another process (named by PID, looks like). If that's portable to all kqueue platforms, then integrating a substitute for the postmaster death pipe might push that patch over the hump to being a net win. regards, tom lane
On Thu, Sep 15, 2016 at 04:24:07PM -0400, Tom Lane wrote: > Marco Pfatschbacher <Marco_Pfatschbacher@genua.de> writes: > > the current implementation of PostmasterIsAlive() uses a pipe to > > monitor the existence of the postmaster process. > > One end of the pipe is held open in the postmaster, while the other end is > > inherited to all the auxiliary and background processes when they fork. > > This leads to multiple processes calling select(2), poll(2) and read(2) > > on the same end of the pipe. > > While this is technically perfectly ok, it has the unfortunate side > > effect that it triggers an inefficient behaviour[0] in the select/poll > > implementation on some operating systems[1]: > > The kernel can only keep track of one pid per select address and > > thus has no other choice than to wakeup(9) every process that > > is waiting on select/poll. > > That seems like a rather bad kernel bug. It's a limitation that has been there since at least BSD4.4. But yeah, I wished things were better. > > In our case the system had to wakeup ~3000 idle ssh processes > > every time postgresql did call PostmasterIsAlive. > > Uh, these are processes not even connected to the pipe in question? > That's *really* a kernel bug. The kernel does not know if they were selecting on that pipe, because the only slot to keep that mapping has already been taken. It's not that bad of a performance hit if the system doesn't have many processes. > > Attached patch avoids the select contention by using a > > separate pipe for each auxiliary and background process. > > I think this would likely move the performance problems somewhere else. > In particular, it would mean that every postmaster child would inherit > pipes leading to all the older children. I kept them at a minimum. 
(See ClosePostmasterPorts) > We could close 'em again > I guess, but we have previously found that having to do things that > way is a rather serious performance drag --- see the problems we had I think closing a few files doesn't hurt too much, but I see your point. > with POSIX named semaphores, here for instance: > https://www.postgresql.org/message-id/flat/3830CBEB-F8CE-4EBC-BE16-A415E78A4CBC%40apple.com > I really don't want the postmaster to be holding any per-child kernel > resources. > > It'd certainly be nice if we could find another solution besides > the pipe-based one, but I don't think "more pipes" is the answer. > There was some discussion of using Linux's prctl(PR_SET_PDEATHSIG) > when available; do the BSDen have anything like that? Not that I know of. But since the WalReceiver process seemed to be the one calling PostmasterIsAlive way more often than the rest, maybe we could limit the performance hit by not calling it on every received wal chunk? Cheers, Marco
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: > > Very interesting. Perhaps that is why NetBSD shows a speedup with the > > kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the > > kqueue patch to perform better on large FreeBSD systems, it would also > > be a solution to this problem. > > I just noticed that kqueue appears to offer a solution to this problem, > ie one of the things you can wait for is exit of another process (named > by PID, looks like). If that's portable to all kqueue platforms, then > integrating a substitute for the postmaster death pipe might push that > patch over the hump to being a net win. That sounds plausible. I could give this a try after I get back from my vacation :) Marco
On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote: > Hi, > > On 2016-09-15 15:57:55 +0200, Marco Pfatschbacher wrote: > > the current implementation of PostmasterIsAlive() uses a pipe to > > monitor the existence of the postmaster process. > > One end of the pipe is held open in the postmaster, while the other end is > > inherited to all the auxiliary and background processes when they fork. > > This leads to multiple processes calling select(2), poll(2) and read(2) > > on the same end of the pipe. > > While this is technically perfectly ok, it has the unfortunate side > > effect that it triggers an inefficient behaviour[0] in the select/poll > > implementation on some operating systems[1]: > > The kernel can only keep track of one pid per select address and > > thus has no other choice than to wakeup(9) every process that > > is waiting on select/poll. > > Yikes, that's a pretty absurd implementation.

Not when you take into account that it's been written over 20 years ago ;)

> Does openbsd's kqueue implementation have the same issue? There's > http://archives.postgresql.org/message-id/CAEepm%3D37oF84-iXDTQ9MrGjENwVGds%2B5zTr38ca73kWR7ez_tA%40mail.gmail.com

Looks ok, but your mileage may vary. I've seen strange subtle bugs using kqueue..

    struct kevent {
        __uintptr_t ident;      /* identifier for this event */
        short       filter;     /* filter for event */
        u_short     flags;
        u_int       fflags;
        quad_t      data;
        void       *udata;      /* opaque user data identifier */
    };

> > Attached patch avoids the select contention by using a > > separate pipe for each auxiliary and background process. > > I'm quite unenthusiastic about forcing that many additional file > descriptors onto the postmaster...

In our use case it's about 30.

> I'm not quite sure I understand why this an issue here - there shouldn't > ever be events on this fd, so why is the kernel waking up all processes? > It'd kinda makes sense it'd wake up all processes if there's one > waiting, but ... ? 
Every read is an event, and that's what PostmasterIsAlive does. Marco
Hi, On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote: > On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote: > > Yikes, that's a pretty absurd implementation. > > Not when you take into account that it's been written over 20 years ago ;)

Well, that doesn't mean it can't be fixed ;)

> > I'm not quite sure I understand why this an issue here - there shouldn't > > ever be events on this fd, so why is the kernel waking up all processes? > > It'd kinda makes sense it'd wake up all processes if there's one > > waiting, but ... ? > > Every read is an event, and that's what PostmasterIsAlive does.

But in most places we only do a PostmasterIsAlive if WaitLatch returns WL_POSTMASTER_DEATH. The only walreceiver-related place that doesn't is WalRcvWaitForStartPosition(). If that's indeed the cause of your issues, this quite possibly could be fixed by doing the

    if (!PostmasterIsAlive())
        exit(1);

check not unconditionally, but only after the WaitLatch at the end of the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch(). That'll be a minor behaviour change for the WALRCV_RESTARTING case, but that seems fine, we'll just loop once more outside (after a quick glance at least). Andres
On Fri, Sep 16, 2016 at 2:23 PM, Andres Freund <andres@anarazel.de> wrote: >> Every read is an event, and that's what PostmasterIsAlive does. > > But in most places we only do a PostmasterIsAlive if WaitLatch returns > WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is > WalRcvWaitForStartPosition(). If that's indeed the cause of your issues > this quite possibly could be fixed by doing the > if (!PostmasterIsAlive()) > exit(1); > check not unconditionally, but only after the WaitLatch at the end of > the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()? > That'll be a minor behaviour change for the WALRCV_RESTARTING, but that > seems fine, we'll just loop once more outside (after a quick glance at > least). At least some of the latch implementations already check PostmasterIsAlive() internally to avoid returning spurious events; and secure_read() at least assumes that the WL_POSTMASTER_DEATH return is reliable and doesn't need a double-check. So we can probably just remove the check altogether and instead bail out if it returns WL_POSTMASTER_DEATH. That probably saves a system call per loop iteration even on platforms where the kernel doesn't exhibit any surprising behavior. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Sep 17, 2016 at 6:23 AM, Andres Freund <andres@anarazel.de> wrote: > On 2016-09-16 09:55:48 +0200, Marco Pfatschbacher wrote: >> On Thu, Sep 15, 2016 at 12:26:16PM -0700, Andres Freund wrote: >> > Yikes, that's a pretty absurd implementation. >> >> Not when you take into account that it's been written over 20 years ago ;) > > Well, that doesn't mean it can't be fixed ;) > >> > I'm not quite sure I understand why this an issue here - there shouldn't >> > ever be events on this fd, so why is the kernel waking up all processes? >> > It'd kinda makes sense it'd wake up all processes if there's one >> > waiting, but ... ? >> >> Every read is an event, and that's what PostmasterIsAlive does. > > But in most places we only do a PostmasterIsAlive if WaitLatch returns > WL_POSTMASTER_DEATH. The only walreceiver related place that doesn't is > WalRcvWaitForStartPosition(). If that's indeed the cause of your issues > this quite possibly could be fixed by doing the > if (!PostmasterIsAlive()) > exit(1); > check not unconditionally, but only after the WaitLatch at the end of > the loop, and only if WL_POSTMASTER_DEATH is returned by WaitLatch()? > That'll be a minor behaviour change for the WALRCV_RESTARTING, but that > seems fine, we'll just loop once more outside (after a quick glance at > least). Yeah, I wondered why that was different than the pattern established elsewhere when I was hacking on replication code. There are actually several places where we call PostmasterIsAlive() unconditionally in a loop that waits for WL_POSTMASTER_DEATH but ignores the return code: in pgarch.c, syncrep.c, walsender.c and walreceiver.c. Should we just change them all to check the return code and exit/break/ereport/etc as appropriate? That would match the code from autovacuum.c, checkpointer.c, pgstat.c, be-secure.c and bgworker.c. Something like the attached. 
The code in basebackup.c also waits for WL_POSTMASTER_DEATH but doesn't check for it in the return value *or* call PostmasterIsAlive(). I'm not sure what to make of that. I didn't test it but it looks like maybe it would continue running after postmaster death but not honour the throttling rate limit because WaitLatch would keep returning immediately. -- Thomas Munro http://www.enterprisedb.com
Attachments
On 2016-09-20 11:07:03 +1200, Thomas Munro wrote: > Yeah, I wondered why that was different than the pattern established > elsewhere when I was hacking on replication code. There are actually > several places where we call PostmasterIsAlive() unconditionally in a > loop that waits for WL_POSTMASTER_DEATH but ignores the return code: Note that just changing this implies a behavioural change in at least some of those: If the loop is busy with work, the WaitLatch might never be reached. I think that might be ok in most of those, but it does require examination. - Andres
On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: > > Very interesting. Perhaps that is why NetBSD shows a speedup with the > > kqueue patch[1] but FreeBSD doesn't. I guess that if I could get the > > kqueue patch to perform better on large FreeBSD systems, it would also > > be a solution to this problem. > > I just noticed that kqueue appears to offer a solution to this problem, > ie one of the things you can wait for is exit of another process (named > by PID, looks like). If that's portable to all kqueue platforms, then > integrating a substitute for the postmaster death pipe might push that > patch over the hump to being a net win. > > regards, tom lane

Hi, sorry for the long silence. I forgot about this after being on vacation. kqueue does indeed support EVFILT_PROC like you said. But using this effectively would need a separate monitoring process, because we cannot use poll(2) and kevent(2) together within the same loop.

But how about this: we just use getppid to see whether the process id of our parent has changed to 1 (init). Just like calling read on the pipe, that's just one system call. I did not bother to weed out the postmaster_alive_fds yet, but this change works fine for me. 
Regards, Marco

diff --git a/src/backend/storage/ipc/pmsignal.c b/src/backend/storage/ipc/pmsignal.c
index 85db6b21f8..a0596c61fd 100644
--- a/src/backend/storage/ipc/pmsignal.c
+++ b/src/backend/storage/ipc/pmsignal.c
@@ -272,21 +272,16 @@ bool
 PostmasterIsAlive(void)
 {
 #ifndef WIN32
-	char		c;
-	ssize_t		rc;
+	pid_t		ppid;
 
-	rc = read(postmaster_alive_fds[POSTMASTER_FD_WATCH], &c, 1);
-	if (rc < 0)
+	ppid = getppid();
+	if (ppid == 1)
 	{
-		if (errno == EAGAIN || errno == EWOULDBLOCK)
-			return true;
-		else
-			elog(FATAL, "read on postmaster death monitoring pipe failed: %m");
+		elog(FATAL, "postmaster died: our ppid changed to init");
+		return false;
 	}
-	else if (rc > 0)
-		elog(FATAL, "unexpected data in postmaster death monitoring pipe");
 
-	return false;
+	return true;
 #else							/* WIN32 */
 	return (WaitForSingleObject(PostmasterHandle, 0) == WAIT_TIMEOUT);
 #endif							/* WIN32 */
On Tue, Dec 5, 2017 at 5:55 AM, Marco Pfatschbacher <Marco_Pfatschbacher@genua.de> wrote: > On Thu, Sep 15, 2016 at 04:40:00PM -0400, Tom Lane wrote: >> I just noticed that kqueue appears to offer a solution to this problem, >> ie one of the things you can wait for is exit of another process (named >> by PID, looks like). If that's portable to all kqueue platforms, then >> integrating a substitute for the postmaster death pipe might push that >> patch over the hump to being a net win. > > kqueue does indeed support EVFILT_PROC like you said. > But using this effectively, would need a separate monitoring process, > because we cannot use poll(2) and kevent(2) together within the same loop. FWIW the kqueue patch I posted uses EVFILT_PROC as Tom suggested and it works fine according to my testing (FreeBSD +macOS). With that patch, there is no poll, just kqueue everywhere. > But how about this: > We just use getppid to see whether the process id of our parent > has changed to 1 (init). > Just like calling read on the pipe, that's just one systemcall. I'm still not sure why we have to make any system calls at all. Do we not trust WaitLatch() to tell us about WL_POSTMASTER_DEATH reliably, or is it that we don't trust it to tell us about that event when there are other events happening due to priorities, or...? -- Thomas Munro http://www.enterprisedb.com
On Tue, Sep 20, 2016 at 11:26 AM, Andres Freund <andres@anarazel.de> wrote: > On 2016-09-20 11:07:03 +1200, Thomas Munro wrote: >> Yeah, I wondered why that was different than the pattern established >> elsewhere when I was hacking on replication code. There are actually >> several places where we call PostmasterIsAlive() unconditionally in a >> loop that waits for WL_POSTMASTER_DEATH but ignores the return code: > > Note that just changing this implies a behavioural change in at least > some of those: If the loop is busy with work, the WaitLatch might never > be reached. I think that might be ok in most of those, but it does > require examination. Rebased, but I don't actually like this patch any more. Over in another thread[1] I proposed that we should just make exit(1) the default behaviour built into latch.c for those cases that don't want to do something special (eg SyncRepWaitForLSN()), so we don't finish up duplicating the ugly exit(1) code everywhere (or worse, forgetting to). Tom Lane seemed to think that was an idea worth pursuing. I think what we need for PG12 is a patch that does that, and also introduces a reused WaitEventSet object in several of these places. Then eg SyncRepWaitForLSN() won't be interacting with contended kernel objects every time (assuming an implementation like epoll or eventually kqueue, completion ports etc is available). Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts() (ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based approach where available as suggested by Andres[2] or fall back to polling a reusable WaitEventSet (timeout = 0), then there'd be no more calls to PostmasterIsAlive() outside latch.c. [1] https://www.postgresql.org/message-id/CAEepm%3D2LqHzizbe7muD7-2yHUbTOoF7Q%2BqkSD5Q41kuhttRTwA%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/7261eb39-0369-f2f4-1bb5-62f3b6083b5e%40iki.fi -- Thomas Munro http://www.enterprisedb.com
Attachments
Hi, On 2018-04-11 11:57:20 +1200, Thomas Munro wrote: > Rebased, but I don't actually like this patch any more. Over in > another thread[1] I proposed that we should just make exit(1) the > default behaviour built into latch.c for those cases that don't want > to do something special (eg SyncRepWaitForLSN()), so we don't finish > up duplicating the ugly exit(1) code everywhere (or worse, forgetting > to). Tom Lane seemed to think that was an idea worth pursuing. > > I think what we need for PG12 is a patch that does that, and also > introduces a reused WaitEventSet object in several of these places. > Then eg SyncRepWaitForLSN() won't be interacting with contended kernel > objects every time (assuming an implementation like epoll or > eventually kqueue, completion ports etc is available). > > Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts() > (ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based > approach where available as suggested by Andres[2] or fall back to > polling a reusable WaitEventSet (timeout = 0), then there'd be no more > calls to PostmasterIsAlive() outside latch.c. I'm still unconvinced by this. There's good reasons why code might be busy-looping without checking the latch, and we shouldn't force code to add more latch checks if unnecessary. Resetting the latch frequently can actually increase the amount of context switches considerably. - Andres
On Wed, Apr 11, 2018 at 12:03 PM, Andres Freund <andres@anarazel.de> wrote: > On 2018-04-11 11:57:20 +1200, Thomas Munro wrote: >> Then if pgarch_ArchiverCopyLoop() and HandleStartupProcInterrupts() >> (ie loops without waiting) adopt a prctl(PR_SET_PDEATHSIG)-based >> approach where available as suggested by Andres[2] or fall back to >> polling a reusable WaitEventSet (timeout = 0), then there'd be no more >> calls to PostmasterIsAlive() outside latch.c. > > I'm still unconvinced by this. There's good reasons why code might be > busy-looping without checking the latch, and we shouldn't force code to > add more latch checks if unnecessary. Resetting the latch frequently can > actually increase the amount of context switches considerably.

What latch? There wouldn't be a latch in a WaitEventSet that is used only to detect postmaster death.

I arrived at this idea via the realisation that the closest thing to prctl(PR_SET_PDEATHSIG) on BSD-family systems today is please-tell-my-kqueue-if-this-process-dies. It so happens that my kqueue patch already uses that instead of the pipe to detect postmaster death. The only problem is that you have to ask it, by calling kevent(). In a busy loop like those two, where there is no other kind of waiting going on, you could do that by calling kevent() with timeout = 0 to check the queue.

You could probably figure out a way to hide the prctl(PR_SET_PDEATHSIG)-based approach inside the WaitEventSet code, with a fast path that doesn't make any system calls if the only event registered is postmaster death (you can just check the global variable set by your signal handler). But I guess you wouldn't like the extra function call, so you'd probably prefer to check the global variable directly in the busy loop, in builds that have prctl(PR_SET_PDEATHSIG). -- Thomas Munro http://www.enterprisedb.com
Hi, On 2018-04-11 12:17:14 +1200, Thomas Munro wrote: > I arrived at this idea via the realisation that the closest thing to > prctl(PR_SET_PDEATHSIG) on BSD-family systems today is > please-tell-my-kqueue-if-this-process-dies. It so happens that my > kqueue patch already uses that instead of the pipe to detect > postmaster death. The only problem is that you have to ask it, by > calling kevent(). In a busy loop like those two, where there is no > other kind of waiting going on, you could do that by calling kevent() > with timeout = 0 to check the queue.

Which is not cheap.

> You could probably figure out a way to hide the > prctl(PR_SET_PDEATHSIG)-based approach inside the WaitEventSet code, > with a fast path that doesn't make any system calls if the only event > registered is postmaster death (you can just check the global variable > set by your signal handler). But I guess you wouldn't like the extra > function call so I guess you'd prefer to check the global variable > directly in the busy loop, in builds that have > prctl(PR_SET_PDEATHSIG).

Yea, I'd really want this to be an inline function of the style

    static inline bool
    PostmasterIsAlive(void)
    {
        if (!postmaster_possibly_dead)
            return true;
        return PostmasterIsAliveInternal();
    }

I do not like the WES alternative because a special-cased "just postmaster death" WES seems to have absolutely no advantage over a faster PostmasterIsAlive(). And using WES seems to make a cheap check like the above significantly more complicated.

Greetings, Andres Freund
On Wed, Apr 11, 2018 at 12:26 PM, Andres Freund <andres@anarazel.de> wrote: > On 2018-04-11 12:17:14 +1200, Thomas Munro wrote: >> I arrived at this idea via the realisation that the closest thing to >> prctl(PR_SET_PDEATHSIG) on BSD-family systems today is >> please-tell-my-kqueue-if-this-process-dies. It so happens that my >> kqueue patch already uses that instead of the pipe to detect >> postmaster death. The only problem is that you have to ask it, by >> calling it kevent(). In a busy loop like those two, where there is no >> other kind of waiting going on, you could do that by calling kevent() >> with timeout = 0 to check the queue. > > Which is not cheap. Certainly, but I am reacting to reports via you of performance problems coming from read(heavily_contended_fd) by saying: although we can't avoid a syscall like Linux can in this case, we *can* avoid the reportedly contended pipe mutex completely by reading only our own process's kqueue. Combined with Heikki's proposal not to check every single time through busy loops to reduce the syscall frequency, that might be the optimal solution until some hypothetical prctl(PR_SET_PDEATHSIG)-feature-match patch arrives. Just a thought. -- Thomas Munro http://www.enterprisedb.com
On Wed, Apr 11, 2018 at 12:47 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Wed, Apr 11, 2018 at 12:26 PM, Andres Freund <andres@anarazel.de> wrote: >> On 2018-04-11 12:17:14 +1200, Thomas Munro wrote: >>> I arrived at this idea via the realisation that the closest thing to >>> prctl(PR_SET_PDEATHSIG) on BSD-family systems today is >>> please-tell-my-kqueue-if-this-process-dies. It so happens that my >>> kqueue patch already uses that instead of the pipe to detect >>> postmaster death. The only problem is that you have to ask it, by >>> calling it kevent(). In a busy loop like those two, where there is no >>> other kind of waiting going on, you could do that by calling kevent() >>> with timeout = 0 to check the queue. >> >> Which is not cheap. After bouncing that idea around with a fellow pgsql-hacker off-list, I take that idea back. It's a lot simpler and as cheap if not cheaper to check getppid() == 1 or similar every Nth time through the busy loop. I heard another interesting idea -- you can use F_SETOWN + O_ASYNC to ask for SIGIO to be delivered to you when the pipe is closed. Unfortunately that'd require a different pipe for each backend so wouldn't work for us. -- Thomas Munro http://www.enterprisedb.com
Hi, I'd like to disentangle two related topics. For "I want PostmasterIsAlive() to go faster using signals on platforms that can support that", please see over here: https://www.postgresql.org/message-id/flat/7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi#7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi For "I want to fix all the code that ignores the WL_POSTMASTER_DEATH event and then calls PostmasterIsAlive() every time through its loop, or fails to detect postmaster death at all", this is the thread (unless someone sees a reason to reentangle them). Here's a draft patch that does that. One contentious question is: should you have to opt *in* to auto-exit-on-postmaster death? Andres opined that you should. I actually think it's not so bad if you don't have to do that, and instead have to opt out. I think of it as a kind of 'process cancellation point' or a quiet PANIC that you can opt out of. It's nice to remove the old boilerplate code without having to add a new boilerplate event that you have to remember every time. Any other opinions? I'm not sure if the exit(1) vs proc_exit(1) distinction is important. -- Thomas Munro http://www.enterprisedb.com
Attachments
On Wed, Apr 18, 2018 at 6:55 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Here's a draft patch that does that. Here's a better one (the previous version could read past the end of the occurred_events array). -- Thomas Munro http://www.enterprisedb.com
Attachments
On 18/04/18 09:55, Thomas Munro wrote: > Here's a draft patch that does that. One contentious question is: > should you have to opt *in* to auto-exit-on-postmaster death? Andres > opined that you should. I actually think it's not so bad if you don't > have to do that, and instead have to opt out. I think of it as a kind > of 'process cancellation point' or a quiet PANIC that you can opt out > of. It's nice to remove the old boilerplate code without having to > add a new boilerplate event that you have to remember every time. Any > other opinions? Hmm. Exiting on postmaster death by default does feel a bit too magical to me. But as your patch points out, there are currently no places where you *don't* want to exit on postmaster death; some callers just prefer to handle it themselves. And I like having a default or something to make sure that all call sites in the future will also exit quickly. I'd suggest that we add a new WL_EXIT_ON_POSTMASTER_DEATH flag, making it opt-in. But add an assertion in WaitLatchOrSocket: Assert((wakeEvents & (WL_EXIT_ON_POSTMASTER_DEATH | WL_POSTMASTER_DEATH)) != 0); That ensures that all callers either use WL_EXIT_ON_POSTMASTER_DEATH, or WL_POSTMASTER_DEATH to handle it in the caller. Having to specify WL_EXIT_ON_POSTMASTER_DEATH reminds you that the call might exit(), so if that's not what you want, you need to do something else. But the assertion makes sure that all callers deal with postmaster death in some way. - Heikki
On Sat, Jul 14, 2018 at 5:34 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 18/04/18 09:55, Thomas Munro wrote: >> Here's a draft patch that does that. One contentious question is: >> should you have to opt *in* to auto-exit-on-postmaster death? Andres >> opined that you should. I actually think it's not so bad if you don't >> have to do that, and instead have to opt out. I think of it as a kind >> of 'process cancellation point' or a quiet PANIC that you can opt out >> of. It's nice to remove the old boilerplate code without having to >> add a new boilerplate event that you have to remember every time. Any >> other opinions? > > Hmm. Exiting on postmaster death by default does feel a bit too magical to > me. But as your patch points out, there are currently no places where you > *don't* want to exit on postmaster death, some callers just prefer to handle > it themselves. And I like having a default or something to make sure that > all call sites in the future will also exit quickly. > > I'd suggest that we add a new WL_EXIT_ON_POSTMASTER_DEATH flag, making it > opt-on. Ok, it's now 2 against 1. So here's a version that uses an explicit WL_EXIT_ON_PM_DEATH value. I like that name because it's shorter and more visually distinctive (harder to confuse with WL_POSTMASTER_DEATH). > But add an assertion in WaitLatchOrSocket: > > Assert ((wakeEvents & (WL_EXIT_POSTMASTER_DEATH | WL_POSTMASTER_DEATH)) != > 0); Ok. Done for the WaitLatchXXX() interface. The WaitEventSet interface (rarely used directly) is less amenable to that change. Here are some of the places I had to add WL_EXIT_ON_PM_DEATH: gather_readnext(), shm_mq_send_bytes(), shm_mq_receive_bytes(), shm_mq_wait_internal(), ProcSleep(), ProcWaitForSignal(), pg_sleep(), pgfdw_get_result(). Was it intentional that any of those places don't currently exit on postmaster vaporisation? -- Thomas Munro http://www.enterprisedb.com
Attachments
Hello. At Wed, 18 Jul 2018 14:02:47 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=3t57KBxb90CTqnDZiSTnTq3jwxZUc0zaFDiaOkGQqNxA@mail.gmail.com> > On Sat, Jul 14, 2018 at 5:34 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 18/04/18 09:55, Thomas Munro wrote: > >> Here's a draft patch that does that. One contentious question is: > >> should you have to opt *in* to auto-exit-on-postmaster death? Andres > >> opined that you should. I actually think it's not so bad if you don't > >> have to do that, and instead have to opt out. I think of it as a kind > >> of 'process cancellation point' or a quiet PANIC that you can opt out > >> of. It's nice to remove the old boilerplate code without having to > >> add a new boilerplate event that you have to remember every time. Any > >> other opinions? > > > > Hmm. Exiting on postmaster death by default does feel a bit too magical to > > me. But as your patch points out, there are currently no places where you > > *don't* want to exit on postmaster death, some callers just prefer to handle > > it themselves. And I like having a default or something to make sure that > > all call sites in the future will also exit quickly. > > > > I'd suggest that we add a new WL_EXIT_ON_POSTMASTER_DEATH flag, making it > > opt-on. > > Ok, it's now 2 against 1. So here's a version that uses an explicit > WL_EXIT_ON_PM_DEATH value. I like that name because it's shorter and > more visually distinctive (harder to confuse with > WL_POSTMASTER_DEATH). > > > But add an assertion in WaitLatchOrSocket: > > > > Assert ((wakeEvents & (WL_EXIT_POSTMASTER_DEATH | WL_POSTMASTER_DEATH)) != > > 0); > > Ok. Done for the WaitLatchXXX() interface. The WaitEventSet > interface (rarely used directly) is less amenable to that change. 
> > Here are some of the places I had to add WL_EXIT_ON_PM_DEATH: > gather_readnext(), shm_mq_send_bytes(), shm_mq_receive_bytes(), > shm_mq_wait_internal(), ProcSleep(), ProcWaitForSignal(), pg_sleep(), > pgfdw_get_result(). > > Was it intentional that any of those places don't currently exit on > postmaster vaporisation? I think that backends are supposed to complete the running query even if the postmaster dies meanwhile, and currently that seems to be true. pgfdw_get_result seems to be following that policy. Perhaps it's the same for all of the functions listed above. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Jul 18, 2018 at 8:30 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Wed, 18 Jul 2018 14:02:47 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=3t57KBxb90CTqnDZiSTnTq3jwxZUc0zaFDiaOkGQqNxA@mail.gmail.com> >> Here are some of the places I had to add WL_EXIT_ON_PM_DEATH: >> gather_readnext(), shm_mq_send_bytes(), shm_mq_receive_bytes(), >> shm_mq_wait_internal(), ProcSleep(), ProcWaitForSignal(), pg_sleep(), >> pgfdw_get_result(). >> >> Was it intentional that any of those places don't currently exit on >> postmaster vaporisation? > > I think that backends are supposed to complete running query even > if postmaster dies meanwhile and currently that seems > true. pgfdw_get_result seems to be following the policy. Perhaps > it's the same for all of the functions listed above. Hmm. Why wait any longer? The cluster is broken. Is there some correctness reason to defer shutdown in any of these places? While looking for earlier discussion that might explain why some places ignored it, I came across this email from Andres, saying the same thing. https://www.postgresql.org/message-id/20160321093534.inkduxvpirs5n44j%40alap3.anarazel.de He mentioned that syslogger.c is a special case. In my patch I added WL_EXIT_ON_PM_DEATH to SysLoggerMain()'s WaitLatch*() calls, because I have to or the new assertion fails. Hmm, yeah, that's not great because it might discard the last words of other backends. So here is a new version that treats syslogger.c specially and may have other performance benefits. -- Thomas Munro http://www.enterprisedb.com
Attachments
At Thu, 19 Jul 2018 16:58:30 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=1R__sW-CTByAi+cFUXX8tJu8MTNdeQGZ4L-x5gZ4mxNA@mail.gmail.com> > On Wed, Jul 18, 2018 at 8:30 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > At Wed, 18 Jul 2018 14:02:47 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=3t57KBxb90CTqnDZiSTnTq3jwxZUc0zaFDiaOkGQqNxA@mail.gmail.com> > >> Here are some of the places I had to add WL_EXIT_ON_PM_DEATH: > >> gather_readnext(), shm_mq_send_bytes(), shm_mq_receive_bytes(), > >> shm_mq_wait_internal(), ProcSleep(), ProcWaitForSignal(), pg_sleep(), > >> pgfdw_get_result(). > >> > >> Was it intentional that any of those places don't currently exit on > >> postmaster vaporisation? > > > > I think that backends are supposed to complete running query even > > if postmaster dies meanwhile and currently that seems > > true. pgfdw_get_result seems to be following the policy. Perhaps > > it's the same for all of the functions listed above. > > Hmm. Why wait any longer? The cluster is broken. Is there some > correctness reason to defer shutdown in any of these places? Well, please don't get me wrong. I don't object to backends exiting on postmaster death; rather, I'm for it. My idea may have come from something like this (or perhaps a newer message), as an explanation of why it currently works that way. https://www.postgresql.org/message-id/21877.1294946166@sss.pgh.pa.us > While looking for earlier discussion that might explain why some > places ignored it, I came across this email from Andres, saying the > same thing. > > https://www.postgresql.org/message-id/20160321093534.inkduxvpirs5n44j%40alap3.anarazel.de > > He mentioned that syslogger.c is a special case. In my patch I added > WL_EXIT_ON_PM_DEATH to SysLoggerMain()'s WaitLatch*() calls, because I > have to or the new assertion fails. Hmm, yeah, that's not great > because it might discard the last words of other backends. 
> So here is a new version that treats syslogger.c specially and may have other performance benefits. Yeah. That seems good. Couldn't we reuse a prepared WaitEventSet in other places? For example, PgstatCollectorMain has the same characteristics: WaitLatchOrSocket is used with fixed parameters, waiting on a socket that frequently receives data. # Is it intentional that the patch doesn't touch pgstat.c? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Jul 19, 2018 at 10:30 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> Hmm. Why wait any longer? The cluster is broken. Is there some >> correctness reason to defer shutdown in any of these places? > > Well, please don't get me wrong. I don't object to backends' exit > on postmaster death, rather I'm for it. Maybe the idea of mine > came from somthing like this (perhaps newer one), in the meaning > of just why it is working in that way now. > > https://www.postgresql.org/message-id/21877.1294946166@sss.pgh.pa.us Thanks, that was interesting. It's from *before* the pipe solution was implemented though so I'm not sure it's relevant: the exit-as-soon-as-possible camp won that battle, we just missed a few places. >> He mentioned that syslogger.c is a special case. In my patch I added >> WL_EXIT_ON_PM_DEATH to SysLoggerMain()'s WaitLatch*() calls, because I >> have to or the new assertion fails. Hmm, yeah, that's not great >> because it might discard the last words of other backends. So here is >> a new version that treats syslogger.c specially and may have other >> performance benefits. > > Yeah. That seems good. Couldn't we reuse prepared WaitEventSet in > other places? For example PgstatCollectorMain has the same > characteristics, where WaitLatchOrSocket is used with fixed > parameters and waiting on a socket which gets frequent receipts. +1, but I'm considering that to be a separate project, or I'll never get this patch committed. It may be possible to have a small number of them reused in many places, and it may be possible for WaitLatchXXX() to reuse them automatically (so we don't have to change every call site). > # Is it intentional that the patch doesn't touch pgstat.c? Yes. pgstat.c still uses WL_POSTMASTER_DEATH because it does something special: it calls pgstat_write_statsfiles() before it exits. -- Thomas Munro http://www.enterprisedb.com
On Thu, Jul 19, 2018 at 11:51 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > On Thu, Jul 19, 2018 at 10:30 PM, Kyotaro HORIGUCHI > > Yeah. That seems good. Couldn't we reuse prepared WaitEventSet in > > other places? For example PgstatCollectorMain has the same > > characteristics, where WaitLatchOrSocket is used with fixed > > parameters and waiting on a socket which gets frequent receipts. > > +1, but I'm considering that to be a separate project, or I'll never > get this patch committed. It may be possible to have a small number > of them reused in many places, and it may be possible for > WaitLatchXXX() to reuse them automatically (so we don't have to change > every call site). > > > # Is it intentional that the patch doesn't touch pgstat.c? > > Yes. pgstat.c still uses WL_POSTMASTER_DEATH because it does > something special: it calls pgstat_write_statsfiles() before it exits. Rebased. -- Thomas Munro http://www.enterprisedb.com
Attachments
At Sun, 2 Sep 2018 07:04:19 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=0=PkSXQ5oNU8BhY1DEzxw_EspsU14D_zAPnj+fBAjxFQ@mail.gmail.com>
> > > # Is it intentional that the patch doesn't touch pgstat.c?
> >
> > Yes. pgstat.c still uses WL_POSTMASTER_DEATH because it does
> > something special: it calls pgstat_write_statsfiles() before it exits.

Mmm. Exactly..

> Rebased.

Thank you for the new version.

===
In syslogger.c, cur_flags is (just set but) no longer used.

===
In latch.c,

- The parentheses around the symbols don't seem to be needed.

| (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
| (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);

- The following assertion seems to contradict the comment.

| /* Postmaster-managed callers must handle postmaster death somehow. */
| Assert(!IsUnderPostmaster ||
| (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
| (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);

(Maybe Assert(IsUnderPost && (wakeEv & (WL_EXI | WL_PO)) != 0);)

And don't we need a description of this restriction in the function comment?

- I think it may be better if the following had parentheses around the '&' expressions.

| if (wakeEvents & WL_POSTMASTER_DEATH && IsUnderPostmaster)
| if (wakeEvents & WL_EXIT_ON_PM_DEATH && IsUnderPostmaster)

===
All the call sites of WaitLatch, WaitLatchOrSocket and WaitEventSetWait are covered by this patch, and they all look fine.

bgworker.c, pgstat.c, be-secure-openssl.c, be-secure.c: Not modified, on purpose; WL_EXIT_ON_PM_DEATH is not appropriate there.

pgarch.c, syncrep.c, walsender.c: Removed redundant checks of postmaster death.

syslogger.c: Left alone, since it intentionally does not exit on postmaster death. Uses a reusable wait event set.

walreceiver.c: Adds a new bail-out point in WalRcvWaitForStartPosition, which seems reasonable.

shm_mq.c: Adds postmaster-death bail-outs in shm_mq_send_bytes, shm_mq_receive_bytes and shm_mq_wait_internal. It looks fine.

postgres_fdw/connection.c: Adds a postmaster-death bail-out while getting a result. This seems to be a bug fix. 
I'm fine with it included in this patch. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Ugrrr! PLEASE ignore that comment of mine below; the assertion is not wrong at all.
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Thu, Sep 6, 2018 at 18:58, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> - The following assertion looks contradicting to the comment.
> | /* Postmaster-managed callers must handle postmaster death somehow. */
> | Assert(!IsUnderPostmaster ||
> | (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 ||
> | (wakeEvents & (WL_POSTMASTER_DEATH)) != 0);
>
> Kyotaro Horiguchi
> NTT Open Source Software Center
On Thu, Sep 6, 2018 at 9:57 PM Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > In sysloger.c, cur_flags is (just set but) no longer used. Right. Fixed. > === > In latch.c, > > - The parentheses around the symbols don't seem to be needed. > | (wakeEvents & (WL_EXIT_ON_PM_DEATH)) != 0 || > | (wakeEvents & (WL_POSTMASTER_DEATH)) != 0); Fixed. > And don't we need a description about this restriction in the > function comment? Ok, added. > - I think it may be better that the followings had parentheses > around '&' expressions. > > | if (wakeEvents & WL_POSTMASTER_DEATH && IsUnderPostmaster) > | if (wakeEvents & WL_EXIT_ON_PM_DEATH && IsUnderPostmaster) Fixed. Also I was a little inconsistent about whether to use implicit or explicit comparison with 0, and decided to go for the former. > All the caller sites of WaitLatch, WaitLatchOrSocket and > WaitEventSetWait are covered by this patch and all them look > fine. Thanks for the review! While rebasing, I also changed WL_EXIT_ON_PM_DEATH's means of exit from exit(1) to proc_exit(1). In master we're quite inconsistent about that as you can see from the lines removed by this patch, but the comments for proc_exit() seem to insist that it is the one true way out. Other than an elog(DEBUG3) message, the main difference between proc_exit() and direct exit() seems to be the PROFILE_PID_DIR stuff that changes directory on the way out the door for gprof users. It looks like per-pid gmon files could have been achieved by setting environment variables[1][2], but I guess there is some value in doing it the same way on every platform. [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=gmon/gmon.c;hb=HEAD#l354 [2] https://github.com/freebsd/freebsd/blob/master/lib/libc/gmon/gmon.c#L154 -- Thomas Munro http://www.enterprisedb.com
Attachments
Thank you for the fix. At Tue, 23 Oct 2018 17:26:37 +1300, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=3BxOHw63Anwyx=pyy7Ju8cW6T0ZXNeKQS=WZp80hv_rg@mail.gmail.com> > On Thu, Sep 6, 2018 at 9:57 PM Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > And don't we need a description about this restriction in the > > function comment? > > Ok, added. Thank you. It looks good. > While rebasing, I also changed WL_EXIT_ON_PM_DEATH's means of exit > from exit(1) to proc_exit(1). In master we're quite inconsistent > about that as you can see from the lines removed by this patch, but > the comments for proc_exit() seem to insist that it is the one true > way out. Other than an elog(DEBUG3) message, the main difference > between proc_exit() and direct exit() seems to be the PROFILE_PID_DIR > stuff that changes directory on the way out the door for gprof users. It looks exactly like that to me, too. > It looks like per-pid gmon files could have been achieved by setting > environment variables[1][2], but I guess there is some value in doing > it the same way on every platform. > > [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=gmon/gmon.c;hb=HEAD#l354 > [2] https://github.com/freebsd/freebsd/blob/master/lib/libc/gmon/gmon.c#L154 Agreed. Anyway, they don't allow the tweak for the autovacuum worker. It seems fine to me. I'm going to mark this Ready for Committer after a few days of waiting for anyone who has more comments or opinions. https://commitfest.postgresql.org/20/1618/ regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Fri, 26 Oct 2018 14:13:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in <20181026.141351.09076928.horiguchi.kyotaro@lab.ntt.co.jp> > Thank you for the fix. > > At Tue, 23 Oct 2018 17:26:37 +1300, Thomas Munro <thomas.munro@enterprisedb.com> wrote in <CAEepm=3BxOHw63Anwyx=pyy7Ju8cW6T0ZXNeKQS=WZp80hv_rg@mail.gmail.com> > > On Thu, Sep 6, 2018 at 9:57 PM Kyotaro HORIGUCHI > > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > > And don't we need a description about this restriction in the > > > function comment? > > > > Ok, added. > > Thank you. It looks good. > > > While rebasing, I also changed WL_EXIT_ON_PM_DEATH's means of exit > > from exit(1) to proc_exit(1). In master we're quite inconsistent > > about that as you can see from the lines removed by this patch, but > > the comments for proc_exit() seem to insist that it is the one true > > way out. Other than an elog(DEBUG3) message, the main difference > > between proc_exit() and direct exit() seems to be the PROFILE_PID_DIR > > stuff that changes directory on the way out the door for gprof users. > > It looks exactly like that for me, too. > > > It looks like per-pid gmon files could have been achieved by setting > > environment variables[1][2], but I guess there is some value in doing > > it the same way on every platform. > > > > [1] https://sourceware.org/git/?p=glibc.git;a=blob;f=gmon/gmon.c;hb=HEAD#l354 > > [2] https://github.com/freebsd/freebsd/blob/master/lib/libc/gmon/gmon.c#L154 > > Agreed. Anyway they don't allow the tweak for avworker. > > It seems fine for me. I'm going to mark this ReadyForCommitter > after a few days of waiting for anyone who has more > comments/opinions. > > https://commitfest.postgresql.org/20/1618/ Having seen no objections nor further comments, I have marked this as "Ready for Committer". regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> [ 0001-Add-WL_EXIT_ON_PM_DEATH-pseudo-event-v4.patch ]

I took a quick look through this. I have no objection to the idea of letting the latch infrastructure do the proc_exit(1), but I'm wondering why this is in the thread that it's in. Is there any remaining connection to the original complaint about BSDen not coping well with lots of processes waiting on the same pipe?

A couple minor code gripes:

-        rc = WaitLatch(MyLatch,
-                       WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
-                       (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
-                       WAIT_EVENT_AUTOVACUUM_MAIN);
+        WaitLatch(MyLatch,
+                  WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
+                  (nap.tv_sec * 1000L) + (nap.tv_usec / 1000L),
+                  WAIT_EVENT_AUTOVACUUM_MAIN);

I'd advise making the code in places like this look like

    (void) WaitLatch(MyLatch, ...);

Otherwise, you are likely to draw complaints about "return value is sometimes ignored" from Coverity and other static analyzers. The (void) cast makes it explicit that you're intentionally ignoring the result value in this one place.

@@ -1021,6 +1051,19 @@ WaitEventSetWait(WaitEventSet *set, long timeout,

     pgstat_report_wait_end();

+    /*
+     * Exit immediately if the postmaster died and the caller asked for
+     * automatic exit in that case.
+     */
+    if (set->exit_on_postmaster_death)
+    {
+        for (i = 0; i < returned_events; ++i)
+        {
+            if (occurred_events[i].events == WL_POSTMASTER_DEATH)
+                proc_exit(1);
+        }
+    }
+
     return returned_events;
 }

Since exit_on_postmaster_death = true is going to be the normal case, it seems a bit unfortunate that we have to incur this looping every time we go through WaitEventSetWait. Can't we put the handling of this in some spot where it only gets executed when we've detected WL_POSTMASTER_DEATH, like right after the PostmasterIsAliveInternal calls? 
        if (!PostmasterIsAliveInternal())
        {
+            if (set->exit_on_postmaster_death)
+                proc_exit(1);
            occurred_events->fd = PGINVALID_SOCKET;
            occurred_events->events = WL_POSTMASTER_DEATH;
            occurred_events++;
            returned_events++;
        }

regards, tom lane
On Sat, Nov 17, 2018 at 5:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Thomas Munro <thomas.munro@enterprisedb.com> writes: > > [ 0001-Add-WL_EXIT_ON_PM_DEATH-pseudo-event-v4.patch ] > > I took a quick look through this. I have no objection to the idea of > letting the latch infrastructure do the proc_exit(1), but I'm wondering > why this is in the thread that it's in. Is there any remaining connection > to the original complaint about BSDen not coping well with lots of > processes waiting on the same pipe? It turned into a general clean-up of inconsistent postmaster death handling after the one-pipe-per-backend proposal was rejected. Perhaps I should have started a new thread, but it does in fact address most cases of processes calling PostmasterIsAlive() frequently, which was part of the original complaint. That falls into the last-mentioned category from this paragraph of the commit message: Repair all code that was previously ignoring postmaster death completely, or requesting the event but ignoring it, or requesting the event but then doing an unconditional PostmasterIsAlive() call every time through its event loop (which is an expensive syscall on platforms for which we don't have USE_POSTMASTER_DEATH_SIGNAL support). Other potential improvements that could be made on both sides: * We should probably be smarter about how often we call PostmasterIsAlive() in the recovery loop, which is the only remaining case like that (because it's a busy loop, given enough WAL to chew on, so there is no waiting primitive that would report pipe readiness). * It would be nice if more kernels supported parent death signals. * Based on this report, those kernels really should sort out their wakeup logic for duplicated pipe fds. I wonder if other multi-process applications have the same problem. Our logger pipe and the proposed checkpointer fsync pipe too, maybe, not sure. 
> I'd advise making the code in places like this look like
>
>     (void) WaitLatch(MyLatch, ...);
>
> Otherwise, you are likely to draw complaints about "return value is
> sometimes ignored" from Coverity and other static analyzers. The
> (void) cast makes it explicit that you're intentionally ignoring
> the result value in this one place.

Done.

> Since exit_on_postmaster_death = true is going to be the normal case,
> it seems a bit unfortunate that we have to incur this looping every time
> we go through WaitEventSetWait. Can't we put the handling of this in some
> spot where it only gets executed when we've detected WL_POSTMASTER_DEATH,
> like right after the PostmasterIsAliveInternal calls?
>
>         if (!PostmasterIsAliveInternal())
>         {
> +            if (set->exit_on_postmaster_death)
> +                proc_exit(1);
>             occurred_events->fd = PGINVALID_SOCKET;
>             occurred_events->events = WL_POSTMASTER_DEATH;
>             occurred_events++;
>             returned_events++;
>         }

I was probably trying to avoid repeating the code in each of the 3 implementations of that function (poll, epoll, win32). But there is already other similar duplication there and I agree that is better. Done that way.

Thanks for the review.

--
Thomas Munro
http://www.enterprisedb.com
Attachments
On Sat, Nov 17, 2018 at 2:27 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > Thanks for the review. Pushed. -- Thomas Munro http://www.enterprisedb.com