Обсуждение: Pending query cancel defeats SIGQUIT
I've been doing an excess of immediate shutdowns lately, and that has turned up bugs old and new. This one goes back to 8.4 or earlier. If a query cancel is pending when a backend receives SIGQUIT, the cancel takes precedence and the backend survives: [local] test=# select nmtest_spin(false); Cancel request sent WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because anotherserver process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. ERROR: canceling statement due to user request Time: 4932.257 ms [local] test=# select 1; ?column? ---------- 1 (1 row) The errfinish() pertaining to that WARNING issues CHECK_FOR_INTERRUPTS(), and the query cancel pending since before the SIGQUIT arrived then takes effect. This is less bad on 9.4, because the postmaster will SIGKILL the backend after 5s. On older releases, the backend persists indefinitely. Let's fix this by holding interrupts for the duration of quickdie(); see attached patch. Surely we don't want any other kind of backend demise taking precedence over quickdie(). Unfortunately, this patch does not fully prevent that. If ImmediateInterruptOK==true, a SIGINT could arrive and longjmp() between the start of quickdie() and its PG_SETMASK() call. The only decently-portable way I know to close that race is to name SIGINT/SIGTERM in the SIGQUIT handler's sa_mask. In any event, that's a *far* narrower race and is a general problem shared by most of our signal use. I'm content to not fix it in this patch, but I propose adding it as a TODO. Here is the source code for the nmtest_spin() function used above: Datum nmtest_spin(PG_FUNCTION_ARGS) { bool no_sigquit = PG_GETARG_BOOL(0); if (no_sigquit) { sigset_t mask; sigemptyset(&mask); sigaddset(&mask, SIGQUIT); sigprocmask(SIG_BLOCK, &mask, NULL); } for (;;) sleep(1); } Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com
Вложения
Noah Misch-2 wrote > The errfinish() pertaining to that WARNING issues CHECK_FOR_INTERRUPTS(), > and > the query cancel pending since before the SIGQUIT arrived then takes > effect. > This is less bad on 9.4, because the postmaster will SIGKILL the backend > after > 5s. On older releases, the backend persists indefinitely. 9.4 == head or is this is typo? Your feelings on how far to back-patch? David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Pending-query-cancel-defeats-SIGQUIT-tp5770390p5770394.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Tue, Sep 10, 2013 at 07:13:16PM -0700, David Johnston wrote: > Noah Misch-2 wrote > > The errfinish() pertaining to that WARNING issues CHECK_FOR_INTERRUPTS(), > > and > > the query cancel pending since before the SIGQUIT arrived then takes > > effect. > > This is less bad on 9.4, because the postmaster will SIGKILL the backend > > after > > 5s. On older releases, the backend persists indefinitely. > > 9.4 == head or is this is typo? 9.4 == head > Your feelings on how far to back-patch? All supported versions. The current behavior is a bug every way I look at it. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
* Noah Misch (noah@leadboat.com) wrote: > On Tue, Sep 10, 2013 at 07:13:16PM -0700, David Johnston wrote: > > Your feelings on how far to back-patch? > > All supported versions. The current behavior is a bug every way I look at it. Agreed. I noticed a stale backend hanging around after a cancel/shutdown and this sounds very likely to be the cause and it'd be really great to get it fixed. Thanks, Stephen
On Wed, Sep 11, 2013 at 10:38 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Noah Misch (noah@leadboat.com) wrote: >> On Tue, Sep 10, 2013 at 07:13:16PM -0700, David Johnston wrote: >> > Your feelings on how far to back-patch? >> >> All supported versions. The current behavior is a bug every way I look at it. > > Agreed. I noticed a stale backend hanging around after a > cancel/shutdown and this sounds very likely to be the cause and it'd be > really great to get it fixed. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Sep 11, 2013 at 02:00:41PM -0400, Robert Haas wrote: > On Wed, Sep 11, 2013 at 10:38 AM, Stephen Frost <sfrost@snowman.net> wrote: > > * Noah Misch (noah@leadboat.com) wrote: > >> On Tue, Sep 10, 2013 at 07:13:16PM -0700, David Johnston wrote: > >> > Your feelings on how far to back-patch? > >> > >> All supported versions. The current behavior is a bug every way I look at it. > > > > Agreed. I noticed a stale backend hanging around after a > > cancel/shutdown and this sounds very likely to be the cause and it'd be > > really great to get it fixed. > > +1. Committed; TODO item added for the remaining race condition. -- Noah Misch EnterpriseDB http://www.enterprisedb.com