Preventing hangups in bgworker start/stop during DB shutdown

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Preventing hangups in bgworker start/stop during DB shutdown
Дата
Msg-id 661570.1608673226@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Preventing hangups in bgworker start/stop during DB shutdown  (Craig Ringer <craig.ringer@enterprisedb.com>)
Re: Preventing hangups in bgworker start/stop during DB shutdown  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Список pgsql-hackers
Here's an attempt at closing the race condition discussed in [1]
(and in some earlier threads, though I'm too lazy to find them).

The core problem is that the bgworker management APIs were designed
without any thought for exception conditions, notably "we're not
gonna launch any more workers because we're shutting down the database".
A process waiting for a worker in WaitForBackgroundWorkerStartup or
WaitForBackgroundWorkerShutdown will wait forever, so that the database
fails to shut down without manual intervention.

I'd supposed that we would need some incompatible changes in those APIs
in order to fix this, but after further study it seems like we could
hack things by just acting as though a request that won't be serviced
has already run to completion.  I'm not terribly satisfied with that
as a long-term solution --- it seems to me that callers should be told
that there was a failure.  But this looks to be enough to solve the
lockup condition for existing callers, and it seems like it'd be okay
to backpatch.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/16785-c0207d8c67fb5f25%40postgresql.org

diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index 7ef6259eb5..b3ab8b0fa3 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -231,13 +231,15 @@ FindRegisteredWorkerBySlotNumber(int slotno)
 }

 /*
- * Notice changes to shared memory made by other backends.  This code
- * runs in the postmaster, so we must be very careful not to assume that
- * shared memory contents are sane.  Otherwise, a rogue backend could take
- * out the postmaster.
+ * Notice changes to shared memory made by other backends.
+ * Accept new dynamic worker requests only if allow_new_workers is true.
+ *
+ * This code runs in the postmaster, so we must be very careful not to assume
+ * that shared memory contents are sane.  Otherwise, a rogue backend could
+ * take out the postmaster.
  */
 void
-BackgroundWorkerStateChange(void)
+BackgroundWorkerStateChange(bool allow_new_workers)
 {
     int            slotno;

@@ -297,6 +299,15 @@ BackgroundWorkerStateChange(void)
             continue;
         }

+        /*
+         * If this is a dynamic worker request, and we aren't allowing new
+         * dynamic workers, then immediately mark it for termination; the next
+         * stanza will take care of cleaning it up.
+         */
+        if (slot->worker.bgw_restart_time == BGW_NEVER_RESTART &&
+            !allow_new_workers)
+            slot->terminate = true;
+
         /*
          * If the worker is marked for termination, we don't need to add it to
          * the registered workers list; we can just free the slot. However, if
@@ -503,12 +514,54 @@ BackgroundWorkerStopNotifications(pid_t pid)
     }
 }

+/*
+ * Cancel any not-yet-started worker requests that have BGW_NEVER_RESTART.
+ *
+ * This is called during a normal ("smart" or "fast") database shutdown.
+ * After this point, no new background workers will be started, so any
+ * dynamic worker requests should be killed off, allowing anything that
+ * might be waiting for them to clean up and exit.
+ *
+ * This function should only be called from the postmaster.
+ */
+void
+ForgetUnstartedBackgroundWorkers(void)
+{
+    slist_mutable_iter iter;
+
+    slist_foreach_modify(iter, &BackgroundWorkerList)
+    {
+        RegisteredBgWorker *rw;
+        BackgroundWorkerSlot *slot;
+
+        rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);
+        Assert(rw->rw_shmem_slot < max_worker_processes);
+        slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
+
+        /* If it's not yet started, and it's a dynamic worker ... */
+        if (slot->pid == InvalidPid &&
+            rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
+        {
+            /* ... then zap it, and notify anything that was waiting */
+            int            notify_pid = rw->rw_worker.bgw_notify_pid;
+
+            ForgetBackgroundWorker(&iter);
+            if (notify_pid != 0)
+                kill(notify_pid, SIGUSR1);
+        }
+    }
+}
+
 /*
  * Reset background worker crash state.
  *
  * We assume that, after a crash-and-restart cycle, background workers without
  * the never-restart flag should be restarted immediately, instead of waiting
- * for bgw_restart_time to elapse.
+ * for bgw_restart_time to elapse.  On the other hand, workers with that flag
+ * should be forgotten, because they are dynamic requests from processes that
+ * no longer exist.
+ *
+ * This function should only be called from the postmaster.
  */
 void
 ResetBackgroundWorkerCrashTimes(void)
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 5d09822c81..fa35bf4369 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3796,6 +3796,13 @@ PostmasterStateMachine(void)
      */
     if (pmState == PM_STOP_BACKENDS)
     {
+        /*
+         * Forget any pending requests for dynamic background workers, since
+         * we're no longer willing to launch any new workers.  (If additional
+         * requests arrive, BackgroundWorkerStateChange will reject them.)
+         */
+        ForgetUnstartedBackgroundWorkers();
+
         /* Signal all backend children except walsenders */
         SignalSomeChildren(SIGTERM,
                            BACKEND_TYPE_ALL - BACKEND_TYPE_WALSND);
@@ -5153,7 +5160,8 @@ sigusr1_handler(SIGNAL_ARGS)
     /* Process background worker state change. */
     if (CheckPostmasterSignal(PMSIGNAL_BACKGROUND_WORKER_CHANGE))
     {
-        BackgroundWorkerStateChange();
+        /* Accept new dynamic worker requests only if not stopping. */
+        BackgroundWorkerStateChange(pmState < PM_STOP_BACKENDS);
         StartWorkerNeeded = true;
     }

diff --git a/src/include/postmaster/bgworker_internals.h b/src/include/postmaster/bgworker_internals.h
index f7e24664d5..dd6cabc45b 100644
--- a/src/include/postmaster/bgworker_internals.h
+++ b/src/include/postmaster/bgworker_internals.h
@@ -46,11 +46,12 @@ extern slist_head BackgroundWorkerList;

 extern Size BackgroundWorkerShmemSize(void);
 extern void BackgroundWorkerShmemInit(void);
-extern void BackgroundWorkerStateChange(void);
+extern void BackgroundWorkerStateChange(bool allow_new_workers);
 extern void ForgetBackgroundWorker(slist_mutable_iter *cur);
 extern void ReportBackgroundWorkerPID(RegisteredBgWorker *);
 extern void ReportBackgroundWorkerExit(slist_mutable_iter *cur);
 extern void BackgroundWorkerStopNotifications(pid_t pid);
+extern void ForgetUnstartedBackgroundWorkers(void);
 extern void ResetBackgroundWorkerCrashTimes(void);

 /* Function to start a background worker, called from postmaster.c */

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Proposed patch for key managment
Следующее
От: Tom Lane
Дата:
Сообщение: Re: libpq compression