Re: assertion at postmaster start

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: assertion at postmaster start
Дата
Msg-id 27707.1566680146@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: assertion at postmaster start  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: assertion at postmaster start  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> I think what this demonstrates is that that Assert is just wrong:
> we *can* reach PM_RUN with the flag still set, so we should do
>             StartupStatus = STARTUP_NOT_RUNNING;
>             FatalError = false;
> -            Assert(AbortStartTime == 0);
> +            AbortStartTime = 0;
>             ReachedNormalRunning = true;
>             pmState = PM_RUN;
> Probably likewise for the similar Assert in sigusr1_handler.

Poking further at this, I noticed that the code just above here completely
fails to do what the comments say it intends to do, which is restart the
startup process after we've SIGQUIT'd it.  That's because the careful
manipulation of StartupStatus in reaper (lines 2943ff in HEAD) is stomped
on by HandleChildCrash, which will just unconditionally reset it to
STARTUP_CRASHED (line 3507).  So we end up shutting down the database
after all, which is not what the intention seems to be.  Hence,
commit 45811be94 was still a few bricks shy of a load :-(.

I propose the attached.  I'm inclined to think that the risk/benefit
of back-patching this is not very good, so I just want to stick it in
HEAD, unless somebody can explain why dead_end children are likely to
crash in the field.

            regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3339804..ecb7892 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2920,7 +2920,9 @@ reaper(SIGNAL_ARGS)
              * during PM_STARTUP is treated as catastrophic. There are no
              * other processes running yet, so we can just exit.
              */
-            if (pmState == PM_STARTUP && !EXIT_STATUS_0(exitstatus))
+            if (pmState == PM_STARTUP &&
+                StartupStatus != STARTUP_SIGNALED &&
+                !EXIT_STATUS_0(exitstatus))
             {
                 LogChildExit(LOG, _("startup process"),
                              pid, exitstatus);
@@ -2937,11 +2939,24 @@ reaper(SIGNAL_ARGS)
              * then we previously sent the startup process a SIGQUIT; so
              * that's probably the reason it died, and we do want to try to
              * restart in that case.
+             *
+             * This stanza also handles the case where we sent a SIGQUIT
+             * during PM_STARTUP due to some dead_end child crashing: in that
+             * situation, if the startup process dies on the SIGQUIT, we need
+             * to transition to PM_WAIT_BACKENDS state which will allow
+             * PostmasterStateMachine to restart the startup process.  (On the
+             * other hand, the startup process might complete normally, if we
+             * were too late with the SIGQUIT.  In that case we'll fall
+             * through and commence normal operations.)
              */
             if (!EXIT_STATUS_0(exitstatus))
             {
                 if (StartupStatus == STARTUP_SIGNALED)
+                {
                     StartupStatus = STARTUP_NOT_RUNNING;
+                    if (pmState == PM_STARTUP && Shutdown == NoShutdown)
+                        pmState = PM_WAIT_BACKENDS;
+                }
                 else
                     StartupStatus = STARTUP_CRASHED;
                 HandleChildCrash(pid, exitstatus,
@@ -2954,7 +2969,7 @@ reaper(SIGNAL_ARGS)
              */
             StartupStatus = STARTUP_NOT_RUNNING;
             FatalError = false;
-            Assert(AbortStartTime == 0);
+            AbortStartTime = 0;
             ReachedNormalRunning = true;
             pmState = PM_RUN;

@@ -3504,7 +3519,7 @@ HandleChildCrash(int pid, int exitstatus, const char *procname)
     if (pid == StartupPID)
     {
         StartupPID = 0;
-        StartupStatus = STARTUP_CRASHED;
+        /* Caller adjusts StartupStatus, so don't touch it here */
     }
     else if (StartupPID != 0 && take_action)
     {
@@ -5100,7 +5115,7 @@ sigusr1_handler(SIGNAL_ARGS)
     {
         /* WAL redo has started. We're out of reinitialization. */
         FatalError = false;
-        Assert(AbortStartTime == 0);
+        AbortStartTime = 0;

         /*
          * Crank up the background tasks.  It doesn't matter if this fails,

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: LLVM breakage on seawasp
Следующее
От: Tom Lane
Дата:
Сообщение: Re: LLVM breakage on seawasp