Re: BUG #7643: Issuing a shutdown request while server startup leads to server hang

From Tom Lane
Subject Re: BUG #7643: Issuing a shutdown request while server startup leads to server hang
Date
Msg-id 14768.1353445229@sss.pgh.pa.us
In response to Re: BUG #7643: Issuing a shutdown request while server startup leads to server hang  (Hari Babu <haribabu.kommi@huawei.com>)
Responses Re: BUG #7643: Issuing a shutdown request while server startup leads to server hang  (Hari Babu <haribabu.kommi@huawei.com>)
List pgsql-bugs
Hari Babu <haribabu.kommi@huawei.com> writes:
>> We're going to need more details about how to reproduce this.

> The problem occurs only when active server is restarting by just adding a
> recovery.conf file to the data directory.

Well, you can't just put an empty file there, but I eventually managed
to reproduce this with the suggested hack in xlog.c.

I think the key problem is that postmaster.c's sigusr1_handler() is
willing to start new children even after shutdown has been initiated.
I don't see any good reason for it to do that, so I think the
appropriate patch is as attached.
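
For readers who want the idea without the full diff, here is a minimal
sketch of the guard, assuming a toy model rather than the real
postmaster.c: ToyPostmaster, toy_handle_recovery_started() and the
children_started counter are hypothetical names used only for
illustration, and the enums are simplified stand-ins for the real
Shutdown and pmState values.

```c
#include <stdbool.h>

/* Simplified stand-ins for the postmaster's globals (assumptions, not
 * the real definitions in postmaster.c). */
typedef enum { NoShutdown, SmartShutdown, FastShutdown } ShutdownMode;
typedef enum { PM_STARTUP, PM_RECOVERY, PM_WAIT_BACKENDS } PMState;

typedef struct
{
    ShutdownMode shutdown;          /* has a shutdown been requested? */
    PMState      state;             /* current state machine position */
    int          children_started;  /* hypothetical counter for the demo */
} ToyPostmaster;

/*
 * Toy version of one branch of a SIGUSR1-style handler: react to
 * "recovery started" only if no shutdown is pending.  Returns true if
 * the request was honored.
 */
bool
toy_handle_recovery_started(ToyPostmaster *pm)
{
    if (pm->state == PM_STARTUP && pm->shutdown == NoShutdown)
    {
        pm->children_started++;     /* stands in for launching children */
        pm->state = PM_RECOVERY;
        return true;
    }

    /* Shutdown pending or wrong state: ignore the request. */
    return false;
}
```

With shutdown set to anything other than NoShutdown, the call leaves
state and children_started untouched, which is the behavior the extra
"Shutdown == NoShutdown" tests in the patch are meant to give.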

Changing that still leaves us with the postmaster thinking that the
eventual exit(1) of the startup process is a "crash".  This is mostly
cosmetic since it still shuts down okay, but we can fix it by reversing
the order of the first two checks in reaper() --- that is, if Shutdown
is set, we should prefer that code path even if we're in PM_STARTUP
state.
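
The effect of the check order can be sketched as follows.  This is a
toy classifier, not the real reaper(): StartupExitKind and
toy_classify_startup_exit() are hypothetical names, the booleans stand
in for "Shutdown > NoShutdown" and "pmState == PM_STARTUP", and the
exact failure condition used for the PM_STARTUP check is an assumption
made for illustration.

```c
#include <stdbool.h>

typedef enum
{
    EXIT_AS_SHUTDOWN,   /* treated as part of an orderly shutdown */
    EXIT_AS_CRASH,      /* treated as a catastrophic failure */
    EXIT_AS_NORMAL      /* startup finished normally */
} StartupExitKind;

/*
 * Classify the exit of the startup process.  check_shutdown_first
 * selects the patched order (true) or the original order (false).
 */
StartupExitKind
toy_classify_startup_exit(bool shutdown_requested, bool in_pm_startup,
                          int exit_code, bool check_shutdown_first)
{
    /* Exit code 0 or 1 while a shutdown is pending counts as expected. */
    bool shutdown_exit = shutdown_requested &&
        (exit_code == 0 || exit_code == 1);

    /* Any nonzero exit during PM_STARTUP counts as a failure. */
    bool startup_failure = in_pm_startup && exit_code != 0;

    if (check_shutdown_first)
    {
        if (shutdown_exit)
            return EXIT_AS_SHUTDOWN;
        if (startup_failure)
            return EXIT_AS_CRASH;
    }
    else
    {
        if (startup_failure)
            return EXIT_AS_CRASH;
        if (shutdown_exit)
            return EXIT_AS_SHUTDOWN;
    }

    return (exit_code == 0) ? EXIT_AS_NORMAL : EXIT_AS_CRASH;
}
```

The case of interest is a shutdown requested during PM_STARTUP with the
startup process doing exit(1): the original order reports it as a
crash, while the reversed order takes the shutdown path.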

I concluded that it probably wasn't a good idea to have the additional
state transition in SIGINT handling.  Generally PM_STARTUP means "we're
running the startup process and nothing else", and that's useful state
info that we shouldn't throw away lightly.


            regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b223feefbab0645667449f643c6c8adee3747ef0..6f93d93fa3f7577fb9157f0bea805c427e3605dd 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** pmdie(SIGNAL_ARGS)
*** 2261,2269 ****
              if (pmState == PM_RECOVERY)
              {
                  /*
!                  * Only startup, bgwriter, and checkpointer should be active
!                  * in this state; we just signaled the first two, and we don't
!                  * want to kill checkpointer yet.
                   */
                  pmState = PM_WAIT_BACKENDS;
              }
--- 2261,2269 ----
              if (pmState == PM_RECOVERY)
              {
                  /*
!                  * Only startup, bgwriter, walreceiver, and/or checkpointer
!                  * should be active in this state; we just signaled the first
!                  * three, and we don't want to kill checkpointer yet.
                   */
                  pmState = PM_WAIT_BACKENDS;
              }
*************** reaper(SIGNAL_ARGS)
*** 2355,2360 ****
--- 2355,2372 ----
              StartupPID = 0;

              /*
+              * Startup process exited in response to a shutdown request (or it
+              * completed normally regardless of the shutdown request).
+              */
+             if (Shutdown > NoShutdown &&
+                 (EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
+             {
+                 pmState = PM_WAIT_BACKENDS;
+                 /* PostmasterStateMachine logic does the rest */
+                 continue;
+             }
+
+             /*
               * Unexpected exit of startup process (including FATAL exit)
               * during PM_STARTUP is treated as catastrophic. There are no
               * other processes running yet, so we can just exit.
*************** reaper(SIGNAL_ARGS)
*** 2369,2386 ****
              }

              /*
-              * Startup process exited in response to a shutdown request (or it
-              * completed normally regardless of the shutdown request).
-              */
-             if (Shutdown > NoShutdown &&
-                 (EXIT_STATUS_0(exitstatus) || EXIT_STATUS_1(exitstatus)))
-             {
-                 pmState = PM_WAIT_BACKENDS;
-                 /* PostmasterStateMachine logic does the rest */
-                 continue;
-             }
-
-             /*
               * After PM_STARTUP, any unexpected exit (including FATAL exit) of
               * the startup process is catastrophic, so kill other children,
               * and set RecoveryError so we don't try to reinitialize after
--- 2381,2386 ----
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4283,4289 ****
       * first. We don't want to go back to recovery in that case.
       */
      if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
!         pmState == PM_STARTUP)
      {
          /* WAL redo has started. We're out of reinitialization. */
          FatalError = false;
--- 4283,4289 ----
       * first. We don't want to go back to recovery in that case.
       */
      if (CheckPostmasterSignal(PMSIGNAL_RECOVERY_STARTED) &&
!         pmState == PM_STARTUP && Shutdown == NoShutdown)
      {
          /* WAL redo has started. We're out of reinitialization. */
          FatalError = false;
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4300,4306 ****
          pmState = PM_RECOVERY;
      }
      if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
!         pmState == PM_RECOVERY)
      {
          /*
           * Likewise, start other special children as needed.
--- 4300,4306 ----
          pmState = PM_RECOVERY;
      }
      if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
!         pmState == PM_RECOVERY && Shutdown == NoShutdown)
      {
          /*
           * Likewise, start other special children as needed.
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4331,4337 ****
          signal_child(SysLoggerPID, SIGUSR1);
      }

!     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER))
      {
          /*
           * Start one iteration of the autovacuum daemon, even if autovacuuming
--- 4331,4338 ----
          signal_child(SysLoggerPID, SIGUSR1);
      }

!     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER) &&
!         Shutdown == NoShutdown)
      {
          /*
           * Start one iteration of the autovacuum daemon, even if autovacuuming
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4345,4351 ****
          start_autovac_launcher = true;
      }

!     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER))
      {
          /* The autovacuum launcher wants us to start a worker process. */
          StartAutovacuumWorker();
--- 4346,4353 ----
          start_autovac_launcher = true;
      }

!     if (CheckPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER) &&
!         Shutdown == NoShutdown)
      {
          /* The autovacuum launcher wants us to start a worker process. */
          StartAutovacuumWorker();
*************** sigusr1_handler(SIGNAL_ARGS)
*** 4354,4360 ****
      if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
          WalReceiverPID == 0 &&
          (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
!          pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY))
      {
          /* Startup Process wants us to start the walreceiver process. */
          WalReceiverPID = StartWalReceiver();
--- 4356,4363 ----
      if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
          WalReceiverPID == 0 &&
          (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
!          pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
!         Shutdown == NoShutdown)
      {
          /* Startup Process wants us to start the walreceiver process. */
          WalReceiverPID = StartWalReceiver();
