Re: Race conditions with checkpointer and shutdown
От | Tom Lane |
---|---|
Тема | Re: Race conditions with checkpointer and shutdown |
Дата | |
Msg-id | 28461.1555624659@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: Race conditions with checkpointer and shutdown (Tom Lane <tgl@sss.pgh.pa.us>) |
Ответы |
Re: Race conditions with checkpointer and shutdown
(Michael Paquier <michael@paquier.xyz>)
|
Список | pgsql-hackers |
Robert Haas <robertmhaas@gmail.com> wrote (in the other thread): > Any idea whether it's something newly-introduced or of long standing? It's the latter. I searched the buildfarm database for failure logs including the string "server does not shut down" within the last three years, and got all of the hits attached. Not all of these look like the failure pattern Michael pointed to, but enough of them do to say that the problem has existed since at least mid-2017. To be concrete, we have quite a sample of cases where a standby server has received a "fast shutdown" signal and acknowledged that in its log, but it never gets to the expected "shutting down" message, meaning it never starts the shutdown checkpoint let alone finishes it. The oldest case that clearly looks like that is https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=nightjar&dt=2017-06-02%2018%3A54%3A29 A significant majority of the recent cases look just like the piculet failure Michael pointed to, that is we fail to shut down the "london" server while it's acting as standby in the recovery/t/009_twophase.pl test. But there are very similar failures in other tests. I also notice that the population of machines showing the problem seems heavily skewed towards, um, weird cases. For instance, in the set that have shown this type of failure since January, we have dragonet: uses JIT francolin: --disable-spinlocks gull: armv7 mereswine: armv7 piculet: --disable-atomics sidewinder: amd64, but running netbsd 7 (and this was 9.6, note) spurfowl: fairly generic amd64 This leads me to suspect that the problem is (a) some very low-level issue in spinlocks or or latches or the like, or (b) a timing problem that just doesn't show up on generic Intel-oid platforms. The timing theory is maybe a bit stronger given that one test case shows this more often than others. I've not got any clear ideas beyond that. Anyway, this is *not* new in v12. regards, tom lane sysname | snapshot | stage | l ---------------+---------------------+---------------------+------------------------------------- jacana | 2016-07-23 06:15:32 | pg_upgradeCheck | pg_ctl: server does not shut down\r pademelon | 2016-08-14 03:49:36 | ECPG-Check | pg_ctl: server does not shut down mereswine | 2017-02-13 14:24:37 | Check | pg_ctl: server does not shut down arapaima | 2017-03-04 20:06:10 | StopDb-C:4 | pg_ctl: server does not shut down nightjar | 2017-06-02 18:54:29 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-02 19:54:11 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-03 15:54:05 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-03 17:54:18 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-03 21:54:09 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-04 00:54:09 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-04 16:34:32 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-04 17:54:16 | SubscriptionCheck | pg_ctl: server does not shut down hornet | 2017-06-05 16:22:09 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-05 16:54:09 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-05 20:26:24 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-06 03:30:02 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-06 15:54:18 | SubscriptionCheck | pg_ctl: server does not shut down hornet | 2017-06-06 17:10:02 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-06 18:54:27 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-07 00:54:07 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-07 02:54:06 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-07 15:12:15 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-07 17:54:07 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-07 18:54:06 | SubscriptionCheck | pg_ctl: server does not shut down sungazer | 2017-06-07 19:46:53 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-07 21:03:43 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-08 01:54:07 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-08 15:54:10 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-08 16:57:03 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-08 17:54:09 | SubscriptionCheck | pg_ctl: server does not shut down nightjar | 2017-06-08 19:54:06 | SubscriptionCheck | pg_ctl: server does not shut down nudibranch | 2017-07-23 08:31:12 | pg_upgradeCheck | pg_ctl: server does not shut down nudibranch | 2017-08-06 08:31:25 | pg_upgradeCheck | pg_ctl: server does not shut down spurfowl | 2017-08-21 18:43:08 | BinInstallCheck | pg_ctl: server does not shut down arapaima | 2017-10-10 22:12:01 | pg_upgradeCheck | pg_ctl: server does not shut down calliphoridae | 2017-10-11 20:32:24 | recoveryCheck | pg_ctl: server does not shut down arapaima | 2017-10-12 16:36:18 | pg_upgradeCheck | pg_ctl: server does not shut down arapaima | 2017-10-12 16:36:18 | pg_upgradeCheck | pg_ctl: server does not shut down arapaima | 2017-10-13 10:26:13 | test-decoding-check | pg_ctl: server does not shut down arapaima | 2017-10-14 07:40:05 | pg_upgradeCheck | pg_ctl: server does not shut down arapaima | 2017-10-14 23:22:28 | Check | pg_ctl: server does not shut down arapaima | 2017-10-15 10:32:28 | test-decoding-check | pg_ctl: server does not shut down arapaima | 2017-10-16 08:40:39 | Check | pg_ctl: server does not shut down arapaima | 2017-10-17 20:32:25 | pg_upgradeCheck | pg_ctl: server does not shut down arapaima | 2017-10-18 13:11:05 | pg_upgradeCheck | pg_ctl: server does not shut down arapaima | 2017-10-18 23:47:09 | Check | pg_ctl: server does not shut down arapaima | 2017-10-19 09:41:15 | Check | pg_ctl: server does not shut down spurfowl | 2017-11-15 13:29:06 | BinInstallCheck | pg_ctl: server does not shut down baiji | 2017-11-27 23:00:01 | Check | pg_ctl: server does not shut down\r skink | 2017-11-30 08:47:34 | recoveryCheck | pg_ctl: server does not shut down skink | 2017-12-02 14:30:01 | recoveryCheck | pg_ctl: server does not shut down gharial | 2017-12-15 09:12:45 | Check | pg_ctl: server does not shut down nudibranch | 2017-12-17 09:33:02 | pg_upgradeCheck | pg_ctl: server does not shut down nudibranch | 2018-01-20 10:08:33 | Check | pg_ctl: server does not shut down nudibranch | 2018-01-27 09:44:39 | Check | pg_ctl: server does not shut down nudibranch | 2018-02-01 09:49:26 | test-decoding-check | pg_ctl: server does not shut down mereswine | 2018-03-01 11:05:39 | Check | pg_ctl: server does not shut down topminnow | 2018-03-10 12:45:57 | pg_rewindCheck | pg_ctl: server does not shut down mereswine | 2018-03-14 08:10:08 | Check | pg_ctl: server does not shut down gull | 2018-03-30 02:10:15 | Check | pg_ctl: server does not shut down aholehole | 2018-04-08 11:37:13 | Check | pg_ctl: server does not shut down aholehole | 2018-04-08 21:41:39 | Check | pg_ctl: server does not shut down aholehole | 2018-04-10 23:37:22 | Check | pg_ctl: server does not shut down aholehole | 2018-04-13 16:06:17 | Check | pg_ctl: server does not shut down aholehole | 2018-04-13 23:59:24 | StopDb-C:1 | pg_ctl: server does not shut down aholehole | 2018-04-14 20:09:25 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-04-17 11:59:57 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-04-19 07:31:31 | Check | pg_ctl: server does not shut down aholehole | 2018-04-20 08:02:00 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-04-21 04:21:33 | Check | pg_ctl: server does not shut down aholehole | 2018-04-21 23:06:17 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-04-22 17:03:22 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-04-22 17:03:22 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-04-23 13:08:51 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-04-24 07:52:02 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-04-25 01:32:16 | Check | pg_ctl: server does not shut down gull | 2018-04-25 02:10:12 | Check | pg_ctl: server does not shut down aholehole | 2018-04-26 17:42:38 | Check | pg_ctl: server does not shut down aholehole | 2018-04-28 05:58:59 | Check | pg_ctl: server does not shut down aholehole | 2018-04-30 13:11:22 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-03 02:36:32 | StopDb-C:4 | pg_ctl: server does not shut down aholehole | 2018-05-04 11:07:37 | Check | pg_ctl: server does not shut down aholehole | 2018-05-04 18:02:59 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-06 02:52:23 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-07 19:16:33 | Check | pg_ctl: server does not shut down aholehole | 2018-05-08 13:03:52 | StopDb-C:4 | pg_ctl: server does not shut down aholehole | 2018-05-09 04:36:06 | Check | pg_ctl: server does not shut down aholehole | 2018-05-09 10:41:59 | Check | pg_ctl: server does not shut down aholehole | 2018-05-09 16:41:32 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-10 14:06:04 | Check | pg_ctl: server does not shut down aholehole | 2018-05-11 09:00:33 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-12 04:07:50 | Check | pg_ctl: server does not shut down aholehole | 2018-05-15 22:11:06 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-16 10:14:21 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-16 23:01:57 | Check | pg_ctl: server does not shut down aholehole | 2018-05-17 07:06:03 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-17 14:22:36 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-18 00:15:19 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-18 11:23:03 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-18 22:36:45 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-19 11:54:21 | Check | pg_ctl: server does not shut down aholehole | 2018-05-19 19:06:08 | StopDb-C:4 | pg_ctl: server does not shut down aholehole | 2018-05-20 09:32:53 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-20 16:47:17 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-21 03:36:24 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-21 14:38:21 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-22 03:02:05 | Check | pg_ctl: server does not shut down aholehole | 2018-05-22 11:06:05 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-23 12:13:37 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-23 21:05:13 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-24 08:29:25 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-05-25 14:08:16 | Check | pg_ctl: server does not shut down aholehole | 2018-05-25 19:11:35 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-27 16:56:10 | StopDb-C:1 | pg_ctl: server does not shut down aholehole | 2018-05-29 16:09:14 | StopDb-C:1 | pg_ctl: server does not shut down aholehole | 2018-05-30 15:07:44 | Check | pg_ctl: server does not shut down aholehole | 2018-05-31 13:10:53 | Check | pg_ctl: server does not shut down aholehole | 2018-05-31 21:12:12 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-05-31 21:12:12 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-04 17:07:20 | Check | pg_ctl: server does not shut down aholehole | 2018-06-05 00:12:13 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-06 17:06:03 | Check | pg_ctl: server does not shut down aholehole | 2018-06-06 21:49:02 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-06 21:49:02 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-07 06:53:06 | StopDb-C:1 | pg_ctl: server does not shut down aholehole | 2018-06-07 19:36:02 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-08 15:47:15 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-09 07:11:38 | StopDb-C:1 | pg_ctl: server does not shut down aholehole | 2018-06-09 21:54:40 | Check | pg_ctl: server does not shut down aholehole | 2018-06-10 08:12:11 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-06-12 00:12:15 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-06-13 23:29:02 | StopDb-C:4 | pg_ctl: server does not shut down aholehole | 2018-06-15 04:38:36 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-15 04:38:36 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-15 13:54:08 | Check | pg_ctl: server does not shut down aholehole | 2018-06-15 20:19:02 | Check | pg_ctl: server does not shut down aholehole | 2018-06-16 08:09:36 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-17 04:38:19 | StopDb-C:1 | pg_ctl: server does not shut down aholehole | 2018-06-17 19:47:19 | Check | pg_ctl: server does not shut down aholehole | 2018-06-18 04:06:04 | StopDb-C:4 | pg_ctl: server does not shut down aholehole | 2018-06-18 16:23:17 | Check | pg_ctl: server does not shut down aholehole | 2018-06-19 18:45:19 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-06-20 09:15:42 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-20 09:15:42 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-21 04:41:45 | Check | pg_ctl: server does not shut down aholehole | 2018-06-21 13:06:30 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-21 23:50:37 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-22 13:11:55 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-06-23 02:09:06 | Check | pg_ctl: server does not shut down aholehole | 2018-06-23 13:12:49 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-06-25 01:06:02 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-25 01:06:02 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-25 08:25:08 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-25 16:10:48 | test-decoding-check | pg_ctl: server does not shut down aholehole | 2018-06-26 03:34:45 | pg_upgradeCheck | pg_ctl: server does not shut down aholehole | 2018-06-26 12:47:21 | StopDb-C:1 | pg_ctl: server does not shut down gull | 2018-08-24 03:27:16 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2018-08-24 08:10:05 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2018-08-29 09:26:52 | pg_rewindCheck | pg_ctl: server does not shut down gull | 2018-08-30 02:10:05 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-09-06 02:10:16 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-09-14 02:10:05 | recoveryCheck | pg_ctl: server does not shut down sungazer | 2018-09-18 19:44:53 | subscriptionCheck | pg_ctl: server does not shut down gull | 2018-09-26 02:10:06 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2018-10-01 08:10:04 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-10-03 02:10:05 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-10-07 03:45:39 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-10-07 05:07:06 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2018-10-07 08:10:05 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-10-13 02:10:06 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-10-13 05:21:13 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-10-29 03:25:28 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2018-11-15 11:55:56 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2018-11-16 09:10:13 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2018-11-30 10:34:17 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2018-11-30 11:44:54 | recoveryCheck | pg_ctl: server does not shut down gull | 2018-12-08 04:29:34 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2019-01-30 05:18:59 | recoveryCheck | pg_ctl: server does not shut down francolin | 2019-02-07 07:55:41 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2019-02-20 03:10:06 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2019-02-22 03:10:05 | recoveryCheck | pg_ctl: server does not shut down spurfowl | 2019-02-25 11:02:04 | pg_rewindCheck | pg_ctl: server does not shut down sidewinder | 2019-02-25 16:45:00 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2019-03-06 03:10:05 | recoveryCheck | pg_ctl: server does not shut down piculet | 2019-03-23 22:28:59 | recoveryCheck | pg_ctl: server does not shut down gull | 2019-04-11 06:25:20 | recoveryCheck | pg_ctl: server does not shut down mereswine | 2019-04-12 06:31:30 | recoveryCheck | pg_ctl: server does not shut down dragonet | 2019-04-16 06:14:01 | recoveryCheck | pg_ctl: server does not shut down (188 rows)
В списке pgsql-hackers по дате отправления: