Re: prevent immature WAL streaming
From | Tom Lane |
---|---|
Subject | Re: prevent immature WAL streaming |
Date | |
Msg-id | 45597.1637694259@sss.pgh.pa.us |
In response to | Re: prevent immature WAL streaming (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses | Re: prevent immature WAL streaming (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
List | pgsql-hackers |
We're *still* not out of the woods with 026_overwrite_contrecord.pl,
as we are continuing to see occasional "mismatching overwritten LSN"
failures, further down in the test where it tries to start up the
standby:

  sysname   |    branch     |      snapshot       |     stage     | l
------------+---------------+---------------------+---------------+------------------------------------------------------------------------------------------------------------
 spurfowl   | REL_13_STABLE | 2021-10-18 03:56:26 | recoveryCheck | 2021-10-18 00:08:09.324 EDT [2455:6] FATAL: mismatching overwritten LSN 0/1FFE018 -> 0/1FFE000
 sidewinder | HEAD          | 2021-10-19 04:32:36 | recoveryCheck | 2021-10-19 06:46:23.168 CEST [26393:6] FATAL: mismatching overwritten LSN 0/1FFE018 -> 0/1FFE000
 francolin  | REL9_6_STABLE | 2021-10-26 01:41:39 | recoveryCheck | 2021-10-26 01:48:05.646 UTC [3417202][][1/0:0] FATAL: mismatching overwritten LSN 0/1FFE018 -> 0/1FFE000
 petalura   | HEAD          | 2021-11-05 00:20:03 | recoveryCheck | 2021-11-05 02:58:12.146 CET [61848fb3.28d157:6] FATAL: mismatching overwritten LSN 0/1FFE018 -> 0/1FFE000
 lapwing    | REL_11_STABLE | 2021-11-05 17:24:49 | recoveryCheck | 2021-11-05 17:39:29.741 UTC [9831:6] FATAL: mismatching overwritten LSN 0/1FFE014 -> 0/1FFE000
 morepork   | HEAD          | 2021-11-10 02:51:12 | recoveryCheck | 2021-11-10 04:03:33.576 CET [73561:6] FATAL: mismatching overwritten LSN 0/1FFE018 -> 0/1FFE000
 petalura   | HEAD          | 2021-11-16 15:20:03 | recoveryCheck | 2021-11-16 18:16:47.875 CET [6193e77f.35b87f:6] FATAL: mismatching overwritten LSN 0/1FFE018 -> 0/1FFE000
 morepork   | HEAD          | 2021-11-17 03:45:36 | recoveryCheck | 2021-11-17 04:57:04.359 CET [32089:6] FATAL: mismatching overwritten LSN 0/1FFE018 -> 0/1FFE000
 spurfowl   | REL_10_STABLE | 2021-11-22 22:21:03 | recoveryCheck | 2021-11-22 17:29:35.520 EST [16011:6] FATAL: mismatching overwritten LSN 0/1FFE018 -> 0/1FFE000
(9 rows)

Looking at adjacent successful runs, it seems that the exact point
where the "missing contrecord" starts varies substantially, even
after our previous fix to disable autovacuum in this test. How could
that be?

It's probably for the best though, because I think this is exposing
an actual bug that we would not have seen if the start point were
completely consistent. I have not dug into the code, but it looks
to me like if the "consistent recovery state" is reached exactly
at a page boundary (0/1FFE000 in all these cases), then the standby
expects that to be what the OVERWRITE_CONTRECORD record will point at.
But actually it points to the first WAL record on that page, resulting
in a bogus failure.

regards, tom lane
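
The LSN arithmetic behind that guess can be checked with a small sketch. This is only an illustration, not PostgreSQL code: the helper names are hypothetical, and it assumes the default 8 kB WAL page size and a short page header of 24 bytes (20 bytes on builds with 4-byte maximum alignment), which would match the 0x18 and 0x14 offsets seen in the log lines above.

```python
# Illustrative only: helper names are hypothetical, not PostgreSQL APIs.
XLOG_BLCKSZ = 8192        # assumed default WAL page size
SHORT_PAGE_HEADER = 24    # assumed short page header size (8-byte alignment)


def parse_lsn(text):
    """Turn an LSN such as '0/1FFE018' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def format_lsn(lsn):
    """Turn a 64-bit integer back into the 'X/X' LSN notation."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def first_record_on_page(page_start, header_size=SHORT_PAGE_HEADER):
    """Earliest position a record can start at on the page beginning at page_start."""
    if page_start % XLOG_BLCKSZ != 0:
        raise ValueError("page_start is not on a WAL page boundary")
    return page_start + header_size


boundary = parse_lsn("0/1FFE000")
print(boundary % XLOG_BLCKSZ == 0)                         # is it a page boundary?
print(format_lsn(first_record_on_page(boundary)))          # with a 24-byte header
print(format_lsn(first_record_on_page(boundary, 20)))      # with a 20-byte header
```

Under these assumptions, the two LSNs in each failure message differ by exactly one page header, which fits the reading that one side names the page start and the other names the first record on that page.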