Re: pg_rewind test race condition..?

From Heikki Linnakangas
Subject Re: pg_rewind test race condition..?
Msg-id 5540277D.8020309@iki.fi
In response to pg_rewind test race condition..?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: pg_rewind test race condition..?  (Stephen Frost <sfrost@snowman.net>)
Re: pg_rewind test race condition..?  (Oleksii Kliukin <alexk@hintbits.com>)
List pgsql-hackers
On 04/28/2015 11:02 AM, Stephen Frost wrote:
> Heikki,
>
>    Not sure if anyone else is seeing this, but I'm getting regression
>    test failures when running the pg_rewind tests pretty consistently
>    with 'make check'.  Specifically with "basic remote", I'm getting:
>
> source and target cluster are on the same timeline
> Failure, exiting
>
>    in regress_log/pg_rewind_log_basic_remote.
>
>    If I throw a "sleep(5);" into t/001_basic.pl before the call to
>    RewindTest::run_pg_rewind($test_mode); then everything works fine.

The problem seems to be that when the standby is promoted, it's a 
so-called "fast promotion", where it writes an end-of-recovery record 
and starts accepting queries before creating a real checkpoint. 
pg_rewind looks at the TLI in the latest checkpoint, as it's in the 
control file, but that isn't updated until the checkpoint completes. I 
don't see it on my laptop normally, but I can reproduce it if I insert a 
"sleep(5)" in StartupXLOG, just before it requests the checkpoint:

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7173,7 +7173,10 @@ StartupXLOG(void)
      * than is appropriate now that we're not in standby mode anymore.
      */
     if (fast_promoted)
+    {
+        sleep(5);
         RequestCheckpoint(CHECKPOINT_FORCE);
+    }
 }

The simplest fix would be to force a checkpoint in the regression test, 
before running pg_rewind. It's a bit of a cop out, since you'd still get 
the same issue when you tried to do the same thing in the real world. It 
should be rare in practice - you'd not normally run pg_rewind 
immediately after promoting the standby - but a better error message at 
least would be nice.
- Heikki



