Re: pg_rewind test race condition..?

From Stephen Frost
Subject Re: pg_rewind test race condition..?
Date
Msg-id 20150429130328.GW30322@tamriel.snowman.net
In response to Re: pg_rewind test race condition..?  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: pg_rewind test race condition..?  (Heikki Linnakangas <hlinnaka@iki.fi>)
Re: pg_rewind test race condition..?  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
Heikki,

* Heikki Linnakangas (hlinnaka@iki.fi) wrote:
> The problem seems to be that when the standby is promoted, it's a
> so-called "fast promotion", where it writes an end-of-recovery
> record and starts accepting queries before creating a real
> checkpoint. pg_rewind looks at the TLI in the latest checkpoint, as
> it's in the control file, but that isn't updated until the
> checkpoint completes. I don't see it on my laptop normally, but I
> can reproduce it if I insert a "sleep(5)" in StartupXLOG, just
> before it requests the checkpoint:

Ah, interesting.

> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -7173,7 +7173,10 @@ StartupXLOG(void)
>       * than is appropriate now that we're not in standby mode anymore.
>       */
>      if (fast_promoted)
> +    {
> +        sleep(5);
>          RequestCheckpoint(CHECKPOINT_FORCE);
> +    }
>  }
>
> The simplest fix would be to force a checkpoint in the regression
> test, before running pg_rewind. It's a bit of a cop out, since you'd
> still get the same issue when you tried to do the same thing in the
> real world. It should be rare in practice - you'd not normally run
> pg_rewind immediately after promoting the standby - but a better
> error message at least would be nice..

Forcing a checkpoint in the regression tests and then providing a better
error message sounds reasonable to me.  I agree that it's very unlikely
to happen in the real world.  Even when you're bouncing between systems
for upgrades, etc., you're unlikely to do it fast enough for this issue
to show up, and a better error message would help any users who do
manage to run into it (perhaps during their own testing).
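
To make the "better error message" idea a bit more concrete, here is a
rough sketch.  This is not actual pg_rewind code: the function name
timelines_diverged(), its parameters, and the message wording are all
made up for illustration.  It assumes the caller has already read the
latest-checkpoint TLI from each cluster's control file.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the timeline ID type; an assumption for this sketch. */
typedef uint32_t TimeLineID;

/*
 * Hypothetical check, not pg_rewind's real logic.
 *
 * Compares the timeline recorded in the latest checkpoint of the source
 * and target control files.  Returns true if they differ (a rewind looks
 * possible) and leaves msg empty.  Returns false if they match, and
 * writes a hint into msg explaining that a recently promoted source may
 * still show its old timeline until a checkpoint completes.
 */
bool
timelines_diverged(TimeLineID source_ckpt_tli, TimeLineID target_ckpt_tli,
                   char *msg, size_t msglen)
{
    if (msglen > 0)
        msg[0] = '\0';

    if (source_ckpt_tli != target_ckpt_tli)
        return true;

    snprintf(msg, msglen,
             "source and target cluster are on the same timeline (%u) "
             "according to their control files; if the source was promoted "
             "only recently, its control file may not be updated until the "
             "first checkpoint after promotion completes",
             (unsigned) source_ckpt_tli);
    return false;
}
```

With something along these lines, a user who hits the fast-promotion
window would at least be pointed at the checkpoint, rather than just
being told that both clusters are on the same timeline.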

Another thought would be to provide an option to pg_rewind to have it do
an explicit checkpoint before it reads the control file.  I'm not
against having it simply always do that, as I don't see pg_rewind being a
commonly run thing, but I know some environments have very heavy
checkpoints, so that might not be ideal.
Thanks!
    Stephen
