Re: Hot Backup with rsync fails at pg_clog if under load

From: Heikki Linnakangas
Subject: Re: Hot Backup with rsync fails at pg_clog if under load
Msg-id: 4EA900E5.9070905@enterprisedb.com
In reply to: Re: Hot Backup with rsync fails at pg_clog if under load (Florian Pflug <fgp@phlo.org>)
Responses: Re: Hot Backup with rsync fails at pg_clog if under load
           Re: Hot Backup with rsync fails at pg_clog if under load
List: pgsql-hackers
On 27.10.2011 02:29, Florian Pflug wrote:
> Per my theory about the cause of the problem in my other mail, I think you
> might see StartupCLOG failures even during crash recovery, provided that
> wal_level was set to hot_standby when the primary crashed. Here's how:
>
> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
> 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
>    The assigned XID requires CLOG extension.
> 3) The checkpoint continues, and LogStandbySnapshot() advances the
>    checkpoint's nextXid to the XID assigned in (2).
> 4) We crash after writing the checkpoint record, but before the CLOG
>    extension makes it to the disk, and before any trace of the XID assigned
>    in (2) makes it to the xlog.
>
> Then StartupCLOG() would fail at the end of recovery, because we'd end up
> with a nextXid whose corresponding CLOG page doesn't exist.
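A toy model of the four-step scenario can make the failure condition concrete. It is a sketch under stated assumptions, not PostgreSQL code: `ToyCrashState`, `toy_startup_clog_ok()` and `toy_florian_scenario()` are hypothetical names; the page size assumes the default 8 kB block with 2 status bits per XID; the checkpoint is assumed to end up with xid + 1 (the counter value just after the step-2 assignment); and the "startup check" only asks whether the page holding that last assigned XID survives on disk or can be recreated by replaying its extension record.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: default 8 kB block, 2 status bits per XID => 4 XIDs per byte. */
#define TOY_XACTS_PER_PAGE (8192u * 4u)

static uint32_t toy_page_of(uint32_t xid) { return xid / TOY_XACTS_PER_PAGE; }

/* Hypothetical model of what survives the crash in step 4. */
typedef struct {
    uint32_t ckpt_next_xid;     /* nextXid stored in the checkpoint record */
    uint32_t pages_on_disk;     /* CLOG pages 0 .. pages_on_disk-1 are durable */
    bool     ext_record_in_wal; /* did the CLOG-extension WAL record survive? */
} ToyCrashState;

/* Hypothetical stand-in for the end-of-recovery check: the page holding the
 * last assigned XID (nextXid - 1) must be on disk, or be recreated by
 * replaying its extension record (assumed to be for exactly the next page). */
static bool toy_startup_clog_ok(const ToyCrashState *s)
{
    assert(s->ckpt_next_xid > 0);
    uint32_t needed = toy_page_of(s->ckpt_next_xid - 1);
    if (needed < s->pages_on_disk)
        return true;
    return s->ext_record_in_wal && needed == s->pages_on_disk;
}

/* Steps 1-4: the XID from step 2 is the first XID of page 1, so it needs a
 * CLOG extension; step 3 copies xid + 1 into the checkpoint; step 4 crashes
 * while only page 0 is durable. */
static ToyCrashState toy_florian_scenario(bool ext_record_in_wal)
{
    uint32_t xid = TOY_XACTS_PER_PAGE;
    ToyCrashState s = { xid + 1, 1, ext_record_in_wal };
    return s;
}
```

With `ext_record_in_wal` false (step 4 as described) the check fails; with it true the check passes, so in this model everything hinges on whether the extension record reaches WAL before the checkpoint record.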

No, clog extension is WAL-logged while holding the XidGenLock. At step 
3, LogStandbySnapshot() would block until the clog-extension record is 
written to WAL, so crash recovery would see and replay that record 
before calling StartupCLOG().

That can happen during hot standby, though, because StartupCLOG() is 
called earlier.
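The crash-recovery argument is about ordering under XidGenLock: the extension record is inserted before the shared counter moves, and the snapshot reads the counter under the same lock, so any nextXid a checkpoint can copy already has its extension record earlier in WAL. The sketch below models that sequentially rather than with real threads; the critical sections are only described in comments, and `ToyShared`, `toy_assign_xid()` and `toy_log_checkpoint()` are hypothetical, not PostgreSQL functions. The hot standby case, where the startup check runs before that record is replayed, is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: default 8 kB block, 4 XIDs per CLOG byte. */
#define TOY_XACTS_PER_PAGE (8192u * 4u)

/* Hypothetical shared state; every access below is assumed to happen
 * while holding the (modelled) XidGenLock. */
typedef struct {
    uint32_t next_xid;  /* next XID to hand out */
    uint64_t wal_end;   /* LSN the next WAL record will get */
    uint64_t ext_lsn;   /* LSN of the latest CLOG-extension record */
    uint32_t ext_page;  /* page that record zeroes */
    bool     has_ext;   /* has any extension record been written? */
} ToyShared;

/* XID assignment: the extension record is inserted *before* next_xid is
 * advanced, and both happen inside the same critical section. */
static uint32_t toy_assign_xid(ToyShared *s)
{
    uint32_t xid = s->next_xid;
    if (xid % TOY_XACTS_PER_PAGE == 0) {   /* first XID on a new page */
        s->ext_lsn = s->wal_end++;
        s->ext_page = xid / TOY_XACTS_PER_PAGE;
        s->has_ext = true;
    }
    s->next_xid = xid + 1;
    return xid;
}

/* Snapshot/checkpoint: reads next_xid under the same lock, then inserts
 * its own record; returns that record's LSN. */
static uint64_t toy_log_checkpoint(ToyShared *s, uint32_t *ckpt_next_xid)
{
    assert(ckpt_next_xid != NULL);
    *ckpt_next_xid = s->next_xid;
    return s->wal_end++;
}
```

Because the checkpoint record always gets a later LSN than the extension record for the page of its nextXid, replay in crash recovery reaches the extension first.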

> My suggestion is to fix the CLOG problem in the same way that you fixed
> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
> CheckPointGuts().
>
> Here's what I imagine CreateCheckPoint() should look like:
>
> 1) LogStandbySnapshot() and fill out oldestActiveXid
> 2) Fill out REDO
> 3) Wait for concurrent commits
> 4) Fill out nextXid and the other fields
> 5) CheckPointGuts()
> 6) Rest
>
> It's then no longer necessary for LogStandbySnapshot() to modify
> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
> will thus derive a higher value than LogStandbySnapshot() would have.
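The proposed ordering can be sketched as a toy sequence showing why the nextXid filled out in step 4 can never be lower than the one the standby snapshot saw in step 1: XIDs only move forward in between. `ToyServer`, `ToyCheckPoint`, `toy_backends_run()` and `toy_create_checkpoint()` are hypothetical; XID wraparound, the commit wait in step 3, the work inside CheckPointGuts() and step 6 are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the checkpoint fields discussed in the thread. */
typedef struct {
    uint32_t oldest_active_xid;
    uint64_t redo;
    uint32_t next_xid;
} ToyCheckPoint;

/* Hypothetical server state touched by the checkpoint. */
typedef struct {
    uint32_t next_xid;        /* shared XID counter */
    uint32_t oldest_running;  /* oldest XID still running */
    uint64_t insert_lsn;      /* current WAL insert position */
} ToyServer;

/* Concurrent backends assign n XIDs (and write some WAL) between steps. */
static void toy_backends_run(ToyServer *srv, uint32_t n)
{
    srv->next_xid += n;
    srv->insert_lsn += n;
}

/* Proposed order, steps 1-5; *snap_next receives the nextXid seen in step 1. */
static ToyCheckPoint toy_create_checkpoint(ToyServer *srv, uint32_t concurrent,
                                           uint32_t *snap_next)
{
    ToyCheckPoint cp;

    /* 1) LogStandbySnapshot() and fill out oldestActiveXid */
    cp.oldest_active_xid = srv->oldest_running;
    *snap_next = srv->next_xid;
    srv->insert_lsn++;                    /* the snapshot record itself */
    toy_backends_run(srv, concurrent);

    /* 2) Fill out REDO */
    cp.redo = srv->insert_lsn;

    /* 3) Wait for concurrent commits (not modelled); backends keep running */
    toy_backends_run(srv, concurrent);

    /* 4) Fill out nextXid: read later than step 1, so never lower
     *    (ignoring XID wraparound) */
    cp.next_xid = srv->next_xid;
    assert(cp.next_xid >= *snap_next);

    /* 5) CheckPointGuts() would flush CLOG, SUBTRANS etc. here (not modelled) */
    return cp;
}
```

The model only shows the monotonicity argument; it says nothing about whether a CLOG page flushed in step 5 is still present when the checkpoint finishes.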

Hmm, I don't think that fully fixes the problem. Even if you're certain 
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might 
decide to truncate it away again while the checkpoint is running.

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

