Re: Hot Backup with rsync fails at pg_clog if under load

From Florian Pflug
Subject Re: Hot Backup with rsync fails at pg_clog if under load
Msg-id 01A2E400-AC30-48D5-8DD7-9F789D736CE7@phlo.org
In reply to Re: Hot Backup with rsync fails at pg_clog if under load  (Chris Redekop <chris@replicon.com>)
List pgsql-hackers
On Oct26, 2011, at 17:36 , Chris Redekop wrote:
> > And I think they also reported that if they didn't run hot standby,
> > but just normal recovery into a new master, it didn't have the problem
> > either, i.e. without hotstandby, recovery ran, properly extended the
> > clog, and then ran as a new master fine.
>
> Yes this is correct...attempting to start as hotstandby will produce the
> pg_clog error repeatedly and then without changing anything else, just
> turning hot standby off it will start up successfully.

Yup, because with hot standby disabled (on the client side), StartupCLOG()
happens after recovery has completed. That, at the very least, makes the
problem very unlikely to occur in the non-hot-standby case. I'm not sure
it's completely impossible, though.
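
To illustrate the ordering argument, here is a toy Python sketch. It is not
PostgreSQL code: start_server() and its parameters are hypothetical names, and
it assumes that WAL replay recreates the missing CLOG page (as reported
upthread, where recovery "properly extended the clog").

```python
def start_server(hot_standby, pages_on_disk, next_xid_page, pages_created_by_replay):
    """Toy model: return True if the stand-in for StartupCLOG() finds its page.

    All names here are illustrative assumptions, not PostgreSQL APIs.
    """
    pages = set(pages_on_disk)
    if hot_standby:
        # Hot standby: the CLOG startup check runs before WAL replay,
        # so it only sees the pages that were copied into the backup.
        return next_xid_page in pages
    # Plain recovery: replay first, which may recreate the missing page,
    # and only then run the CLOG startup check.
    pages |= set(pages_created_by_replay)
    return next_xid_page in pages


# Same base backup, same WAL, only the hot standby setting differs.
print(start_server(True, {0}, 1, {1}))   # check runs before replay
print(start_server(False, {0}, 1, {1}))  # check runs after replay
```

The only difference between the two calls is when the check runs, which is
the point being made above.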

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how:

1) We start a checkpoint, and get as far as LogStandbySnapshot().
2) A backend does AssignTransactionId, and gets as far as
   GetNewTransactionId(). The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the
   checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID
   assigned in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.
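
The four steps can be replayed in a small Python model. This is only a sketch
of the theory, not the server's logic: crash_scenario() and startup_clog() are
made-up helpers, and the page size assumes the default 8 kB block with two
status bits per XID (32768 XIDs per CLOG page).

```python
CLOG_XACTS_PER_PAGE = 8192 * 4  # assumed: default 8 kB block, 2 status bits per XID


def clog_page(xid):
    """Return the number of the CLOG page that holds the status of xid."""
    return xid // CLOG_XACTS_PER_PAGE


def startup_clog(next_xid, pages_on_disk):
    """Toy stand-in for StartupCLOG(): it needs the page holding next_xid."""
    return clog_page(next_xid) in pages_on_disk


def crash_scenario(extension_flushed):
    """Replay steps 1-4; return True if the toy StartupCLOG() succeeds."""
    pages_on_disk = {0}  # only CLOG page 0 has been written so far
    # (1) The checkpoint starts; its nextXid still lies on page 0.
    checkpoint_next_xid = CLOG_XACTS_PER_PAGE - 1
    # (2) A backend is assigned the first XID of page 1; the page is
    #     extended in memory only.
    assigned_xid = CLOG_XACTS_PER_PAGE
    pages_in_memory = pages_on_disk | {clog_page(assigned_xid)}
    # (3) The checkpoint's nextXid is advanced to the XID from (2).
    checkpoint_next_xid = max(checkpoint_next_xid, assigned_xid)
    # (4) Crash: only pages that reached the disk survive.
    if extension_flushed:
        pages_on_disk = pages_in_memory
    return startup_clog(checkpoint_next_xid, pages_on_disk)


print(crash_scenario(extension_flushed=False))  # page 1 never reached disk
print(crash_scenario(extension_flushed=True))   # page 1 was flushed in time
```

In this model the failure needs both conditions at once: nextXid must cross a
page boundary, and the new page must be missing on disk at startup.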

> > This fits the OP's observation of the
> > problem vanishing when pg_start_backup() does an immediate checkpoint.
>
> Note that this is *not* the behaviour I'm seeing....it's possible it happens
> more frequently without the immediate checkpoint, but I am seeing it happen
> even with the immediate checkpoint.

Yeah, I should have said "of the problem's likelihood decreasing" instead
of "vanishing". The point is, the longer the checkpoint takes, the higher
the chance the nextXid is advanced far enough to require a CLOG extension.

That alone isn't enough to trigger the error - the CLOG extension must also
*not* make it to the disk before the checkpoint completes - but it's
a required precondition for the error to occur.
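
As a rough back-of-the-envelope illustration of that precondition, the Python
sketch below estimates how likely a checkpoint is to span a CLOG page boundary.
It assumes a constant XID assignment rate, a starting XID that is uniformly
distributed within a page, and 32768 XIDs per page (default 8 kB block); the
function name and both parameters are hypothetical.

```python
CLOG_XACTS_PER_PAGE = 8192 * 4  # assumed: default 8 kB block, 2 status bits per XID


def crossing_probability(xids_per_second, checkpoint_seconds):
    """Chance that the XIDs assigned during one checkpoint cross a page boundary.

    Assumes the first XID of the window sits at a uniformly random offset
    within its CLOG page, so a window of n XIDs contains a boundary with
    probability min(1, n / CLOG_XACTS_PER_PAGE).
    """
    assigned = xids_per_second * checkpoint_seconds
    return min(1.0, assigned / CLOG_XACTS_PER_PAGE)


# A short (immediate) checkpoint versus a long, spread-out one,
# both at an assumed 1000 XIDs per second.
for seconds in (2, 60):
    print(seconds, crossing_probability(1000, seconds))
```

This only covers the precondition; whether the new page also fails to reach
the disk in time is a separate matter that the estimate does not model.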

best regards,
Florian Pflug


