END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
От | Andres Freund |
---|---|
Тема | END_OF_RECOVERY shutdowns and ResetUnloggedRelations() |
Дата | |
Msg-id | 20140912112246.GA4984@alap3.anarazel.de обсуждение исходный текст |
Ответы |
Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations()
(Andres Freund <andres@2ndquadrant.com>)
Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations() (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
Hi, Abhijit and I investigated a customer problem which has showed that crash recovery + unlogged relations don't always work well together: A condensed version of how crash recovery works is: StartupXLOG() { ... if (ControlFile->state != DB_SHUTDOWNED) InRecovery = true; if (InRecovery) { ResetUnloggedRelations(UNLOGGED_RELATION_CLEANUP); .... /* perform crash recovery till the end of WAL */ ... CreateCheckPoint(CHECKPOINT_END_OF_RECOVERY | CHECKPOINT_IMMEDIATE); ... } PreallocXlogFiles(EndOfLog); if (InRecovery) ResetUnloggedRelations(UNLOGGED_RELATION_INIT); ... /* finish startup */ } the second important part is: CreateCheckPoint(flags) { ... if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) shutdown = true; ... if (shutdown) ControlFile->state = DB_SHUTDOWNED; UpdateControlFile(); ... } If you consider a crash due to ENOSPC the problem is that first crash recovery is performed. Then a checkpoint is performed which is marked as END_OF_RECOVERY - which marks the database as being properly shut down! So far, while not pretty, so good. The problem is that if we crash after the CreateCheckPoint(), e.g. because of xlog preallocation or the new files created in ResetUnloggedRelations(), the server will restart *but* will *not* perform crash recovery anymore as pg_control marked as DB_SHUTDOWNED! That leaves you with a database which has all the _init forks, but not the main forks... Leading to scary an unexpected errors. Should somebody google this: The problem can be fixed by forcing the server into crash recovery again using an immediate shutdown. Personally I think it's quite the mistake that END_OF_RECOVERY checkpoints are treated as shutdown checkpoints. The claimed reason that: * * Note that we write a shutdown checkpoint rather than an on-line * one. This is not particularly critical,but since we may be * assigning a new TLI, using a shutdown checkpoint allows us to have * the rule that TLIonly changes in shutdown checkpoints, which * allows some extra error checking in xlog_redo. * and /* * An end-of-recovery checkpoint is really a shutdown checkpoint, just * issued at a different time. */ isn't very convincing as those checks could just as well be saved in a flags argument in the checkpoint. The likelihood of this confusion causing further bugs (IIRC it already has caused a couple) seems high. What I like even less is that pg_control is actually marked as DB_SHUTDOWNED due to END_OF_RECOVERY. That's just plain wrong. Obviously the database was *NOT* shutdown peacefully. I don't see active bugs due it besides this, but I think it's likely to either have or create futher ones. Because at least the former is something that obviously we can't (and don't want) to change in the back branches I think the solution for this particular problem is to simply move the ResetUnloggedRelations() call a couple lines up to before the CreateCheckPoint(). That should fix this. Comments, other opinions? Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Heikki LinnakangasДата:
Сообщение: Re: implement subject alternative names support for SSL connections
Следующее
От: Andres FreundДата:
Сообщение: Re: END_OF_RECOVERY shutdowns and ResetUnloggedRelations()