Re: "using previous checkpoint record at" maybe not the greatest idea?

From David G. Johnston
Subject Re: "using previous checkpoint record at" maybe not the greatest idea?
Date
Msg-id CAKFQuwa+YTwiPXpqMy3gUseE=v9JmEV0GAF9ToikH6-Ns-rKtQ@mail.gmail.com
In reply to "using previous checkpoint record at" maybe not the greatest idea?  (Andres Freund <andres@anarazel.de>)
Responses Re: "using previous checkpoint record at" maybe not the greatest idea?  (Andres Freund <andres@anarazel.de>)
Re: "using previous checkpoint record at" maybe not the greatest idea?  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On Mon, Feb 1, 2016 at 4:58 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

currently if, when not in standby mode, we can't read a checkpoint
record, we automatically fall back to the previous checkpoint, and start
replay from there.

Doing so without user intervention doesn't actually seem like a good
idea. While not super likely, it's entirely possible that doing so can
wreck a cluster that'd otherwise be easily recoverable. Imagine e.g. a
tablespace being dropped - going back to the previous checkpoint very
well could lead to replay not finishing, as the directory to create
files in doesn't even exist.

As there's, afaics, really no "legitimate" reasons for needing to go
back to the previous checkpoint I don't think we should do so in an
automated fashion.

All the cases where I could find logs containing "using previous
checkpoint record at" were when something else had already gone pretty
badly wrong. Now that obviously doesn't have a very large significance,
because the situations where it "just worked" are unlikely to be
reported...

Am I missing a reason for doing this by default?

Learning by reading here...

"""
After a checkpoint has been made and the log flushed, the checkpoint's position is saved in the file pg_control. Therefore, at the start of recovery, the server first reads pg_control and then the checkpoint record; then it performs the REDO operation by scanning forward from the log position indicated in the checkpoint record. Because the entire content of data pages is saved in the log on the first page modification after a checkpoint (assuming full_page_writes is not disabled), all pages changed since the checkpoint will be restored to a consistent state.

To deal with the case where pg_control is corrupt, we should support the possibility of scanning existing log segments in reverse order — newest to oldest — in order to find the latest checkpoint. This has not been implemented yet. pg_control is small enough (less than one disk page) that it is not subject to partial-write problems, and as of this writing there have been no reports of database failures due solely to the inability to read pg_control itself. So while it is theoretically a weak spot, pg_control does not seem to be a problem in practice.
"""

The above comment appears out-of-date if this post describes what presently happens.
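
To make the behaviour under discussion concrete, here is a toy model of how recovery picks its starting point: read the control file, try the latest checkpoint record, and (as Andres describes) silently fall back to the previous one if that record cannot be read. This is only a sketch in Python, not PostgreSQL's actual code; the names `ControlFile`, `CheckpointRecord`, `ToyLog`, `choose_redo_start` and the `allow_fallback` switch are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class ControlFile:
    """Toy stand-in for pg_control (hypothetical field names)."""
    checkpoint: int       # log position of the latest checkpoint record
    prev_checkpoint: int  # log position of the checkpoint before it


@dataclass
class CheckpointRecord:
    redo: int  # log position from which REDO must scan forward


class ToyLog:
    """Maps log positions to checkpoint records; a missing entry is unreadable."""

    def __init__(self, records: Dict[int, CheckpointRecord]):
        self.records = records

    def read_checkpoint(self, pos: int) -> Optional[CheckpointRecord]:
        return self.records.get(pos)


def choose_redo_start(control: ControlFile, log: ToyLog,
                      allow_fallback: bool = True) -> Tuple[int, str]:
    """Return (redo position, which checkpoint was used)."""
    rec = log.read_checkpoint(control.checkpoint)
    if rec is not None:
        return rec.redo, "primary"
    if allow_fallback:
        # The automatic step being questioned in this thread.
        rec = log.read_checkpoint(control.prev_checkpoint)
        if rec is not None:
            return rec.redo, "previous"
    raise RuntimeError("could not locate a valid checkpoint record")
```

With `allow_fallback=False` the model stops with an error instead of replaying from the older checkpoint, which is roughly the "require user intervention" behaviour proposed in the quoted message.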

Also, I was under the impression that tablespace commands resulted in checkpoints so that the state of the file system could be presumed current...

I don't know enough about the internals, but it seems like we'd need to distinguish between an interrupted checkpoint (pull the plug during the checkpoint) and one that supposedly completed without interruption but was then somehow corrupted (solar flares).  The former seems legitimate for auto-skip while the latter does not.

David J.

 
