Avoiding adjacent checkpoint records

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Avoiding adjacent checkpoint records
Дата
Msg-id 13147.1339009725@sss.pgh.pa.us
обсуждение исходный текст
Ответы Re: Avoiding adjacent checkpoint records  (Robert Haas <robertmhaas@gmail.com>)
Re: Avoiding adjacent checkpoint records  (Simon Riggs <simon@2ndQuadrant.com>)
Список pgsql-hackers
In commit 18fb9d8d21a28caddb72c7ffbdd7b96d52ff9724, Simon modified the
rule for when to skip checkpoints on the grounds that not enough
activity has happened since the last one.  However, that commit left the
comment block about it in a nonsensical state:
   * If this isn't a shutdown or forced checkpoint, and we have not switched   * to the next WAL file since the start
ofthe last checkpoint, skip the   * checkpoint.  The idea here is to avoid inserting duplicate checkpoints   * when the
systemis idle. That wastes log space, and more importantly it   * exposes us to possible loss of both current and
previouscheckpoint   * records if the machine crashes just as we're writing the update.   * (Perhaps it'd make even
moresense to checkpoint only when the previous   * checkpoint record is in a different xlog page?)
 

The new code entirely fails to prevent writing adjacent checkpoint
records, because what it checks is the distance from the previous
checkpoint's REDO pointer, not the previous checkpoint record itself.
So the concern raised in the last two sentences of the comment isn't
being addressed at all: if we corrupt the current page of WAL while
trying to write the new checkpoint record, we risk losing the previous
checkpoint record too.  Should the system then crash, there is enough
logic to back up to the second previous checkpoint record and roll
forward from there --- but since we've lost the last checkpoint and up
to one page's worth of preceding WAL records, there is no guarantee that
we'll manage to reach a database state that is consistent with data
already flushed out to disk during the last checkpoint.

I started to make a quick patch to add an additional check on the
location of the previous checkpoint record, so that we'd skip a new
checkpoint unless we'd moved to a new page of WAL.  However, if we
really want to take this risk seriously, ISTM that allowing adjacent
checkpoint records is bad all the time, not only for non-forced
checkpoints.

What I'm now thinking is that a more appropriate way to address that
risk is to force a skip to a new page (not segment) of WAL after we
write a checkpoint record.  This won't waste much WAL space in view
of the new rule to avoid checkpoints more than once per segment on
average.

On the other hand, you could argue that this concern is entirely
hypothetical, and we're already basically assuming that once a WAL
record has been flushed to disk it's safe there even if we're still
writing more stuff into the same page.  If we don't want to assume
that, then any XLogFlush would have to include skip-to-new-page,
and that's not going to be cheap.

Thoughts?
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: 9.3: load path to mitigate load penalty for checksums
Следующее
От: Andres Freund
Дата:
Сообщение: Re: WalSndWakeup() and synchronous_commit=off