Re: Checkpoint cost, looks like it is WAL/CRC

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Checkpoint cost, looks like it is WAL/CRC
Дата
Msg-id 18494.1120415712@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Checkpoint cost, looks like it is WAL/CRC  (Greg Stark <gsstark@mit.edu>)
Список pgsql-hackers
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Partial writes.  Without the full-page image, we do not have enough
>> information in WAL to reconstruct the correct page contents.

> Sure, but why not?

> If a 8k page contains 16 low level segments on disk and the old data is
> AAAAAAAAAAAAAAAA and the new data is AAABAAACAAADAAAE then the WAL would
> contain the B, C, D, and E. Shouldn't that be enough to reconstruct the page?

It might contain parts of it ... scattered across a large number of WAL
entries ... but I don't think that's enough to reconstruct the page.
As an example, a btree insert WAL record will say "insert this tuple
at position N, shifting the other entries accordingly"; that does not
give you the ability to reconstruct entries that shifted across sector
boundaries, as they may not be present in the on-disk data of either
sector.  You're also going to have fairly serious problems interpreting
the page contents if what's on disk includes the effects of multiple
WAL records beyond the record you are currently looking at.

We could possibly do it if we added more information to the WAL records,
but that strikes me as a net loss: essentially it would pay the penalty
all the time instead of only on the first write after a checkpoint.

Also, you are assuming that the content of each sector is uniquely ---
and determinably --- either "old data" or "new data", not for example
"unreadable because partially written".
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Greg Stark
Дата:
Сообщение: Re: Checkpoint cost, looks like it is WAL/CRC
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Fix for cross compilation