Re: corrupt pages detected by enabling checksums

From	Jeff Davis
Subject	Re: corrupt pages detected by enabling checksums
Date	May 9, 2013, 22:53:45
Msg-id	1368140021.24407.111.camel@jdavis
In reply to	Re: corrupt pages detected by enabling checksums (Greg Stark <stark@mit.edu>)
List	pgsql-hackers

On Thu, 2013-05-09 at 23:13 +0100, Greg Stark wrote:
> However it is possible to reduce the window...

Sounds reasonable.

It's fairly limited though -- the window is already one checkpoint interval
(typically 5-30 minutes), and we'd bring that down an order of magnitude
(10s). I speculate that, if it got corrupted within 30 minutes, it
probably got corrupted at the time of being written (as happened in Jeff
Janes's case, due to a bug).

So, the question is: if the WAL is corrupted on write, does reducing the
window significantly increase the chances that the WAL writer will hang
around long enough before a crash to flush this other file?

On the other hand, a checkpoint hides any corrupt WAL records by not
replaying them, whereas your scheme would identify that there is a
problem.

I don't think this would have helped Jeff Janes's case because I think
the crashes were happening too quickly. But that is artificial, so it
may help in real cases.

I just had a thought: we don't necessarily need to flush the auxiliary
file each time; merely writing it to the kernel buffers would help a
lot. Maybe an extra write() of the auxiliary file during a WAL flush
isn't so bad; combined with periodic fsync()s of the auxiliary file, it
should offer a lot of coverage against problems.
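
A rough sketch of that idea, in Python rather than PostgreSQL's C, just to make the write()/fsync() split concrete. Everything here is an assumption for illustration: the class name AuxFlushMarker, the 8-byte file layout, and the notion that the auxiliary file holds the last flushed WAL position are all hypothetical, not anything from an actual patch.

```python
import os
import struct
import time


class AuxFlushMarker:
    """Hypothetical helper: keeps the last flushed WAL position in a side file."""

    def __init__(self, path, fsync_interval=10.0, clock=time.monotonic):
        # Assumed layout: one little-endian unsigned 64-bit position at offset 0.
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        self.fsync_interval = fsync_interval
        self.clock = clock
        self.last_fsync = clock()
        self.fsync_count = 0

    def on_wal_flush(self, flushed_pos):
        # Every WAL flush: a plain write, which only reaches the kernel buffers.
        os.pwrite(self.fd, struct.pack("<Q", flushed_pos), 0)
        # Only now and then: force the side file to stable storage.
        now = self.clock()
        if now - self.last_fsync >= self.fsync_interval:
            os.fsync(self.fd)
            self.last_fsync = now
            self.fsync_count += 1

    def read(self):
        # Return the position currently recorded in the side file.
        return struct.unpack("<Q", os.pread(self.fd, 8, 0))[0]

    def close(self):
        os.close(self.fd)
```

The un-fsynced write survives a backend crash as long as the kernel stays up, and the periodic fsync() bounds how stale the file can be after an OS crash or power loss.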

Regards,
	Jeff Davis
