Re: corrupt pages detected by enabling checksums
From | Jeff Davis |
---|---|
Subject | Re: corrupt pages detected by enabling checksums |
Date | |
Msg-id | 1368140021.24407.111.camel@jdavis |
In reply to | Re: corrupt pages detected by enabling checksums (Greg Stark <stark@mit.edu>) |
List | pgsql-hackers |
On Thu, 2013-05-09 at 23:13 +0100, Greg Stark wrote:
> However it is possible to reduce the window...

Sounds reasonable.

It's fairly limited though -- the window is already a checkpoint (typically 5-30 minutes), and we'd bring that down an order of magnitude (10s). I speculate that, if it got corrupted within 30 minutes, it probably got corrupted at the time of being written (as happened in Jeff Janes's case, due to a bug).

So, the question is: if the WAL is corrupted on write, does reducing the window significantly increase the chances that the wal writer will hang around long enough before a crash to flush this other file?

On the other hand, checkpoint hides any corrupt WAL records by not replaying them, whereas your scheme would identify that there is a problem.

I don't think this would have helped Jeff Janes's case because I think the crashes were happening too quickly. But that is artificial, so it may help in real cases.

I just had a thought: we don't necessarily need to flush the auxiliary file each time; merely writing it to the kernel buffers would help a lot. Maybe an extra write() of the auxiliary file during a WAL flush isn't so bad; and combined with periodic fsync()s of the auxiliary file, should offer a lot of coverage against problems.

Regards,
	Jeff Davis
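
A minimal sketch of the last idea (an extra write() at each WAL flush, with only periodic fsync()s), in case it helps the discussion. Everything here is an assumption for illustration: the AuxFile struct, the aux_* function names, and the sync interval are hypothetical and are not PostgreSQL code, and the contents of the auxiliary file are treated as an opaque buffer because the message does not specify them.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical bookkeeping for the auxiliary file (not a PostgreSQL type). */
typedef struct AuxFile
{
    int    fd;                 /* open descriptor for the auxiliary file */
    time_t last_sync;          /* time of the last successful fsync() */
    int    sync_interval_secs; /* assumed tunable: how often to fsync() */
    long   unsynced_writes;    /* write() calls since the last fsync() */
} AuxFile;

/* Open (or create) the auxiliary file in append mode. Returns 0 or -1. */
int
aux_open(AuxFile *aux, const char *path, int sync_interval_secs)
{
    assert(aux != NULL && path != NULL);
    aux->fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (aux->fd < 0)
        return -1;
    aux->last_sync = time(NULL);
    aux->sync_interval_secs = sync_interval_secs;
    aux->unsynced_writes = 0;
    return 0;
}

/*
 * Called at WAL flush time: hand the data to the kernel buffers only.
 * No fsync() here, so the cost is one extra write() per flush.
 */
int
aux_note_wal_flush(AuxFile *aux, const void *buf, size_t len)
{
    const char *p = buf;

    assert(aux != NULL && buf != NULL);
    while (len > 0)
    {
        ssize_t n = write(aux->fd, p, len);

        if (n < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted before writing; retry */
            return -1;
        }
        p += n;
        len -= (size_t) n;
    }
    aux->unsynced_writes++;
    return 0;
}

/*
 * Called periodically: fsync() only if there is unsynced data and the
 * interval has elapsed. Returns 1 if synced, 0 if skipped, -1 on error.
 */
int
aux_maybe_sync(AuxFile *aux, time_t now)
{
    assert(aux != NULL);
    if (aux->unsynced_writes == 0 ||
        now - aux->last_sync < aux->sync_interval_secs)
        return 0;
    if (fsync(aux->fd) != 0)
        return -1;
    aux->last_sync = now;
    aux->unsynced_writes = 0;
    return 1;
}
```

In this sketch the data written since the last fsync() survives a process crash (it is already in the kernel buffers) but not necessarily an OS crash or power loss, which is the trade-off being proposed.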