Re: corrupt pages detected by enabling checksums
From | Jim Nasby |
---|---|
Subject | Re: corrupt pages detected by enabling checksums |
Date | |
Msg-id | 518FFE23.8070102@nasby.net |
In reply to | Re: corrupt pages detected by enabling checksums (Jeff Davis <pgsql@j-davis.com>) |
List | pgsql-hackers |
On 5/9/13 5:18 PM, Jeff Davis wrote:
> On Thu, 2013-05-09 at 14:28 -0500, Jim Nasby wrote:
>> What about moving some critical data from the beginning of the WAL
>> record to the end? That would make it easier to detect that we don't
>> have a complete record. It wouldn't necessarily replace the CRC
>> though, so maybe that's not good enough.
>>
>> Actually, what if we actually *duplicated* some of the same WAL header
>> info at the end of the record? Given a reasonable amount of data that
>> would damn-near ensure that a torn record was detected, because the
>> odds of having the exact same sequence of random bytes would be so
>> low. Potentially even just duplicating the LSN would suffice.
>
> I think both of these ideas have some false positives and false
> negatives.
>
> If the corruption happens at the record boundary, and wipes out the
> special information at the end of the record, then you might think it
> was not fully flushed, and we're in the same position as today.
>
> If the WAL record is large, and somehow the beginning and the end get
> written to disk but not the middle, then it will look like corruption;
> but really the WAL was just not completely flushed. This seems pretty
> unlikely, but not impossible.
>
> That being said, I like the idea of introducing some extra checks if a
> perfect solution is not possible.

Yeah, I don't think a perfect solution is possible, short of attempting to tie directly into the filesystem (i.e., on a journaling FS, have some way to essentially treat the FS journal as WAL).

One additional step we might be able to take would be to scan forward looking for a record that would tell us when an fsync must have occurred (heck, maybe we should add an fsync WAL record...). If we find a corrupt WAL record followed by an fsync, we know that we've now lost data. That closes some of the holes. Actually, that might handle all the holes...

>> On the separate write idea, if that could be controlled by a GUC I
>> think it'd be worth doing. Anyone that needs to worry about this
>> corner case probably has hardware that would support that.
>
> It sounds pretty easy to do that naively. I'm just worried that the
> performance will be so bad for so many users that it's not a very
> reasonable choice.
>
> Today, it would probably make more sense to just use sync rep. If the
> master's WAL is corrupt, and it starts up too early, then that should be
> obvious when you try to reconnect streaming replication. I haven't tried
> it, but I'm assuming that it gives a useful error message.

I wonder if there are data warehouse (DW) environments that are too large to keep an SR copy but would be able to afford the double-write overhead.

BTW, isn't performance what killed the double-buffer idea?
-- 
Jim C. Nasby, Data Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
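A minimal Python sketch of how the two checks discussed in this thread might combine during recovery: the LSN duplicated at the end of each record, and a forward scan for a record proving that an fsync occurred. The record layout (header LSN, type, payload length, CRC-32, payload, trailer LSN), the `REC_FSYNC` marker, and the names `encode`, `decode`, and `replay` are all hypothetical assumptions for illustration; this is not PostgreSQL's actual WAL format.

```python
# Sketch only: hypothetical WAL layout, not PostgreSQL's real format.
import struct
import zlib
from typing import List, Optional, Tuple

# Hypothetical record: header (lsn, type, payload length, crc32), payload,
# then the header's LSN duplicated as a trailer.
HEADER = struct.Struct("<QBII")
TRAILER = struct.Struct("<Q")

REC_DATA = 1
REC_FSYNC = 2  # hypothetical marker: everything before it was fsync'd

Record = Tuple[int, int, bytes]  # (lsn, rec_type, payload)


def _crc(lsn: int, rec_type: int, payload: bytes) -> int:
    # The CRC covers the header fields (minus the CRC itself) and the payload.
    return zlib.crc32(struct.pack("<QBI", lsn, rec_type, len(payload)) + payload)


def encode(lsn: int, rec_type: int, payload: bytes) -> bytes:
    crc = _crc(lsn, rec_type, payload)
    return HEADER.pack(lsn, rec_type, len(payload), crc) + payload + TRAILER.pack(lsn)


def decode(buf: bytes, off: int) -> Tuple[str, Optional[Record], int]:
    """Classify the record at `off` as 'ok', 'torn', 'corrupt' or 'end'."""
    if off >= len(buf):
        return "end", None, off
    if off + HEADER.size > len(buf):
        return "torn", None, off
    lsn, rec_type, length, crc = HEADER.unpack_from(buf, off)
    end = off + HEADER.size + length + TRAILER.size
    if end > len(buf):
        return "torn", None, off  # record runs past the written data
    payload = bytes(buf[off + HEADER.size : end - TRAILER.size])
    (tail_lsn,) = TRAILER.unpack_from(buf, end - TRAILER.size)
    if tail_lsn != lsn:
        return "torn", None, off  # start and end disagree: likely not fully flushed
    if _crc(lsn, rec_type, payload) != crc:
        return "corrupt", None, off  # both ends present, contents damaged
    return "ok", (lsn, rec_type, payload), end


def replay(buf: bytes) -> Tuple[List[int], str]:
    """Apply records in order; on the first bad one, scan forward for a
    valid fsync marker to decide between 'unflushed tail' and 'data loss'."""
    applied: List[int] = []
    off = 0
    while True:
        status, rec, nxt = decode(buf, off)
        if status == "end":
            return applied, "clean end of WAL"
        if status == "ok" and rec is not None:
            applied.append(rec[0])
            off = nxt
            continue
        # Resynchronize byte by byte: a random offset is very unlikely to
        # pass both the trailer check and the CRC.
        for probe in range(off + 1, len(buf)):
            s, r, _ = decode(buf, probe)
            if s == "ok" and r is not None and r[1] == REC_FSYNC:
                return applied, f"data loss: bad record at offset {off} precedes an fsync"
        return applied, f"end of WAL at offset {off} ({status}, possibly unflushed)"
```

In this sketch a trailer mismatch is handled the same way as today's torn-tail case. A CRC failure with both ends intact (Jeff's large-record scenario) stays ambiguous until a later valid fsync marker shows that the WAL before it must have been flushed; only then is the bad record reported as data loss rather than treated as the end of WAL.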