Re: corrupt pages detected by enabling checksums
| From | Jeff Davis |
|---|---|
| Subject | Re: corrupt pages detected by enabling checksums |
| Date | |
| Msg-id | 1368137927.24407.85.camel@jdavis |
| In response to | Re: corrupt pages detected by enabling checksums (Jim Nasby <jim@nasby.net>) |
| Responses | Re: corrupt pages detected by enabling checksums |
| List | pgsql-hackers |
On Thu, 2013-05-09 at 14:28 -0500, Jim Nasby wrote:
> What about moving some critical data from the beginning of the WAL
> record to the end? That would make it easier to detect that we don't
> have a complete record. It wouldn't necessarily replace the CRC
> though, so maybe that's not good enough.
>
> Actually, what if we actually *duplicated* some of the same WAL header
> info at the end of the record? Given a reasonable amount of data that
> would damn-near ensure that a torn record was detected, because the
> odds of having the exact same sequence of random bytes would be so
> low. Potentially even just duplicating the LSN would suffice.

I think both of these ideas have some false positives and false
negatives.

If the corruption happens at the record boundary, and wipes out the
special information at the end of the record, then you might think it
was not fully flushed, and we're in the same position as today.

If the WAL record is large, and somehow the beginning and the end get
written to disk but not the middle, then it will look like corruption;
but really the WAL was just not completely flushed. This seems pretty
unlikely, but not impossible.

That being said, I like the idea of introducing some extra checks if a
perfect solution is not possible.

> On the separate write idea, if that could be controlled by a GUC I
> think it'd be worth doing. Anyone that needs to worry about this
> corner case probably has hardware that would support that.

It sounds pretty easy to do that naively. I'm just worried that the
performance will be so bad for so many users that it's not a very
reasonable choice.

Today, it would probably make more sense to just use sync rep. If the
master's WAL is corrupt, and it starts up too early, then that should be
obvious when you try to reconnect streaming replication. I haven't tried
it, but I'm assuming that it gives a useful error message.

Regards,
	Jeff Davis
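
The two misclassifications described above can be reproduced with a small Python sketch. Everything in it is an assumption made for illustration: the record layout is a toy one (LSN, length and CRC-32 in the header, a second copy of the LSN as trailer), not PostgreSQL's actual WAL format, and `make_record`, `classify` and `overwrite` are hypothetical helpers, not functions from the PostgreSQL source.

```python
import struct
import zlib

# Toy layout, NOT PostgreSQL's real WAL format (assumed for illustration):
#   header  = LSN (8 bytes), payload length (4), CRC-32 of payload (4)
#   trailer = a second copy of the LSN (8 bytes)
HEADER = struct.Struct("<QII")
TRAILER = struct.Struct("<Q")


def make_record(lsn, payload):
    """Build header + payload + duplicated-LSN trailer."""
    header = HEADER.pack(lsn, len(payload), zlib.crc32(payload))
    return header + payload + TRAILER.pack(lsn)


def classify(buf):
    """Guess the state of a record: 'ok', 'incomplete' or 'corrupt'."""
    if len(buf) < HEADER.size:
        return "incomplete"
    lsn, length, crc = HEADER.unpack_from(buf, 0)
    end = HEADER.size + length
    if len(buf) < end + TRAILER.size:
        return "incomplete"
    (trailer_lsn,) = TRAILER.unpack_from(buf, end)
    if trailer_lsn != lsn:
        # Trailer missing or wrong: assume the write never finished.
        return "incomplete"
    if zlib.crc32(buf[HEADER.size:end]) != crc:
        # Both ends agree but the payload does not: assume corruption.
        return "corrupt"
    return "ok"


def overwrite(buf, offset, junk):
    """Return a copy of buf with junk written at offset."""
    return buf[:offset] + junk + buf[offset + len(junk):]


record = make_record(0x1000, b"x" * 64)

# Case 1: real corruption lands on the record boundary and wipes the trailer.
wiped_trailer = overwrite(record, len(record) - TRAILER.size,
                          b"\0" * TRAILER.size)

# Case 2: header and trailer reached disk, 4 bytes in the middle did not.
missing_middle = overwrite(record, HEADER.size + 30, b"\0" * 4)

print(classify(record))          # ok
print(classify(wiped_trailer))   # incomplete, although it is really corruption
print(classify(missing_middle))  # corrupt, although it is really a partial flush
```

In this toy model the duplicated LSN does catch ordinary torn records, but the last two cases show why it can only be an extra check alongside the CRC and not a replacement for it.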