On Thu, 27 Oct 2005, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > On Wed, Oct 26, 2005 at 09:29:23PM -0400, Tom Lane wrote:
> >> Could you send me the whole file (off-list)?
>
> > Ok, will send URL as soon as I have it from client.
>
> Well, the answer is that there's nothing wrong with that index except
> that four consecutive pages near the end (32K total) have been zeroed
> out :-(
[snip]
> Bottom line is that index searches probably ought to have some
> non-Assert defenses against zeroed-out pages. Obviously we can't
> expect to catch every flavor of data corruption, but this particular
> one has been seen before...
Definitely. I've seen faulty hardware somehow zero blocks where I would
have expected random data. I wonder if we can test with PageIsNew(), which
is very inexpensive. The question is: what do we do when we detect this?
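
To make the idea concrete, here is a rough standalone sketch of the kind
of check I mean. It does not use the real headers: SketchPageHeader,
sketch_page_is_new() and sketch_index_page_usable() are made-up stand-ins,
and the 8K block size is assumed. The only thing it borrows is the idea
that PageIsNew() treats a page whose pd_upper is 0 as never initialised.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLCKSZ 8192      /* assumed default block size */

/* Hypothetical, cut-down stand-in for the real page header. */
typedef struct SketchPageHeader
{
    uint16_t    pd_lower;       /* offset to start of free space */
    uint16_t    pd_upper;       /* offset to end of free space */
    uint16_t    pd_special;     /* offset to start of special space */
} SketchPageHeader;

/* Same idea as PageIsNew(): a zeroed page has pd_upper == 0. */
static bool
sketch_page_is_new(const char *page)
{
    SketchPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return hdr.pd_upper == 0;
}

/*
 * Hypothetical non-Assert defence for an index search: report the page
 * as unusable so the caller can raise an error instead of crashing.
 */
static bool
sketch_index_page_usable(const char *page)
{
    if (sketch_page_is_new(page))
        return false;           /* zeroed-out page: treat as corruption */
    return true;
}
```

The check is a single 16-bit comparison per page read, which is why it
should be cheap. What the caller does on a false return is still the
open question.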
>
> BTW, Jim, any thoughts about how the index got corrupted? Have you
> had any crashes on that machine lately?
Have spoken with Jim on IRC, he says that there have been several crashes
recently due to a faulty disk array. I guess the zeroing could be an
outcome of the faulty disk. I wonder if the crashes it caused could have
happened somewhere around mdextend(), after we had created a zero'd page
but before we could write out the initialised page.
If this happened 4 times in a row it could account for the problem. It
does seem a bit unlikely, though.
That being said, is there any reason why we don't extend the file with a
PageInit()'d block instead of a zero'd block?
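
As a rough illustration of the difference, here is a standalone sketch.
Again this is not the real code: DemoPageHeader, demo_page_init() and
demo_fill_extension_block() are invented for the example, the block size
is assumed to be 8K, and demo_page_init() only approximates what
PageInit() does to the free-space pointers.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_BLCKSZ 8192        /* assumed default block size */

/* Hypothetical, cut-down stand-in for the real page header. */
typedef struct DemoPageHeader
{
    uint16_t    pd_lower;       /* offset to start of free space */
    uint16_t    pd_upper;       /* offset to end of free space */
    uint16_t    pd_special;     /* offset to start of special space */
} DemoPageHeader;

/* Approximation of PageInit(): zero the page, then set the pointers. */
static void
demo_page_init(char *page, size_t special_size)
{
    DemoPageHeader hdr;

    memset(page, 0, DEMO_BLCKSZ);
    hdr.pd_lower = (uint16_t) sizeof(DemoPageHeader);
    hdr.pd_upper = (uint16_t) (DEMO_BLCKSZ - special_size);
    hdr.pd_special = hdr.pd_upper;
    memcpy(page, &hdr, sizeof(hdr));
}

/* Fill the buffer that an extend step would append to the file. */
static void
demo_fill_extension_block(char *page, int initialised)
{
    if (initialised)
        demo_page_init(page, 0);        /* proposed: initialised block */
    else
        memset(page, 0, DEMO_BLCKSZ);   /* current: all-zero block */
}

/* Read pd_upper back out of a page buffer. */
static uint16_t
demo_pd_upper(const char *page)
{
    DemoPageHeader hdr;

    memcpy(&hdr, page, sizeof(hdr));
    return hdr.pd_upper;
}
```

If the extension block were initialised like this, a page found later
with pd_upper == 0 could not be a leftover from a crash in the middle of
an extend, which would make a PageIsNew()-style test a less ambiguous
sign of corruption.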
Thanks,
Gavin