Thread: Invalid page header

Invalid page header

From
Ireneusz Pluta
Date:
Hello,

I have a server, 8.4.3, where I get intermittent and rather rare cases
of "invalid page header". A quick search of the pg lists turns up the
general advice to "check your hardware". Yes, I need to schedule
downtime and perform some checks.

However, let me also share with you what I noticed and maybe you can
comment or suggest more than that.

As I said, I have already had a few cases of invalid page headers on that
server, but did not look into them closely, as they were always
related to the same table or its indexes. These could easily be dropped
and rebuilt, because that table depended on other tables. So I was happy
with doing just that. There were only a few such cases within the 10 months
this server has been running (and that was the actual reason I reported,
earlier this year, autovacuum being disrupted by an invalid page header
that had not been taken care of for a long time).

But the most recent invalid page header occurred in another table,
which is actually a master source for many other tables in my
database, so I really had to deal with this case. What I noticed
about it was:

- this is a constantly growing table collecting raw information. The data
contained in the damaged page was accessed several times within a few
hours after its insertion, before yet another access finally ended
with an "invalid page header" error.

- exactly one page was damaged. No other damage around it. The
system is running on FreeBSD 7.2, UFS with 16k block size, on a RAID 10
with 256 stripe size, if this matters.

- when playing with pg_filedump I noticed that last pages of the table
are always initially reported as damaged, as they come, then, as newer
pages get allocated and filled, these initially bad pages "become
valid", as in the following example repeating the same pg_filedump.

[pgsql@gil ~]$ pg_filedump data/base/18319/36870.43 | grep -B9 -i "invalid header" | grep ^Block
Block 7460 ********************************************************
Block 11457 ********************************************************
Block 11460 ********************************************************
Block 11461 ********************************************************
[pgsql@gil ~]$ pg_filedump data/base/18319/36870.43 | grep -B9 -i "invalid header" | grep ^Block
Block 7460 ********************************************************
Block 11460 ********************************************************
Block 11461 ********************************************************
Block 11462 ********************************************************
[pgsql@gil ~]$ pg_filedump data/base/18319/36870.43 | grep -B9 -i "invalid header" | grep ^Block
Block 7460 ********************************************************
Block 11461 ********************************************************
Block 11462 ********************************************************
Block 11463 ********************************************************

- Block 7460 above is the one which actually got corrupted. Even though I
zeroed it with the zero_damaged_pages option, it is still reported as invalid.

Do the above remarks indicate that something other than a
hard-to-find hardware issue is involved and could be tracked down in more
detail?

Thanks

Irek.
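
The block numbers in the pg_filedump output above are counted from the start of the segment file 36870.43, not from the start of the table. A minimal Python sketch of the conversion follows, assuming the default 8192-byte block size, the default 1 GB segment size, and that pg_filedump numbers blocks relative to the file it was given. The helper names relation_block and byte_offset are hypothetical and are not part of PostgreSQL or pg_filedump.

```python
BLOCK_SIZE = 8192            # assumption: default BLCKSZ, check your build
SEGMENT_BYTES = 1024 ** 3    # assumption: default 1 GB relation segments
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE  # 131072 with the defaults


def relation_block(segment_file, block_in_segment):
    """Map (segment file name, block within that file) to a table-wide block number."""
    _base, dot, suffix = segment_file.rpartition(".")
    # A file name without a numeric suffix is the first segment (segment 0).
    segno = int(suffix) if dot and suffix.isdigit() else 0
    return segno * BLOCKS_PER_SEGMENT + block_in_segment


def byte_offset(block_in_segment):
    """Byte offset of a block inside its segment file."""
    return block_in_segment * BLOCK_SIZE


if __name__ == "__main__":
    print(relation_block("data/base/18319/36870.43", 7460))
    print(byte_offset(7460))
```

Under these assumptions, block 7460 of segment 43 is table-wide block 43 * 131072 + 7460 = 5643556, located 61112320 bytes into the segment file. The table-wide number is the one that would appear as the first component of a ctid.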


Re: Invalid page header

From
Tom Lane
Date:
Ireneusz Pluta <ipluta@wp.pl> writes:
> - when playing with pg_filedump I noticed that last pages of the table
> are always initially reported as damaged, as they come, then, as newer
> pages get allocated and filled, these initially bad pages "become
> valid", as in the following example repeating the same pg_filedump.

This doesn't seem terribly surprising.  A newly-added page on disk will
be initially filled with zeroes, which I think pg_filedump will complain
about.  It won't get overwritten with "valid" data until the page is
next written, either because of a checkpoint or because the buffer space
is needed for another page.  pg_filedump can't see the state of the page
within the server's buffers, which is what counts here.

            regards, tom lane
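
One way to check the explanation above against a particular file is to list which blocks on disk are entirely zero, since those are the freshly added pages described there. The following is a minimal Python sketch, assuming the default 8192-byte block size. The helper names zero_blocks and demo are hypothetical and are not part of PostgreSQL or pg_filedump. Like pg_filedump, it reads only the file on disk and cannot see the state of a page in the server's buffers.

```python
import os
import tempfile

BLOCK_SIZE = 8192  # assumption: default BLCKSZ, check your build


def zero_blocks(path, block_size=BLOCK_SIZE):
    """Return the numbers of the blocks in `path` whose bytes are all zero."""
    zero_page = bytes(block_size)
    found = []
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(block_size)
            if not page:
                break
            # A short trailing read is a partial page and never matches.
            if page == zero_page:
                found.append(blkno)
            blkno += 1
    return found


def demo():
    """Build a throwaway 4-block file in which blocks 1 and 3 are zero-filled."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\x01" * BLOCK_SIZE)  # block 0: non-zero
        tmp.write(bytes(BLOCK_SIZE))     # block 1: all zeros
        tmp.write(b"\x02" * BLOCK_SIZE)  # block 2: non-zero
        tmp.write(bytes(BLOCK_SIZE))     # block 3: all zeros
        name = tmp.name
    try:
        return zero_blocks(name)
    finally:
        os.unlink(name)


if __name__ == "__main__":
    print(demo())
```

Run against a copy of a segment file, the result can be compared with the blocks that pg_filedump flags. A flagged block that is not all zeros would not fit the newly-added-page case and would deserve a closer look.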