Re: regression test failed when enabling checksum

From: Jeff Janes
Subject: Re: regression test failed when enabling checksum
Msg-id: CAMkU=1x=261iP1rJz8Z1YJBqnnNUGtJ9yMUaLcQqxKkVKu8iDg@mail.gmail.com
In response to: Re: regression test failed when enabling checksum  (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: regression test failed when enabling checksum  (Andres Freund <andres@2ndquadrant.com>)
Re: regression test failed when enabling checksum  (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Wed, Apr 3, 2013 at 2:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:

> I just checked and unfortunately your dump doesn't contain all that much
> valid WAL:
> ...
>
> So just two checkpoint records.
>
> Unfortunately I fear that won't be enough to diagnose the problem,
> could you reproduce it with a higher wal_keep_segments?

I've been trying, but see message "commit dfda6ebaec67 versus wal_keep_segments".


Looking at some of the log files more, I see that vacuum is involved, but in some way I don't understand.  The crash always happens on a test cycle immediately after the sleep that allows the autovac to kick in and finish.  So the sequence of events goes something like this:

...
run the frantic updating of "foo" until crash
recovery
query "foo" and verify the results are consistent with expectations
sleep to allow autovac to do its job.
truncate "foo" and repopulate it.
run the frantic updating of "foo" until crash
recovery
attempt to query "foo" but get the checksum failure.
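
For readers who want to picture the loop, the cycle above can be sketched roughly as follows. This is only an illustration of the ordering of the steps, not the actual harness: the `Steps` container, the `run_cycles` function, and all of the callback names are hypothetical, and the real work (driving the updates, crashing and restarting the server, querying "foo") is left to whatever callables are plugged in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Steps:
    """Hypothetical callbacks, one per step of the test cycle."""
    update_until_crash: Callable[[], None]       # frantic updating of "foo" until crash
    recover: Callable[[], None]                  # restart the server and let recovery finish
    verify: Callable[[], bool]                   # query "foo"; False on any failure
    sleep_for_autovac: Callable[[], None]        # give autovac time to run and finish
    truncate_and_repopulate: Callable[[], None]  # truncate "foo" and refill it


def run_cycles(steps: Steps, max_cycles: int) -> int:
    """Run the cycle repeatedly.

    Returns the 1-based number of the cycle whose verification failed,
    or 0 if all max_cycles cycles passed.
    """
    for cycle in range(1, max_cycles + 1):
        steps.update_until_crash()
        steps.recover()
        if not steps.verify():
            # Stop right away so the failing state is left untouched.
            return cycle
        steps.sleep_for_autovac()
        steps.truncate_and_repopulate()
    return 0
```

In the pattern described above, the failure would show up in the `verify` step of the cycle that follows the autovac sleep and the truncate.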

What the vacuum is doing that corrupts the system in a way that survives the truncate is a mystery to me.

Also, at one point I had the harness itself exit as soon as it detected the problem, but I failed to have it shut down the server.  So the server kept running idle and having autovac do its thing, which produced some interesting log output:

WARNING:  relation "foo" page 45 is uninitialized --- fixing
WARNING:  relation "foo" page 46 is uninitialized --- fixing
...
WARNING:  relation "foo" page 72 is uninitialized --- fixing
WARNING:  relation "foo" page 73 is uninitialized --- fixing
WARNING:  page verification failed, calculated checksum 54570 but expected 34212
ERROR:  invalid page in block 74 of relation base/16384/4931589

This happened 3 times.  Every time, the warnings started on page 45 and continued up until the invalid page was found (which varied: 74, 86, and 74 again).
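
To compare runs like these, a small script can pull the page numbers out of the server log. The sketch below assumes the messages look exactly like the ones quoted above; the function name `summarize_log` and the regular expressions are my own and are not part of PostgreSQL.

```python
import re
from typing import List, Optional, Tuple

# Patterns written against the log lines quoted above (an assumption about
# the exact message wording, which may differ between server versions).
UNINIT_RE = re.compile(
    r'WARNING:\s+relation "(?P<rel>[^"]+)" page (?P<page>\d+) is uninitialized --- fixing'
)
INVALID_RE = re.compile(
    r'ERROR:\s+invalid page in block (?P<block>\d+) of relation (?P<path>\S+)'
)


def summarize_log(log_text: str) -> Tuple[List[int], Optional[int]]:
    """Return (uninitialized page numbers, first invalid block or None)."""
    pages = [int(m.group("page")) for m in UNINIT_RE.finditer(log_text)]
    invalid = INVALID_RE.search(log_text)
    block = int(invalid.group("block")) if invalid else None
    return pages, block
```

Running this over each of the three logs would give the first fixed page and the failing block side by side, which makes it easier to see whether the run of uninitialized pages always ends right before the block that fails verification.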

I wonder if the bug is in checksums, or if the checksums are doing their job by finding some other bug.  And why did those uninitialized pages trigger warnings when they were autovacced, but not when they were seq scanned in a query?

Cheers,

Jeff
