Re: hung backends stuck in spinlock heavy endless loop

From Heikki Linnakangas
Subject Re: hung backends stuck in spinlock heavy endless loop
Msg-id 54B91E68.7030400@vmware.com
In reply to Re: hung backends stuck in spinlock heavy endless loop  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: hung backends stuck in spinlock heavy endless loop  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On 01/16/2015 04:05 PM, Merlin Moncure wrote:
> On Thu, Jan 15, 2015 at 5:10 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Thu, Jan 15, 2015 at 3:00 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> Running this test on another set of hardware to verify -- if this
>>> turns out to be a false alarm which it may very well be, I can only
>>> offer my apologies!  I've never had a new drive fail like that, in
>>> that manner.  I'll burn the other hardware in overnight and report
>>> back.
>
> huh -- well possibly. not.  This is on a virtual machine attached to a
> SAN.  It ran clean for several (this is 9.4 vanilla, asserts off,
> checksums on) hours then the starting having issues:
>
> [cds2 21952 2015-01-15 22:54:51.833 CST 5502]WARNING:  page
> verification failed, calculated checksum 59143 but expected 59137 at
> character 20

The calculated checksum is suspiciously close to the expected one. It
could be a coincidence, but the previous checksum warning you posted was
also quite close:

> [cds2 18347 2015-01-15 15:58:29.955 CST 1779]WARNING:  page
> verification failed, calculated checksum 28520 but expected 28541

I believe the checksum algorithm is supposed to mix the bits quite 
thoroughly, so that a difference in a single byte in the input will lead 
to a completely different checksum. However, we add the block number to 
the checksum last:

>     /* Mix in the block number to detect transposed pages */
>     checksum ^= blkno;
>
>     /*
>      * Reduce to a uint16 (to fit in the pd_checksum field) with an offset of
>      * one. That avoids checksums of zero, which seems like a good idea.
>      */
>     return (checksum % 65535) + 1;

It looks very much like a page has for some reason been moved to a
different block number. And that's exactly what Peter found in his
investigation too; an index page was mysteriously copied to a different
block with identical content.
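
To illustrate why a misplaced page would give a nearly matching checksum,
here is a minimal sketch of just the final step quoted above. The names
finish_checksum and checksum_distance are hypothetical, and "mixed" stands
in for whatever the mixing stage produced from the page contents; none of
the values come from the failing system.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Final step of the page checksum as quoted above: XOR in the block
 * number, then reduce to the range 1..65535.  "mixed" is an assumed
 * stand-in for the already-mixed hash of the page contents.
 */
static uint16_t
finish_checksum(uint32_t mixed, uint32_t blkno)
{
    uint32_t checksum = mixed ^ blkno;

    return (uint16_t) ((checksum % 65535) + 1);
}

/*
 * Absolute distance between the checksums of identical page contents
 * stored at two different block numbers (hypothetical helper).
 */
static int
checksum_distance(uint32_t mixed, uint32_t blk_a, uint32_t blk_b)
{
    return abs((int) finish_checksum(mixed, blk_a) -
               (int) finish_checksum(mixed, blk_b));
}
```

Because the block number is XORed in after the mixing, the two pre-modulo
values differ by at most (blk_a ^ blk_b). So unless the modulo happens to
wrap, identical content at two nearby block numbers yields checksums that
are only a few units apart, which would fit the warnings above.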

- Heikki



