Re: 9.4 checksum error in recovery with btree index

From Heikki Linnakangas
Subject Re: 9.4 checksum error in recovery with btree index
Date
Msg-id 5379DDBE.2010703@vmware.com
In response to Re: 9.4 checksum error in recovery with btree index  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: 9.4 checksum error in recovery with btree index
List pgsql-hackers

On 05/18/2014 06:30 AM, Jeff Janes wrote:
> On Saturday, May 17, 2014, Heikki Linnakangas <hlinnakangas@vmware.com>
> wrote:
>
>> On 05/17/2014 12:28 AM, Jeff Janes wrote:
>>
>>> More fun with my torn page injection test program on 9.4.
>>>
>>> 24171  2014-05-16 14:00:44.934 PDT:WARNING:  01000: page verification
>>> failed, calculated checksum 21100 but expected 3356
>>> 24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
>>> 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
>>> 192
>>> 24171  2014-05-16 14:00:44.934 PDT:LOCATION:  PageIsVerified,
>>> bufpage.c:145
>>> 24171  2014-05-16 14:00:44.934 PDT:FATAL:  XX001: invalid page in block
>>> 34666 of relation base/16384/16405
>>> 24171  2014-05-16 14:00:44.934 PDT:CONTEXT:  xlog redo split_l: rel
>>> 1663/16384/16405 left 35191, right 35652, next 34666, level 0, firstright
>>> 192
>>> 24171  2014-05-16 14:00:44.934 PDT:LOCATION:  ReadBuffer_common,
>>> bufmgr.c:483
>>>
>>>
>>> I've seen this twice now; both times the checksum failure was for the
>>> block labelled "next" in the redo record.  Is this another case where
>>> the block needs to be reinitialized upon replay?
>>>
>>
>> Hmm, it looks like I fumbled the numbering of the backup blocks in the
>> b-tree split WAL record (in 9.4). I blame the comments; the comments where
>> the record is generated number the backup blocks starting from 1, but
>> XLR_BKP_BLOCK(x) and RestoreBackupBlock(...) used in replay number them
>> starting from 0.
>>
>> Attached is a patch that I think fixes them. In addition to the
>> rnext reference, the clearing of the incomplete-split flag in the child
>> page had a similar numbering mishap.
>>
>
> That seems to have fixed it.

Okay, thanks, committed.
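
For readers following along, the numbering mismatch described in the quoted
message can be illustrated with a toy model. This is only a sketch, not the
PostgreSQL source: bkp_block_flag, restore_backup_block, MAX_BKP_BLOCKS, the
record layout, and the flag bit pattern are all assumptions made for
illustration, loosely modelled on the 0-based XLR_BKP_BLOCK(x) convention.

```python
# Toy model of a 1-based vs 0-based backup block mixup; not PostgreSQL source.
MAX_BKP_BLOCKS = 4  # assumed number of backup block slots in this sketch


def bkp_block_flag(index: int) -> int:
    """Return the info-flag bit for a 0-based backup block slot (assumed layout)."""
    if not 0 <= index < MAX_BKP_BLOCKS:
        raise ValueError(f"backup block index out of range: {index}")
    return 0x08 >> index


def restore_backup_block(record: dict, index: int):
    """Return the page image in 0-based slot `index`, or None if the record has none."""
    if record["info"] & bkp_block_flag(index):
        return record["images"][index]
    return None


# Hypothetical split record: slot 0 holds the left page, slot 1 the "next" page.
record = {
    "info": bkp_block_flag(0) | bkp_block_flag(1),
    "images": {0: "left page image", 1: "next page image"},
}

# The comments at the writing site count from 1, so "next" is "backup block 2".
COMMENT_NUMBER_OF_NEXT = 2

# Replay that reuses the comment's number looks at slot 2, finds no image,
# and would fall back to the (possibly torn) page on disk.
buggy = restore_backup_block(record, COMMENT_NUMBER_OF_NEXT)

# Replay that converts to the 0-based convention finds the intended image.
fixed = restore_backup_block(record, COMMENT_NUMBER_OF_NEXT - 1)

print(buggy, "|", fixed)
```

In this model the buggy lookup silently finds nothing rather than raising an
error, which is one plausible way such a slip would go unnoticed until a torn
page forces the checksum to fail.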

Your torn-page generator seems to be very good at finding bugs - any 
chance you could publish it?

I wonder if it could've caught the similar mishap in the clearing of the 
incomplete-split flag. I think you'd need a checkpoint to begin in the very 
narrow window between splitting a page and inserting the parent pointer.
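
The reason a checkpoint matters here is that, with full_page_writes on, a WAL
record carries a full-page image only for the first change to a page after a
checkpoint's redo point. A rough sketch of that window, using plain integers
as stand-ins for LSNs (the function names and the simplification that the
child page's LSN equals the split's LSN are assumptions for illustration):

```python
# Simplified model of when the flag-clearing change needs a full-page image.
# LSNs are plain integers here; real WAL positions are more involved.


def needs_full_page_image(page_lsn: int, redo_ptr: int) -> bool:
    """A page not modified since the checkpoint's redo point gets a full image."""
    return page_lsn <= redo_ptr


def flag_clear_carries_image(split_lsn: int, parent_insert_lsn: int,
                             redo_ptr: int) -> bool:
    """True if the parent insert's record would carry an image of the child page.

    The split last stamped the child page at split_lsn. The later parent
    insert clears the incomplete-split flag on that page, so an image is
    needed only if a checkpoint began after the split and before the insert.
    """
    if not split_lsn < parent_insert_lsn:
        raise ValueError("the parent insert must follow the split")
    if not redo_ptr < parent_insert_lsn:
        raise ValueError("the redo point must precede the record being written")
    return needs_full_page_image(split_lsn, redo_ptr)


# Checkpoint began before the split: the child page was already modified
# since the redo point, so no backup block and the bug stays hidden.
before_split = flag_clear_carries_image(split_lsn=100, parent_insert_lsn=110,
                                        redo_ptr=90)

# Checkpoint began inside the window: the backup block path is exercised.
inside_window = flag_clear_carries_image(split_lsn=100, parent_insert_lsn=110,
                                         redo_ptr=105)

print(before_split, inside_window)
```

Since the split and the parent insert normally happen back to back, the range
of redo points that exercises this path is very small.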

- Heikki
