Re: corrupt pages detected by enabling checksums

From	Andres Freund
Subject	Re: corrupt pages detected by enabling checksums
Date	April 4, 2013, 03:59:00
Msg-id	20130404005843.GD19178@awork2.anarazel.de
In reply to	Re: corrupt pages detected by enabling checksums (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: corrupt pages detected by enabling checksums (Andres Freund <andres@2ndquadrant.com>)
List	pgsql-hackers

On 2013-04-03 20:45:51 -0400, Tom Lane wrote:
> andres@anarazel.de (Andres Freund) writes:
> > Looking at the page lsn's with dd I noticed something peculiar:
> 
> > page 0:
> > 01 00 00 00 18 c2 00 31 => 1/3100C218
> > page 1:
> > 01 00 00 00 80 44 01 31 => 1/31014480
> > page 10:
> > 01 00 00 00 60 ce 05 31 => 1/3105ce60
> > page 43:
> > 01 00 00 00 58 7a 16 31 => 1/31167a58
> > page 44:
> > 01 00 00 00 f0 99 16 31 => 1/311699f0
> > page 45:
> > 00 00 00 00 00 00 00 00 => 0/0
> > page 90:
> > 01 00 00 00 90 17 1d 32 => 1/321d1790
> > page 91:
> > 01 00 00 00 38 ef 1b 32 => 1/321bef38
> 
> > So we have written out pages that are after pages without an LSN that
> > have an LSN that's *beyond* the point XLOG has successfully been written
> > to disk (1/31169A38)?
> 
> If you're looking into the FPIs, those would contain the page's older
> LSN, not the one assigned by the current WAL record.

Nope, that's from the heap, and the LSNs are *newer* than what startup
recovered to. I am pretty sure by now that we are missing out on valid
WAL; I am just not sure why.
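
For readers who want to repeat the comparison, here is a minimal sketch of how the dd output quoted above maps to LSNs and how each one compares with the last successfully written record (1/31169A38). It assumes the first 8 bytes of a page are pd_lsn stored as two little-endian 32-bit values, high half first, which matches the byte dumps in this thread; the helper names are made up for illustration.

```python
import struct


def decode_page_lsn(first8: bytes) -> int:
    """Decode the first 8 bytes of a page into a 64-bit LSN.

    Assumption: pd_lsn is two little-endian uint32 values, the high
    half (xlogid) first, then the low half (xrecoff).
    """
    xlogid, xrecoff = struct.unpack("<II", first8)
    return (xlogid << 32) | xrecoff


def parse_lsn(text: str) -> int:
    """Parse the usual 'hi/lo' hexadecimal notation into a 64-bit LSN."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def format_lsn(lsn: int) -> str:
    """Format a 64-bit LSN back into 'hi/lo' hexadecimal notation."""
    return "%X/%X" % (lsn >> 32, lsn & 0xFFFFFFFF)


# Block number -> first 8 bytes of the page, as quoted in this thread.
PAGE_HEADERS = {
    0: "01 00 00 00 18 c2 00 31",
    1: "01 00 00 00 80 44 01 31",
    10: "01 00 00 00 60 ce 05 31",
    43: "01 00 00 00 58 7a 16 31",
    44: "01 00 00 00 f0 99 16 31",
    45: "00 00 00 00 00 00 00 00",
    90: "01 00 00 00 90 17 1d 32",
    91: "01 00 00 00 38 ef 1b 32",
}

# Last record before the one that failed its checksum, per the thread.
RECOVERED_TO = parse_lsn("1/31169A38")

ahead = []  # blocks whose LSN lies beyond the recovered point
for blkno, hexbytes in PAGE_HEADERS.items():
    lsn = decode_page_lsn(bytes.fromhex(hexbytes))
    if lsn > RECOVERED_TO:
        ahead.append(blkno)
    print("page %3d: %-12s %s" % (
        blkno, format_lsn(lsn),
        "AHEAD of recovered WAL" if lsn > RECOVERED_TO else "ok"))
```

With the values above, only blocks 90 and 91 end up in `ahead`; block 45 decodes to 0/0.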

Unfortunately we can't easily diagnose what happened at:
27692  2013-04-03 10:09:15.647 PDT:LOG:  incorrect resource manager data checksum in record at 1/31169A68
since the startup process wrote its end of recovery checkpoint there:
rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 1/31169A68, prev 1/31169A38, bkp: 0000, desc: checkpoint: redo 1/31169A68; tli 1; prev tli 1; fpw true; xid 0/26254999; oid 843781; multi 1; offset 0; oldest xid 1799 in DB 1; oldest multi 1 in DB 1; oldest running xid 0; shutdown

Starting from some blocks later in that WAL segment:
pg_xlogdump /tmp/tmp/data2/pg_xlog/000000010000000100000031 -s 1/3116c000 -n 10
first record is after 1/3116C000, at 1/3116D9D8, skipping over 6616 bytes
rmgr: Heap        len (rec/tot):     51/    83, tx:   26254999, lsn: 1/3116D9D8, prev 1/3116BA20, bkp: 0000, desc: update: rel 1663/16384/835589; tid 38/148 xmax 26254999 ; new tid 44/57 xmax 0
rmgr: Btree       len (rec/tot):     34/    66, tx:   26254999, lsn: 1/3116DA30, prev 1/3116D9D8, bkp: 0000, desc: insert: rel 1663/16384/835590; tid 25/319
rmgr: Heap        len (rec/tot):     51/    83, tx:   26255000, lsn: 1/3116DA78, prev 1/3116DA30, bkp: 0000, desc: update: rel 1663/16384/835589; tid 19/214 xmax 26255000 ; new tid 44/58 xmax 0

the records continue again.
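
As a cross-check of the numbers in that pg_xlogdump run, the following sketch recomputes the "skipping over 6616 bytes" figure and the segment file name from the LSNs alone. It assumes the default 16 MB WAL segment size and the three-part 8-hex-digit file naming seen in the path above; the function names are hypothetical.

```python
# Assumption: default WAL segment size of 16 MB.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(text: str) -> int:
    """Parse 'hi/lo' hexadecimal notation into a 64-bit LSN."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_segment_name(tli: int, lsn: int) -> str:
    """Name of the WAL segment file that contains the given LSN.

    With 16 MB segments, the high 32 bits of the LSN form the middle
    part of the name and the segment index within them the last part.
    """
    log_id = lsn >> 32
    seg = (lsn & 0xFFFFFFFF) // WAL_SEGMENT_SIZE
    return "%08X%08X%08X" % (tli, log_id, seg)


start = parse_lsn("1/3116C000")         # value passed via -s
first_record = parse_lsn("1/3116D9D8")  # first record pg_xlogdump found

skipped = first_record - start
print("skipped bytes:", skipped)
print("segment file: ", wal_segment_name(1, start))
```

The difference comes out to 6616 bytes and the segment name to 000000010000000100000031, both consistent with the output shown above.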

Greetings,


Andres Freund

-- 
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
