Another PANIC corrupt index/crash ...any thoughts?

From Jeff Amiel
Subject Another PANIC corrupt index/crash ...any thoughts?
Msg-id 433140.68997.qm@web65505.mail.ac4.yahoo.com
Replies Re: Another PANIC corrupt index/crash ...any thoughts?  (Scott Marlowe <scott.marlowe@gmail.com>)
Re: Another PANIC corrupt index/crash ...any thoughts?  (Scott Marlowe <scott.marlowe@gmail.com>)
List pgsql-general
About a month ago I posted about a database crash possibly caused by a corrupt index:

Dec 30 17:41:57 db-1 postgres[28957]: [ID 748848 local0.crit] [34004622-1] 2009-12-30 17:41:57.825 CST    28957PANIC: right sibling 2019 of block 2018 is not next child of 1937 in index "sl_log_2_idx1"

It has since happened again with a DIFFERENT index (interestingly, also a Slony-related index):

Jan 29 15:17:42 db-1 postgres[29025]: [ID 748848 local0.crit] [4135622-1] 2010-01-29 15:17:42.915 CST    29025PANIC: right sibling 183 of block 182 is not next child of 158 in index "sl_seqlog_idx"
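
In case it helps anyone comparing these messages across logs, here is a minimal Python sketch for pulling the block numbers and index name out of such a PANIC line. It is not part of PostgreSQL or Slony; the function name `parse_panic` and the returned field names are made up for illustration. The pattern accepts both "right sibling" and "rightsibling", since wrapped log output can lose the space.

```python
import re

# Matches the btree PANIC text; "\s*" allows the space in "right sibling"
# to be missing, which can happen when log lines are re-wrapped.
PANIC_RE = re.compile(
    r'right\s*sibling (\d+) of block (\d+) is not next child of (\d+) '
    r'in index "([^"]+)"'
)


def parse_panic(line):
    """Return the fields of a 'right sibling' PANIC line as a dict, or None."""
    match = PANIC_RE.search(line)
    if match is None:
        return None
    right, block, parent = (int(g) for g in match.groups()[:3])
    return {
        "right_sibling": right,
        "block": block,
        "parent": parent,
        "index": match.group(4),
    }


# Message text taken from the second crash above.
sample = ('rightsibling 183 of block 182 is not next child of 158 '
          'in index "sl_seqlog_idx"')
print(parse_panic(sample))
```

Running this over the archived syslog files would show which indexes and blocks are involved each time.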

I re-indexed the table and restarted the database, and all appears well. (I shut down autovacuum and Slony for a
while first to get my feet underneath me, then restarted them after a few hours with no apparent ill effects.)

Coincidentally (or not), I started getting disk errors about a minute AFTER the above error (db storage is on a
fibre-attached SAN):

/var/log/archive/log-2010-01-29.log:Jan 29 15:18:50 db-1 scsi_vhci: [ID 734749 kern.warning] WARNING: vhci_scsi_reset 0x1
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:50 db-1 scsi: [ID 243001 kern.warning] WARNING: /pci@0,0/pci10de,5d@d/pci1077,142@0/fp@0,0(fcp1):
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/disk@g000b08001c001958(sd9):
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice]     Requested Block: 206265378             Error Block: 206265378
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice]     Vendor: Pillar             Serial Number:
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice]     Sense Key: Unit Attention
/var/log/archive/log-2010-01-29.log:Jan 29 15:18:52 db-1 scsi: [ID 107833 kern.notice]     ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

Stack trace from recent crash is below:

Program terminated with signal 6, Aborted.
#0  0xfed00c57 in _lwp_kill () from /lib/libc.so.1
(gdb) bt
#0  0xfed00c57 in _lwp_kill () from /lib/libc.so.1
#1  0xfecfe40e in thr_kill () from /lib/libc.so.1
#2  0xfecad083 in raise () from /lib/libc.so.1
#3  0xfec90b19 in abort () from /lib/libc.so.1
#4  0x0821b6ea in errfinish (dummy=0) at elog.c:471
#5  0x0821c58f in elog_finish (elevel=22, fmt=0x82b7200 "right sibling %u of block %u is not next child of %u in index \"%s\"") at elog.c:964
#6  0x0809e0a8 in _bt_pagedel (rel=0x8602f78, buf=377580, stack=0x881d660, vacuum_full=0 '\0') at nbtpage.c:1141
#7  0x0809f73d in btvacuumscan (info=0x8043f60, stats=0x8578410, callback=0, callback_state=0x0, cycleid=20894) at nbtree.c:936
#8  0x0809fb6d in btbulkdelete (fcinfo=0x0) at nbtree.c:547
#9  0x0821f268 in FunctionCall4 (flinfo=0x0, arg1=0, arg2=0, arg3=0, arg4=0) at fmgr.c:1215
#10 0x0809a7a7 in index_bulk_delete (info=0x8043f60, stats=0x0, callback=0x812fea0 <lazy_tid_reaped>, callback_state=0x85765e8) at indexam.c:573
#11 0x0812fe2c in lazy_vacuum_index (indrel=0x8602f78, stats=0x85769c8, vacrelstats=0x85765e8) at vacuumlazy.c:660
#12 0x08130432 in lazy_vacuum_rel (onerel=0x8602140, vacstmt=0x85d9f48) at vacuumlazy.c:487
#13 0x0812e7e8 in vacuum_rel (relid=140353352, vacstmt=0x85d9f48, expected_relkind=114 'r') at vacuum.c:1107
#14 0x0812f832 in vacuum (vacstmt=0x85d9f48, relids=0x85d9f38) at vacuum.c:400
#15 0x08186cee in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:914
#16 0x08187150 in autovac_start () at autovacuum.c:178
#17 0x0818bec5 in ServerLoop () at postmaster.c:1252
#18 0x0818d045 in PostmasterMain (argc=3, argv=0x83399a8) at postmaster.c:966
#19 0x08152ba6 in main (argc=3, argv=0x83399a8) at main.c:188

Any thoughts on how I should proceed?
We are planning an upgrade to 8.4 in the short term, but I can see no evidence of fixes since the 8.2 version that
would relate to index corruption. I have no real evidence of bad disks; iostat -E reports:

# iostat -E
sd2       Soft Errors: 1 Hard Errors: 4 Transport Errors: 0
Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
Size: 2.20GB <2200567296 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 4 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
sd3       Soft Errors: 1 Hard Errors: 32 Transport Errors: 0
Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
Size: 53.95GB <53948448256 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 32 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
sd7       Soft Errors: 1 Hard Errors: 40 Transport Errors: 8
Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
Size: 53.95GB <53948448256 bytes>
Media Error: 0 Device Not Ready: 1 No Device: 33 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
sd8       Soft Errors: 1 Hard Errors: 34 Transport Errors: 0
Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
Size: 107.62GB <107622432256 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 34 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
sd9       Soft Errors: 1 Hard Errors: 32 Transport Errors: 2
Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
Size: 215.80GB <215796153856 bytes>
Media Error: 0 Device Not Ready: 1 No Device: 29 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
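
To compare these counters across devices without reading them by eye, a rough Python sketch follows. It assumes the `iostat -E` layout shown above (a device line starting with the device name and "Soft Errors:", followed by counter lines); the function name `parse_iostat_e` is hypothetical, and other platforms or iostat versions may format this output differently.

```python
import re


def parse_iostat_e(text):
    """Map each device name to a dict of its integer error counters."""
    devices = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # A device block starts with e.g. "sd7       Soft Errors: 1 ..."
        head = re.match(r"^(\S+)\s+Soft Errors:", line)
        if head:
            current = devices.setdefault(head.group(1), {})
        if current is None:
            continue
        # Vendor/Size lines hold descriptive values, not error counters.
        if line.startswith(("Vendor:", "Size:")):
            continue
        # Collect every "Counter Name: <integer>" pair on the line.
        for name, value in re.findall(r"([A-Z][A-Za-z ]*?): (\d+)", line):
            current[name.strip()] = int(value)
    return devices


# One device block copied from the output above.
SAMPLE = """\
sd7       Soft Errors: 1 Hard Errors: 40 Transport Errors: 8
Vendor: Pillar   Product: Axiom 300        Revision: 0000 Serial No:
Size: 53.95GB <53948448256 bytes>
Media Error: 0 Device Not Ready: 1 No Device: 33 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0
"""

for device, counters in parse_iostat_e(SAMPLE).items():
    print(device, counters)
```

In the figures above, Media Error is 0 on every device and most of the hard errors are counted under No Device.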

Any insight would be appreciated.
PostgreSQL 8.2.12 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)




