Hanging backends and possible index corruption
От | Bernd Helmle |
---|---|
Тема | Hanging backends and possible index corruption |
Дата | |
Msg-id | 41AD83B5DEFB895033035BF1@apophis.credativ.lan обсуждение исходный текст |
Ответы |
Re: Hanging backends and possible index corruption
(Andres Freund <andres@2ndquadrant.com>)
Re: Hanging backends and possible index corruption (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
We are currently analyzing an issue at one of our customers PostgreSQL database. The current version is 9.1.6 (update to 9.1.7 is scheduled for next monday, no downtime possible before). It runs on POWER7 (pSeries 740) on an RHEL6.3 64-bit LPAR. The packages are built from PGDG SVN sources, no special tweaks added. We saw no hardware related errors on this machine, nor any crashes. What currently happens on this machine are hanging statements (SELECTs and INSERTs occasionally) with 100% CPU. After some investigation it turned out that the corresponding backends are seeking within an index file over and over again in a loop. Looking into the hanging queries i've recognized certain keys which seems to have the problem, other keys used in the WHERE condition run smoothly. Turning off index and bitmap index scans caused the suspicious keys to return results, too. So i've saved the index file (normal BTree index with a single bigint column), did a REINDEX and the problem was gone. Looking at the index file with pg_filedump and pgbtreecheck from Alvaro gave me the following output: pgbtreecheck gives warnings about pages' parents and then loops visiting the same pages over and over again: NOTICE: fast root: block 290 at level 2 NOTICE: setting page parents WARNING: block 12709 already had a parent (8840); new parent is 12177 WARNING: block 12710 already had a parent (12439); new parent is 10835 NOTICE: done setting parents NOTICE: Checking forward scan of level 0, starting at block 1 -- loop starts WARNING: right sibling 12710 does not point left to me (11680); points to 10924 instead Looking into the relevant pages and their prev and next pointers give me the following: pg_filedump -i -R 11680 ~/tmp/100252789 | grep Blocks Blocks: Previous (12651) Next (12710) Level (0) CycleId (0) pg_filedump -i -R 10924 ~/tmp/100252789 | grep Blocks Blocks: Previous (10923) Next (12710) Level (0) CycleId (0) pg_filedump -i -R 12710 ~/tmp/100252789 | grep Blocks Blocks: Previous (10924) Next (10925) Level (0) CycleId (0) pg_filedump -i -R 10925 ~/tmp/100252789 | grep Blocks Blocks: Previous (12710) Next (10926) Level (0) CycleId (0) pg_filedump -i -R 12709 ~/tmp/100252789 | grep Blocks Blocks: Previous (8849) Next (8850) Level (0) CycleId (0) pg_filedump -i -R 8840 ~/tmp/100252789 | grep Blocks Blocks: Previous (8555) Next (9125) Level (1) CycleId (0) pg_filedump -i -R 12439 ~/tmp/100252789 | grep Blocks Blocks: Previous (11405) Next (11690) Level (1) CycleId (0) $ pg_filedump -i -R 12177 ~/tmp/100252789 | grep Blocks Blocks: Previous (11690) Next (0) Level (1) CycleId (0) $ pg_filedump -i -R 10835 ~/tmp/100252789 | grep Blocks Blocks: Previous (10550) Next (11120) Level (1) CycleId (0) This symptom happened three times in the last couple of weeks now. Looking at the numbers doesn't give me the impression that some flaky hardware could be involved. What else can we do to track down this problem, any suggestions? -- Thanks Bernd
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Bruce MomjianДата:
Сообщение: Using COPY FREEZE with pg_restore --single-transaction
Следующее
От: Peter EisentrautДата:
Сообщение: Re: BUG #6510: A simple prompt is displayed using wrong charset