Обсуждение: Hanging backends and possible index corruption
We are currently analyzing an issue at one of our customers PostgreSQL database. The current version is 9.1.6 (update to 9.1.7 is scheduled for next monday, no downtime possible before). It runs on POWER7 (pSeries 740) on an RHEL6.3 64-bit LPAR. The packages are built from PGDG SVN sources, no special tweaks added. We saw no hardware related errors on this machine, nor any crashes. What currently happens on this machine are hanging statements (SELECTs and INSERTs occasionally) with 100% CPU. After some investigation it turned out that the corresponding backends are seeking within an index file over and over again in a loop. Looking into the hanging queries i've recognized certain keys which seems to have the problem, other keys used in the WHERE condition run smoothly. Turning off index and bitmap index scans caused the suspicious keys to return results, too. So i've saved the index file (normal BTree index with a single bigint column), did a REINDEX and the problem was gone. Looking at the index file with pg_filedump and pgbtreecheck from Alvaro gave me the following output: pgbtreecheck gives warnings about pages' parents and then loops visiting the same pages over and over again: NOTICE: fast root: block 290 at level 2 NOTICE: setting page parents WARNING: block 12709 already had a parent (8840); new parent is 12177 WARNING: block 12710 already had a parent (12439); new parent is 10835 NOTICE: done setting parents NOTICE: Checking forward scan of level 0, starting at block 1 -- loop starts WARNING: right sibling 12710 does not point left to me (11680); points to 10924 instead Looking into the relevant pages and their prev and next pointers give me the following: pg_filedump -i -R 11680 ~/tmp/100252789 | grep Blocks Blocks: Previous (12651) Next (12710) Level (0) CycleId (0) pg_filedump -i -R 10924 ~/tmp/100252789 | grep Blocks Blocks: Previous (10923) Next (12710) Level (0) CycleId (0) pg_filedump -i -R 12710 ~/tmp/100252789 | grep Blocks Blocks: Previous (10924) Next (10925) Level (0) CycleId (0) pg_filedump -i -R 10925 ~/tmp/100252789 | grep Blocks Blocks: Previous (12710) Next (10926) Level (0) CycleId (0) pg_filedump -i -R 12709 ~/tmp/100252789 | grep Blocks Blocks: Previous (8849) Next (8850) Level (0) CycleId (0) pg_filedump -i -R 8840 ~/tmp/100252789 | grep Blocks Blocks: Previous (8555) Next (9125) Level (1) CycleId (0) pg_filedump -i -R 12439 ~/tmp/100252789 | grep Blocks Blocks: Previous (11405) Next (11690) Level (1) CycleId (0) $ pg_filedump -i -R 12177 ~/tmp/100252789 | grep Blocks Blocks: Previous (11690) Next (0) Level (1) CycleId (0) $ pg_filedump -i -R 10835 ~/tmp/100252789 | grep Blocks Blocks: Previous (10550) Next (11120) Level (1) CycleId (0) This symptom happened three times in the last couple of weeks now. Looking at the numbers doesn't give me the impression that some flaky hardware could be involved. What else can we do to track down this problem, any suggestions? -- Thanks Bernd
On 2013-01-25 16:24:52 +0100, Bernd Helmle wrote: > We are currently analyzing an issue at one of our customers PostgreSQL > database. > > The current version is 9.1.6 (update to 9.1.7 is scheduled for next monday, > no downtime possible before). It runs on POWER7 (pSeries 740) on an RHEL6.3 > 64-bit LPAR. The packages are built from PGDG SVN sources, no special tweaks > added. We saw no hardware related errors on this machine, nor any crashes. > > What currently happens on this machine are hanging statements (SELECTs and > INSERTs occasionally) with 100% CPU. After some investigation it turned out > that the corresponding backends are seeking within an index file over and > over again in a loop. Looking into the hanging queries i've recognized > certain keys which seems to have the problem, other keys used in the WHERE > condition run smoothly. Turning off index and bitmap index scans caused the > suspicious keys to return results, too. > > So i've saved the index file (normal BTree index with a single bigint > column), did a REINDEX and the problem was gone. Looking at the index file > with pg_filedump and pgbtreecheck from Alvaro gave me the following output: > > pgbtreecheck gives warnings about pages' parents and then loops visiting the > same pages over and over again: > > NOTICE: fast root: block 290 at level 2 > NOTICE: setting page parents > WARNING: block 12709 already had a parent (8840); new parent is 12177 > WARNING: block 12710 already had a parent (12439); new parent is 10835 > NOTICE: done setting parents > NOTICE: Checking forward scan of level 0, starting at block 1 > > -- loop starts > > WARNING: right sibling 12710 does not point left to me (11680); points to > 10924 instead > > Looking into the relevant pages and their prev and next pointers give me the > following: > > pg_filedump -i -R 11680 ~/tmp/100252789 | grep Blocks > Blocks: Previous (12651) Next (12710) Level (0) CycleId (0) > > pg_filedump -i -R 10924 ~/tmp/100252789 | grep Blocks > Blocks: Previous (10923) Next (12710) Level (0) CycleId (0) > > pg_filedump -i -R 12710 ~/tmp/100252789 | grep Blocks > Blocks: Previous (10924) Next (10925) Level (0) CycleId (0) > > pg_filedump -i -R 10925 ~/tmp/100252789 | grep Blocks > Blocks: Previous (12710) Next (10926) Level (0) CycleId (0) > > pg_filedump -i -R 12709 ~/tmp/100252789 | grep Blocks > Blocks: Previous (8849) Next (8850) Level (0) CycleId (0) > > pg_filedump -i -R 8840 ~/tmp/100252789 | grep Blocks > Blocks: Previous (8555) Next (9125) Level (1) CycleId (0) > > pg_filedump -i -R 12439 ~/tmp/100252789 | grep Blocks > Blocks: Previous (11405) Next (11690) Level (1) CycleId (0) > > $ pg_filedump -i -R 12177 ~/tmp/100252789 | grep Blocks > Blocks: Previous (11690) Next (0) Level (1) CycleId (0) > > $ pg_filedump -i -R 10835 ~/tmp/100252789 | grep Blocks > Blocks: Previous (10550) Next (11120) Level (1) CycleId (0) > > This symptom happened three times in the last couple of weeks now. Looking > at the numbers doesn't give me the impression that some flaky hardware could > be involved. What else can we do to track down this problem, any > suggestions? Did you reindex after upgrading to 9.1.6? Did you ever have any crashes or failovers before upgrading to 9.1.6? I have seen pretty similar symptoms caused by "Fix persistence marking of shared buffers during WAL replay" in 9.1.6. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
--On 25. Januar 2013 16:28:16 +0100 Andres Freund <andres@2ndquadrant.com> wrote: > Did you reindex after upgrading to 9.1.6? Did you ever have any crashes > or failovers before upgrading to 9.1.6? > I have seen pretty similar symptoms caused by "Fix persistence marking > of shared buffers during WAL replay" in 9.1.6. Hmm it happened only on a single heavily used table for now and this table was REINDEXed twice after updating to 9.1.6 (every time the issue occured). -- Thanks Bernd
Bernd Helmle <mailings@oopsware.de> writes:
> So i've saved the index file (normal BTree index with a single bigint 
> column), did a REINDEX and the problem was gone. Looking at the index file 
> with pg_filedump and pgbtreecheck from Alvaro gave me the following output:
> ...
Don't know how careful pgbtreecheck is.  The pg_filedump output isn't
very helpful because you filtered away the flags, so we can't tell if
any of these pages are deleted.  (If they are, the duplicate-looking
links might not be errors, since we intentionally don't reset a deleted
page's left/right links when deleting it.)
Could we see the whole special-space dump for each of the pages you're
worried about?
One thought that occurs to me is that POWER is a weak-memory-ordering
architecture, so that it's a tenable idea that this has something to do
with changing page links while not holding sufficient lock on the page.
I don't see btree doing that anywhere, but ...
BTW, how long has this installation been around, and when did you start
seeing funny behavior?  Can you say with reasonable confidence that the
bug was *not* present in any older PG versions?
        regards, tom lane
			
		
--On 25. Januar 2013 20:37:32 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Don't know how careful pgbtreecheck is.  The pg_filedump output isn't
> very helpful because you filtered away the flags, so we can't tell if
> any of these pages are deleted.  (If they are, the duplicate-looking
> links might not be errors, since we intentionally don't reset a deleted
> page's left/right links when deleting it.)
>
Ah, wasn't aware of this.
> Could we see the whole special-space dump for each of the pages you're
> worried about?
>
Attached
> One thought that occurs to me is that POWER is a weak-memory-ordering
> architecture, so that it's a tenable idea that this has something to do
> with changing page links while not holding sufficient lock on the page.
> I don't see btree doing that anywhere, but ...
>
> BTW, how long has this installation been around, and when did you start
> seeing funny behavior?  Can you say with reasonable confidence that the
> bug was *not* present in any older PG versions?
>
This machine started in production around august last year with (afair)
9.1.5. There also were performance and stress tests on this machine before
it went into production, with no noticable problems.
However, what i missed before is that there were some trouble with the
storage multipaths. Seems early january the machine lost some of it's paths
to the SAN, but they were recovered a few seconds later...so i cannot
exclude this as the cause anymore, though the paths are redundant. What
strikes me is that the index was recreated in the meantime after this
issue...
We will watch this machine the next couple of weeks more closely, if the
issue comes back again.
--
Thanks
    Bernd