Discussion: pg 8.4 crashing.
I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a panic on failing to add a right sibling in an index. Log output:

PANIC: failed to add item to the right sibling in index "logged_in_uid"
STATEMENT: INSERT INTO logged_in ...

Now, the column that's failing here is a serial that isn't referenced in the SQL statement, i.e. it gets the default nextval.

OS: Ubuntu 8.04,
Pg: 8.4.4 The only arg to ./configure was to change the prefix dir to ~/pg84

I've had this problem before, but didn't have the time to track it down. Now I do, and it's on a server I can play on a bit if I need to.

--
To understand recursion, one must first understand recursion.
On Thu, Sep 23, 2010 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
> panic on failing to add a right sibling in an index. Log output:
>
> PANIC: failed to add item to the right sibling in index "logged_in_uid"
> STATEMENT: INSERT INTO logged_in ...
>
> Now, the column that's failing here is a serial that isn't referenced
> in the SQL statement, i.e. it gets the default nextval.
>
> OS: Ubuntu 8.04,
> Pg: 8.4.4 The only arg to ./configure was to change the prefix dir to ~/pg84
>
> I've had this problem before, but didn't have the time to track it
> down. Now I do, and it's on a server I can play on a bit if I need
> to.

Hm, is this repeatable? Have you ruled out corruption?

merlin
Scott Marlowe <scott.marlowe@gmail.com> writes:
> I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
> panic on failing to add a right sibling in an index. Log output:

> PANIC: failed to add item to the right sibling in index "logged_in_uid"
> STATEMENT: INSERT INTO logged_in ...

If you can apply this patch:
http://archives.postgresql.org/pgsql-committers/2010-08/msg00365.php
it should tell you which index page is causing the problem. Then please dump that page with pg_filedump and send it in.

regards, tom lane
On Thu, Sep 23, 2010 at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
>> panic on failing to add a right sibling in an index. Log output:
>
>> PANIC: failed to add item to the right sibling in index "logged_in_uid"
>> STATEMENT: INSERT INTO logged_in ...
>
> If you can apply this patch:
> http://archives.postgresql.org/pgsql-committers/2010-08/msg00365.php
> it should tell you which index page is causing the problem. Then
> please dump that page with pg_filedump and send it in.

Patch applied. This crash happens on average about once a day. Happened twice yesterday, but hasn't happened today. I'll report back when / if it does it again.
On Fri, Sep 24, 2010 at 5:06 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Sep 23, 2010 at 8:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Thu, Sep 23, 2010 at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Scott Marlowe <scott.marlowe@gmail.com> writes:
>>>> I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
>>>> panic on failing to add a right sibling in an index. Log output:
>>>
>>>> PANIC: failed to add item to the right sibling in index "logged_in_uid"
>>>> STATEMENT: INSERT INTO logged_in ...
>>>
>>> If you can apply this patch:
>>> http://archives.postgresql.org/pgsql-committers/2010-08/msg00365.php
>>> it should tell you which index page is causing the problem. Then
>>> please dump that page with pg_filedump and send it in.
>>
>> Patch applied. This crash happens on average about once a day.
>> Happened twice yesterday, but hasn't happened today. I'll report back
>> when / if it does it again.
>
> OK, got an error on it today. Looks like a corrupted index. Note
> that this is on a machine that was very well tested, but who knows, a
> single bit error could have occurred. pg_filedump attached.

P.S. I reindexed it to get rid of the error. I'll keep an eye out for it coming back.
Scott Marlowe <scott.marlowe@gmail.com> writes:
>> On Thu, Sep 23, 2010 at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If you can apply this patch:
>>> http://archives.postgresql.org/pgsql-committers/2010-08/msg00365.php
>>> it should tell you which index page is causing the problem. Then
>>> please dump that page with pg_filedump and send it in.

> OK, got an error on it today. Looks like a corrupted index. Note
> that this is on a machine that was very well tested, but who knows, a
> single bit error could have occurred. pg_filedump attached.

Hm, I should've specified that the -i -f options produce the most useful output from pg_filedump :-(. It's pretty obvious that you've got 500+ bytes worth of seriously corrupted data there, but at least in this view it's hard to tell if there's any clear pattern to the bad data.

I'd be inclined to suspect disk subsystem malfeasance, not a RAM problem. In particular I'm suspicious that the original corruption amounted to exactly one 512-byte sector. If the index page wasn't full at the time, it's possible that additional insertions could have occurred without provoking obvious errors. That could have shifted and enlarged the damaged area to what we see here, which looks to be 540 bytes starting at page offset 552 (if I counted on my fingers correctly).

I count seven sane-looking line pointer items within the damaged-looking range, which account for 7*4 = 28 bytes, so if those got inserted later then there was exactly 512 bytes worth of damage initially. It's harder to tell whether there were exactly ten insertions before the damaged area to shift it up from offset 512 to offset 552, but given that the other math comes out right I'm prepared to bet a nickel or two on this theory.

regards, tom lane
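
[Editor's note] The arithmetic behind the one-sector theory above can be checked with a short sketch. It assumes 4-byte line pointer items and a 512-byte sector, as in the message, and it further assumes the damage originally began at the sector boundary at or below the observed offset. The function and variable names are hypothetical and are not part of PostgreSQL or pg_filedump.

```python
SECTOR_BYTES = 512        # sector size assumed in the message above
LINE_POINTER_BYTES = 4    # one line pointer item, per the 7*4 = 28 figure


def reconstruct_original_damage(observed_offset, observed_length, sane_pointers_inside):
    """Work backwards from the observed damaged range to the hypothesised original one.

    Assumption (not verified against the dump): every sane-looking line pointer
    inside the range was inserted after the corruption, and the damage started
    on a sector boundary.
    """
    # Bytes added inside the damaged range by later insertions.
    grown = sane_pointers_inside * LINE_POINTER_BYTES
    original_length = observed_length - grown

    # Sector boundary at or below the observed start of the damage.
    original_offset = (observed_offset // SECTOR_BYTES) * SECTOR_BYTES

    # How far the damage was pushed up, expressed in line pointer insertions.
    shift = observed_offset - original_offset
    insertions_before, leftover = divmod(shift, LINE_POINTER_BYTES)

    return {
        "original_offset": original_offset,
        "original_length": original_length,
        "insertions_before": insertions_before,
        "leftover_bytes": leftover,
    }


# Figures quoted in the message: 540 damaged bytes at page offset 552,
# with seven sane-looking line pointers inside the range.
result = reconstruct_original_damage(552, 540, 7)
print(result)
```

With those inputs the sketch reports an original length of 512 bytes starting at offset 512 and ten whole insertions ahead of the damage with no bytes left over, which matches the numbers in the message. It only confirms that the figures are self-consistent; it says nothing about what actually happened on disk.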