Thread: pg 8.4 crashing.

pg 8.4 crashing.

From
Scott Marlowe
Date:
I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
panic on failing to add a right sibling in an index.  Log output:

PANIC:  failed to add item to the right sibling in index "logged_in_uid"
STATEMENT:  INSERT INTO logged_in ...

Now, the column that's failing here is a serial that isn't referenced
in the SQL statement, i.e. it gets the default nextval.

OS: Ubuntu 8.04,
Pg: 8.4.4  The only arg to ./configure was to change the prefix dir to ~/pg84

I've had this problem before, but didn't have the time to track it
down.  Now I do, and it's on a server I can play on a bit if I need
to.

--
To understand recursion, one must first understand recursion.

Re: pg 8.4 crashing.

From
Merlin Moncure
Date:
On Thu, Sep 23, 2010 at 2:28 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
> panic on failing to add a right sibling in an index.  Log output:
>
> PANIC:  failed to add item to the right sibling in index "logged_in_uid"
> STATEMENT:  INSERT INTO logged_in ...
>
> Now, the column that's failing here is a serial that isn't referenced
> in the SQL statement, i.e. it gets the default nextval.
>
> OS: Ubuntu 8.04,
> Pg: 8.4.4  The only arg to ./configure was to change the prefix dir to ~/pg84
>
> I've had this problem before, but didn't have the time to track it
> down.  Now I do, and it's on a server I can play on a bit if I need
> to.

Hm, is this repeatable? Have you ruled out corruption?

merlin

Re: pg 8.4 crashing.

From
Tom Lane
Date:
Scott Marlowe <scott.marlowe@gmail.com> writes:
> I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
> panic on failing to add a right sibling in an index.  Log output:

> PANIC:  failed to add item to the right sibling in index "logged_in_uid"
> STATEMENT:  INSERT INTO logged_in ...

If you can apply this patch:
http://archives.postgresql.org/pgsql-committers/2010-08/msg00365.php
it should tell you which index page is causing the problem.  Then
please dump that page with pg_filedump and send it in.

            regards, tom lane

Re: pg 8.4 crashing.

From
Scott Marlowe
Date:
On Thu, Sep 23, 2010 at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Scott Marlowe <scott.marlowe@gmail.com> writes:
>> I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
>> panic on failing to add a right sibling in an index.  Log output:
>
>> PANIC:  failed to add item to the right sibling in index "logged_in_uid"
>> STATEMENT:  INSERT INTO logged_in ...
>
> If you can apply this patch:
> http://archives.postgresql.org/pgsql-committers/2010-08/msg00365.php
> it should tell you which index page is causing the problem.  Then
> please dump that page with pg_filedump and send it in.

Patch applied.  This crash happens on average about once a day.
Happened twice yesterday, but hasn't happened today.  I'll report back
when / if it does it again.

--
To understand recursion, one must first understand recursion.

Re: pg 8.4 crashing.

From
Scott Marlowe
Date:
On Fri, Sep 24, 2010 at 5:06 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Thu, Sep 23, 2010 at 8:57 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Thu, Sep 23, 2010 at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Scott Marlowe <scott.marlowe@gmail.com> writes:
>>>> I've got a problem with pg 8.4.4 crashing with a sig 6 abort due to a
>>>> panic on failing to add a right sibling in an index.  Log output:
>>>
>>>> PANIC:  failed to add item to the right sibling in index "logged_in_uid"
>>>> STATEMENT:  INSERT INTO logged_in ...
>>>
>>> If you can apply this patch:
>>> http://archives.postgresql.org/pgsql-committers/2010-08/msg00365.php
>>> it should tell you which index page is causing the problem.  Then
>>> please dump that page with pg_filedump and send it in.
>>
>> Patch applied.  This crash happens on average about once a day.
>> Happened twice yesterday, but hasn't happened today.  I'll report back
>> when / if it does it again.
>
> OK, got an error on it today.  Looks like a corrupted index.  Note
> that this is on a machine that was very well tested, but who knows, a
> single bit error could have occurred.  pg_filedump attached.

P.S. I reindexed it to get rid of the error.  I'll keep an eye out for it
coming back.


--
To understand recursion, one must first understand recursion.

Re: pg 8.4 crashing.

From
Tom Lane
Date:
Scott Marlowe <scott.marlowe@gmail.com> writes:
>> On Thu, Sep 23, 2010 at 2:17 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If you can apply this patch:
>>> http://archives.postgresql.org/pgsql-committers/2010-08/msg00365.php
>>> it should tell you which index page is causing the problem.  Then
>>> please dump that page with pg_filedump and send it in.

> OK, got an error on it today.  Looks like a corrupted index.  Note
> that this is on a machine that was very well tested, but who knows, a
> single bit error could have occurred.  pg_filedump attached.

Hm, I should've specified that the -i -f options produce the most useful
output from pg_filedump :-(.  It's pretty obvious that you've got 500+
bytes worth of seriously corrupted data there, but at least in this view
it's hard to tell if there's any clear pattern to the bad data.

I'd be inclined to suspect disk subsystem malfeasance, not a RAM problem.
In particular I'm suspicious that the original corruption amounted to
exactly one 512-byte sector.  If the index page wasn't full at the time,
it's possible that additional insertions could have occurred without
provoking obvious errors.  That could have shifted and enlarged the
damaged area to what we see here, which looks to be 540 bytes starting
at page offset 552 (if I counted on my fingers correctly).  I count
seven sane-looking line pointer items within the damaged-looking range,
which account for 7*4 = 28 bytes, so if those got inserted later then
there was exactly 512 bytes worth of damage initially.  It's harder
to tell whether there were exactly ten insertions before the damaged
area to shift it up from offset 512 to offset 552, but given that the
other math comes out right I'm prepared to bet a nickel or two on this
theory.

            regards, tom lane
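
Editor's note: the arithmetic in the message above can be checked with a short sketch. It is only an illustration of the reasoning, not anything posted in the thread. The function name reconstruct_damage is hypothetical, and the sketch assumes, as the message does, 4-byte line pointers, a 512-byte sector, and damage that originally began at page offset 512.

```python
ITEM_ID_SIZE = 4   # bytes per line pointer; matches the 7*4 = 28 figure in the message
SECTOR_SIZE = 512  # bytes per disk sector, as assumed in the message


def reconstruct_damage(damaged_start, damaged_len, pointers_inside):
    """Hypothetical helper: return (original_len, insertions_before).

    original_len is the damaged length once line pointers inserted later
    are subtracted; insertions_before is how many 4-byte line pointers
    would have to be inserted ahead of the damage to shift it from
    offset SECTOR_SIZE (assumed original start) to damaged_start.
    """
    original_len = damaged_len - pointers_inside * ITEM_ID_SIZE
    shift = damaged_start - SECTOR_SIZE
    insertions_before, remainder = divmod(shift, ITEM_ID_SIZE)
    if remainder:
        raise ValueError("shift is not a whole number of line pointers")
    return original_len, insertions_before


# Figures quoted in the message: 540 damaged bytes at page offset 552,
# with seven sane-looking line pointers inside the damaged range.
original_len, insertions_before = reconstruct_damage(552, 540, 7)
print(original_len)        # 512, exactly one sector
print(insertions_before)   # 10
```

Both numbers agree with the theory: 540 - 28 leaves exactly one sector of damage, and the 40-byte shift corresponds to ten line-pointer insertions.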