Re: PANIC: btree_split_redo: lost left sibling?

From Tom Lane
Subject Re: PANIC: btree_split_redo: lost left sibling?
Date
Msg-id 8856.1092770968@sss.pgh.pa.us
In reply to PANIC: btree_split_redo: lost left sibling?  (Andrew Sukow <creoe@shaw.ca>)
List pgsql-general
Andrew Sukow <creoe@shaw.ca> writes:
> Our postgres system crashed and upon restarting it our database had the following errors.  The error log was 4.5 gigs
> which is much larger than usual.  We looked online for information about lost left siblings and how to fix the data and
> not lose the 400 million records we have.  Anyone have an idea what's the matter and what the fix is?

> PANIC:  btree_split_redo: lost left sibling

Looking at the code, the most probable explanation seems to be that the
WAL log contains a reference to a btree page that doesn't exist on disk
(i.e., the index file on disk is too short to contain that page number).
The code is panicking because it expects that page to exist already.
I have to agree with it --- it would seem you are suffering from
filesystem misfeasance.  Are you close to being out of disk space
by any chance?
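
To make the "file too short" condition concrete, here is a minimal Python
sketch of the check. It assumes the default 8192-byte block size and a block
number counted from the start of the file being examined; the function name
and the path you pass in are hypothetical, not anything from the PostgreSQL
source.

```python
import os

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size (BLCKSZ)


def file_contains_block(path, blkno, block_size=BLOCK_SIZE):
    """Return True if the file at `path` is long enough to hold block `blkno`.

    Blocks are numbered from zero, so block N occupies bytes
    [N * block_size, (N + 1) * block_size).
    """
    return os.path.getsize(path) >= (blkno + 1) * block_size
```

If this returns False for the block named in the WAL record, the file is
shorter than the WAL expects, which is the situation described above.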

What I would suggest doing is modifying the error message (it's in
src/backend/access/nbtree/nbtxlog.c, about line 256 in 7.4) to report
the index's DB/relfilenode and the block number it's failing to access.
Or if you built with debug enabled, you could gdb the core dump and
extract those numbers that way.  Knowing the file and the length it
needs to be, you could append zeroes to the file to make it long enough,
and then the replay should succeed.
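
The zero-padding step could be sketched in Python roughly as follows. This is
only an illustration under stated assumptions: the default 8192-byte block
size, a block number that falls within the single file being padded (a large
relation is split into multiple segment files, which this sketch does not
handle), and a stopped server. The function name is hypothetical. Take a copy
of the file before touching it.

```python
import os

BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size (BLCKSZ)


def pad_file_to_block(path, blkno, block_size=BLOCK_SIZE):
    """Append zero bytes so the file is long enough to contain block `blkno`.

    Returns the number of bytes appended (0 if the file was already long
    enough). Blocks are numbered from zero.
    """
    needed = (blkno + 1) * block_size
    current = os.path.getsize(path)
    if current >= needed:
        return 0

    remaining = needed - current
    zero_block = b"\0" * block_size
    with open(path, "ab") as f:
        # Write in block-sized chunks to avoid building one huge buffer.
        while remaining > 0:
            chunk = min(remaining, block_size)
            f.write(zero_block[:chunk])
            remaining -= chunk
        f.flush()
        os.fsync(f.fileno())  # make sure the extension reaches disk
    return needed - current
```

For example, `pad_file_to_block(path_to_index_file, blkno)` with the block
number taken from the modified error message would extend that one file and
leave any file that is already long enough untouched.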

A quicker-and-dirtier solution is to pass extend = true instead of false
to the XLogReadBuffer just above this, but I counsel doing the file
extensions manually as sketched above, so that you will know exactly
which index(es) have got this problem.  If I were doing this I would
certainly want to manually REINDEX those indexes afterwards.  The
specific page that's being requested will be filled in correctly from
the WAL entry, but who knows what else is wrong elsewhere in the index?

BTW, what do you mean by "the error log was 4.5 gigs"?  What you showed
us was only 10 lines.

            regards, tom lane
