Re: Tracking down log segment corruption

From Tom Lane
Subject Re: Tracking down log segment corruption
Msg-id 13174.1272819723@sss.pgh.pa.us
In response to Re: Tracking down log segment corruption  (Gordon Shannon <gordo169@gmail.com>)
Responses Re: Tracking down log segment corruption  (Gordon Shannon <gordo169@gmail.com>)
List pgsql-general
Gordon Shannon <gordo169@gmail.com> writes:
> I just got ran into the same problem.  Both servers are running 8.4.3, and
> the standby server had been running for 2 days, processing many thousands of
> logs successfully.  Here's my error:

> 4158   2010-05-02 11:12:09 EDT [26445]LOG:  restored log file
> "0000000100003C77000000C3" from archive
> 4158   2010-05-02 11:12:09 EDT [26446]LOG:  restored log file
> "0000000100003C77000000C4" from archive
> 4158   2010-05-02 11:12:09 EDT [26447]WARNING:  specified item offset is too
> large
> 4158   2010-05-02 11:12:09 EDT [26448]CONTEXT:  xlog redo insert: rel
> 48777166/22362/48778276; tid 2/2
> 4158   2010-05-02 11:12:09 EDT [26449]PANIC:  btree_insert_redo: failed to
> add item
> 4158   2010-05-02 11:12:09 EDT [26450]CONTEXT:  xlog redo insert: rel
> 48777166/22362/48778276; tid 2/2
> 4151   2010-05-02 11:12:09 EDT [1]LOG:  startup process (PID 4158) was
> terminated by signal 6: Aborted
> 4151   2010-05-02 11:12:09 EDT [2]LOG:  terminating any other active server
> processes

Hmm ... AFAICS the only way to get that message when the incoming TID's
offsetNumber is only 2 is for the index page to be completely empty
(not zeroes, else PageAddItem's sanity check would have triggered,
but valid and empty).  What that smells like is a software bug, like
failing to emit a WAL record in a case where it was necessary.  Can you
identify which index this was?  (Look for relfilenode 48778276 in the
database with OID 22362.)  If so, can you give us any hints about
unusual things that might have been done with that index?
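For anyone following along, the lookup described above can be sketched roughly as follows. This is only an illustration, not part of the original exchange: it pulls the numbers out of the CONTEXT line and prints catalog queries to run by hand. The helper names (parse_redo_context, lookup_queries) are made up here, and reading the first number of the triple as the tablespace OID is an assumption; the message above only confirms the database OID and the relfilenode.

```python
import re

# Matches "rel <n>/<n>/<n>; tid <block>/<offset>" as it appears in the
# CONTEXT lines of the log. Assumption: the first number is the tablespace
# OID; the message only confirms database OID and relfilenode.
CONTEXT_RE = re.compile(r"rel\s+(\d+)/(\d+)/(\d+);\s*tid\s+(\d+)/(\d+)")


def parse_redo_context(text):
    """Extract the identifiers from an 'xlog redo insert' CONTEXT line.

    Accepts text that was wrapped by a mail client, including '>' quote
    markers at the start of continuation lines. Returns None if no match.
    """
    # Join wrapped lines and drop a leading quote marker on the continuation.
    flat = re.sub(r"\s*\n>?\s*", " ", text)
    match = CONTEXT_RE.search(flat)
    if match is None:
        return None
    spc, db, relnode, block, offset = (int(g) for g in match.groups())
    return {
        "tablespace_oid": spc,   # assumed meaning, see note above
        "database_oid": db,
        "relfilenode": relnode,
        "block": block,
        "offset": offset,
    }


def lookup_queries(info):
    """Build the two catalog queries needed to name the relation."""
    return [
        # Run in any database: find which database has this OID.
        "SELECT datname FROM pg_database WHERE oid = %d;" % info["database_oid"],
        # Run while connected to that database: find the relation.
        "SELECT oid::regclass AS relation, relkind FROM pg_class "
        "WHERE relfilenode = %d;" % info["relfilenode"],
    ]


if __name__ == "__main__":
    line = "xlog redo insert: rel\n> 48777166/22362/48778276; tid 2/2"
    for query in lookup_queries(parse_redo_context(line)):
        print(query)
```

The second query has to be run while connected to the database that the first query names, since pg_class is per-database. A relkind of 'i' in the result would indicate an index.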

> Any suggestions?

As far as recovering goes, there's probably not much you can do except
resync the standby from scratch.  But it would be nice to get to the
bottom of the problem, so that we can fix the bug.  Have you got an
archive of this xlog segment and the ones before it, and would you be
willing to let a developer look at them?

            regards, tom lane
