Re: failure with pg_dump

Поиск
Список
Период
Сортировка
От Mija Lee
Тема Re: failure with pg_dump
Дата
Msg-id 475EBBFD.1020005@scharp.org
обсуждение исходный текст
Ответ на Re: failure with pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: failure with pg_dump  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-novice
We've had a number of odd things that have been going on that I can't
really explain, and that don't seem to result in log entries. Here's
some info:

- this is running 8.2.4 on a solaris 10 machine
- I reran the dump after posting and these problems did not reoccur
- We have a number of replicated schemas and tables on this server.
There were other problems with the replication that happened earlier in
the evening.
- we have been having some very odd problems where our replication
scripts hang intermittantly. For the life of me I can't figure out why,
but when this happens, I look for processes that are idle in transaction
that are more than one day old and kill them. That seems to allow the
replication to finish.  I have a few users that use a variety of
products to view and manipulate the data in these tables (tableau,
access, excel, ems, phppgadmin, dbvisualizer) and it seems like some
connections/transactions never terminate, but I can't figure out which
ones or why. I've been struggling with this problem for some time, but
have never had an issue with the stalled replication affecting the dump.
I was actually hoping that this error would help shed light on the
replication problem.

Mija

Tom Lane wrote:
> Mija Lee <mija@scharp.org> writes:
>> I have a script that I use to do regular dumps of my database. Over the
>> weekend it failed, and produced the following error message. I'm not
>> sure why this would have happened, how I would find out which index is
>> referenced by 136451098, or where this select came from.
>
> It sounds like system catalog corruption, which is not good :-(.
>
>> pg_dump.sqlhost: Error message from server: ERROR:  cache lookup failed
>> for index 136451098
>> pg_dump.sqlhost: The command was: SELECT t.tableoid, t.oid, t.relname as
>> indexname, pg_catalog.pg_get_indexdef(i.indexrelid) as indexdef,
>> t.relnatts as indnkeys, i.indkey, i.indisclustered, c.contype,
>> c.conname, c.tableoid as contableoid, c.oid as conoid, (SELECT spcname
>> FROM pg_catalog.pg_tablespace s WHERE s.oid = t.reltablespace) as
>> tablespace, array_to_string(t.reloptions, ', ') as options FROM
>> pg_catalog.pg_index i JOIN pg_catalog.pg_class t ON (t.oid =
>> i.indexrelid) LEFT JOIN pg_catalog.pg_depend d ON (d.classid =
>> t.tableoid AND d.objid = t.oid AND d.deptype = 'i') LEFT JOIN
>> pg_catalog.pg_constraint c ON (d.refclassid = c.tableoid AND d.refobjid
>> = c.oid) WHERE i.indrelid = '136451090'::pg_catalog.oid ORDER BY indexname
>
> That looks like pg_dump's query to get information about the indexes of
> a particular table.  So apparently the problem index is one of the ones
> for the table with OID 136451090.  The easiest way to find out which one
> that is is
>     select '136451090'::regclass;
> Trying \d on each of that table's indexes in succession would tell you
> which one is trashed.
>
> As for fixing it, the $64 question is how extensive is the catalog
> corruption.  I see no very good reason to hope that only this one index
> is affected :-(.  What you probably want to do is try to get a clean
> pg_dump then initdb and reload --- at least that's how I'd approach it,
> rather than hoping that there's no lurking problems remaining after you
> hack your way around the one you can see.
>
> What I'd try first is a REINDEX on pg_class.  If that doesn't help,
> try to delete the pg_index row linking 136451098 and 136451090.
>
> What PG version is this, anyway, and did anything weird happen on your
> system that might explain data corruption?
>
>             regards, tom lane

В списке pgsql-novice по дате отправления:

Предыдущее
От: Isaac Vetter
Дата:
Сообщение: Re: schema question
Следующее
От: Tom Lane
Дата:
Сообщение: Re: failure with pg_dump