Обсуждение: How to reindex when unable to open relation?

Поиск
Список
Период
Сортировка

How to reindex when unable to open relation?

От
Darren Reed
Дата:
Inserts into my table started generating an error message (see below),
and there was a log message that suggested I reindex.

postgres seems to start and stop ok, but no operations on the tables
seem to work.

l# /usr/pkg/bin/pg_dump -s -U postgres
pg_dump: failed sanity check, parent table OID 16401 of pg_rewrite entry
OID 16403 not found

Is this recoverable without using a backup?

Darren

ERROR:  could not open relation with OID 16399
STATEMENT:   INSERT INTO table (cola,colb,colc,cold,cole) VALUES
(1,'fred',2,'john',true);
LOG:  unexpected EOF on client connection
ERROR:  could not open relation with OID 16399
STATEMENT:   SELECT * FROM table WHERE (ignore IS NOT TRUE) AND (region
= 'usa') ORDER BY date DESC LIMIT 300 OFFSET 0;
ERROR:  index "pg_attrdef_adrelid_adnum_index" contains unexpected zero
page at block 1
HINT:  Please REINDEX it.
STATEMENT:  LOCK TABLE public.ifl IN ACCESS SHARE MODE
FATAL:  database "posrgres" does not exist
ERROR:  could not open relation with OID 16399
STATEMENT:  SELECT COUNT(*) FROM table;
ERROR:  syntax error at or near ";" at character 9
STATEMENT:  REINDEX
       ;
ERROR:  syntax error at or near "table" at character 9
STATEMENT:  REINDEX table;
ERROR:  syntax error at or near "help" at character 1
STATEMENT:  help REINDEX;
ERROR:  could not open relation with OID 16399
STATEMENT:  REINDEX TABLE table;
ERROR:  syntax error at or near ";" at character 17
STATEMENT:  REINDEX DATABASE;
NOTICE:  table "pg_class" was reindexed
NOTICE:  table "sql_sizing" was reindexed
NOTICE:  table "sql_sizing_profiles" was reindexed
NOTICE:  table "sql_features" was reindexed
NOTICE:  table "sql_implementation_info" was reindexed
NOTICE:  table "sql_languages" was reindexed
NOTICE:  table "sql_packages" was reindexed
NOTICE:  table "sql_parts" was reindexed
NOTICE:  table "pg_statistic" was reindexed
NOTICE:  table "pg_type" was reindexed
NOTICE:  table "pg_attribute" was reindexed
NOTICE:  table "pg_proc" was reindexed
NOTICE:  table "pg_autovacuum" was reindexed
NOTICE:  table "pg_attrdef" was reindexed
NOTICE:  table "pg_constraint" was reindexed
NOTICE:  table "pg_inherits" was reindexed
ERROR:  could not open relation with OID 16389
STATEMENT:  REINDEX DATABASE postgres;
NOTICE:  table "pg_class" was reindexed
NOTICE:  table "pg_statistic" was reindexed
NOTICE:  table "pg_type" was reindexed
NOTICE:  table "pg_attribute" was reindexed
NOTICE:  table "pg_proc" was reindexed
NOTICE:  table "pg_autovacuum" was reindexed
NOTICE:  table "pg_attrdef" was reindexed
NOTICE:  table "pg_constraint" was reindexed
NOTICE:  table "pg_inherits" was reindexed
NOTICE:  table "pg_index" was reindexed
NOTICE:  table "pg_operator" was reindexed
NOTICE:  table "pg_opclass" was reindexed
NOTICE:  table "pg_am" was reindexed
NOTICE:  table "pg_amop" was reindexed
NOTICE:  table "pg_amproc" was reindexed
NOTICE:  table "pg_language" was reindexed
NOTICE:  table "pg_largeobject" was reindexed
NOTICE:  table "pg_aggregate" was reindexed
NOTICE:  table "pg_rewrite" was reindexed
NOTICE:  table "pg_trigger" was reindexed
NOTICE:  table "pg_description" was reindexed
NOTICE:  table "pg_cast" was reindexed
NOTICE:  table "pg_namespace" was reindexed
NOTICE:  table "pg_convtableion" was reindexed
NOTICE:  table "pg_depend" was reindexed
NOTICE:  table "pg_class" was reindexed
NOTICE:  table "sql_sizing" was reindexed
NOTICE:  table "sql_sizing_profiles" was reindexed
NOTICE:  table "sql_features" was reindexed
NOTICE:  table "sql_implementation_info" was reindexed
NOTICE:  table "sql_languages" was reindexed
NOTICE:  table "sql_packages" was reindexed
NOTICE:  table "sql_parts" was reindexed
NOTICE:  table "pg_statistic" was reindexed
NOTICE:  table "pg_type" was reindexed
NOTICE:  table "pg_attribute" was reindexed
NOTICE:  table "pg_proc" was reindexed
NOTICE:  table "pg_autovacuum" was reindexed
NOTICE:  table "pg_attrdef" was reindexed
NOTICE:  table "pg_constraint" was reindexed
NOTICE:  table "pg_inherits" was reindexed
ERROR:  could not open relation with OID 16389
STATEMENT:  REINDEX DATABASE postgres;
ERROR:  could not open relation with OID 16393
STATEMENT:  SELECT COUNT(*) FROM picphone;
LOG:  received smart shutdown request
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down at 2008-02-25 22:03:28 PST
LOG:  checkpoint record is at 1/FB15C10
LOG:  redo record is at 1/FB15C10; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/1348953; next OID: 16544
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
ERROR:  could not open relation with OID 16399
STATEMENT:  SELECT COUNT(*) FROM table;



Re: How to reindex when unable to open relation?

От
"Shoaib Mir"
Дата:
On Tue, Feb 26, 2008 at 5:13 PM, Darren Reed <darrenr+postgres@fastmail.net> wrote:
Inserts into my table started generating an error message (see below),
and there was a log message that suggested I reindex.

postgres seems to start and stop ok, but no operations on the tables
seem to work.

l# /usr/pkg/bin/pg_dump -s -U postgres
pg_dump: failed sanity check, parent table OID 16401 of pg_rewrite entry
OID 16403 not found

Is this recoverable without using a backup?



Looks like to me corrupted system catalog!

I guess when you dropped the table it didn't delete the record from pg_rewrite was not deleted when the parent table was dropped. So you can try doing something:

delete from pg_rewrite where oid = 16403;

And see if this can fix the problem..

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Re: How to reindex when unable to open relation?

От
Tom Lane
Дата:
"Shoaib Mir" <shoaibmir@gmail.com> writes:
> On Tue, Feb 26, 2008 at 5:13 PM, Darren Reed <darrenr+postgres@fastmail.net>
> wrote:
>> pg_dump: failed sanity check, parent table OID 16401 of pg_rewrite entry
>> OID 16403 not found
>>
>> Is this recoverable without using a backup?

> Looks like to me corrupted system catalog!

Yah.

> I guess when you dropped the table it didn't delete the record from
> pg_rewrite was not deleted when the parent table was dropped. So you can try
> doing something:
> delete from pg_rewrite where oid = 16403;

That's not gonna help for these:

>> ERROR:  could not open relation with OID 16399
>> ERROR:  index "pg_attrdef_adrelid_adnum_index" contains unexpected zero
>> page at block 1
>> ERROR:  could not open relation with OID 16389
>> ERROR:  could not open relation with OID 16393

I'm afraid Darren's DB has taken multiple mortal hits --- the evidence
here suggests that at least one pg_class page and at least one
pg_attrdef_adrelid_adnum_index page have been silently zeroed out by
*something*.  The index could be reconstituted by REINDEX, if its
underlying table is undamaged; but no amount of reindexing will bring
back missing table rows :-(

There's no info here to suggest exactly what went wrong; Darren, do you
want to fess up to any hardware problems, tripped-over power cords, or
such?  What PG version is this anyway?

            regards, tom lane

Re: How to reindex when unable to open relation?

От
Darren Reed
Дата:
Tom Lane wrote:
> ...
> There's no info here to suggest exactly what went wrong; Darren, do you
> want to fess up to any hardware problems, tripped-over power cords, or
> such?  What PG version is this anyway?
>

This is 8.2.6... and I'm starting to believe the hardware is just borked,
with all of the other troubles I've had.

In an earlier bringup of the database, I received an error intended for
Linux kernels on a BSD box about a page not being zero...

Darren


Re: How to reindex when unable to open relation?

От
Tom Lane
Дата:
Darren Reed <darrenr+postgres@fastmail.net> writes:
> Tom Lane wrote:
>> ...
>> There's no info here to suggest exactly what went wrong; Darren, do you
>> want to fess up to any hardware problems, tripped-over power cords, or
>> such?  What PG version is this anyway?

> This is 8.2.6... and I'm starting to believe the hardware is just borked,
> with all of the other troubles I've had.

Well, there are no known data-loss bugs in 8.2.6, with of course the
operative word being "known" ...

> In an earlier bringup of the database, I received an error intended for
> Linux kernels on a BSD box about a page not being zero...

Yeah, I know exactly which message you mean; it was a defense against a
bug that was present for a relatively short time in a few Linux kernel
versions, and could only be exposed on multi-CPU hardware even so.
If you're seeing that and other problems on a BSD kernel then I agree
that hardware issues seem like a likely explanation.  Have you tried
running memtest86 and any other hardware diagnostics you can lay your
hands on?

            regards, tom lane

Re: How to reindex when unable to open relation?

От
Darren Reed
Дата:
Shoaib Mir wrote:
> On Tue, Feb 26, 2008 at 5:13 PM, Darren Reed
> <darrenr+postgres@fastmail.net <mailto:darrenr+postgres@fastmail.net>>
> wrote:
>
>     Inserts into my table started generating an error message (see below),
>     and there was a log message that suggested I reindex.
>
>     postgres seems to start and stop ok, but no operations on the tables
>     seem to work.
>
>     l# /usr/pkg/bin/pg_dump -s -U postgres
>     pg_dump: failed sanity check, parent table OID 16401 of pg_rewrite
>     entry
>     OID 16403 not found
>
>     Is this recoverable without using a backup?
>
>
>
> Looks like to me corrupted system catalog!
>
> I guess when you dropped the table it didn't delete the record from
> pg_rewrite was not deleted when the parent table was dropped. So you
> can try doing something:

...

I didn't drop the table!

This happened amidst a script doing inserts!

postgres=# delete from pg_rewrite where oid = 16403;
DELETE 1
postgres=# \q
firewall# !?stop
/etc/rc.d/postgres stop
waiting for server to shut down.... done
server stopped
firewall# /etc/rc.d/postgres start
server starting
firewall# /usr/pkg/bin/psql -U postgres
psql: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

Hmmm, from the logfile, signal 11, so it core dumped...
Of course the default built binary is -O2...
(gdb) where
#0  0x082114b8 in RelationCacheInitializePhase2 ()
#1  0x0821fa67 in InitPostgres ()
#2  0x081a7d8b in PostgresMain ()
#3  0x08184882 in ServerLoop ()
#4  0x0818545c in PostmasterMain ()
#5  0x0814be1f in main ()

Darren

LOG:  received smart shutdown request
LOG:  shutting down
LOG:  database system is shut down
LOG:  database system was shut down at 2008-02-25 22:42:13 PST
LOG:  checkpoint record is at 1/FB17328
LOG:  redo record is at 1/FB17328; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/1348963; next OID: 16544
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system is ready
LOG:  server process (PID 21040) was terminated by signal 11
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2008-02-25 22:42:16 PST
LOG:  checkpoint record is at 1/FB17328
LOG:  redo record is at 1/FB17328; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/1348963; next OID: 16544
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  record with zero length at 1/FB17370
LOG:  redo is not required
LOG:  database system is ready
LOG:  server process (PID 26412) was terminated by signal 11
LOG:  terminating any other active server processes
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted at 2008-02-25 22:42:22 PST
LOG:  checkpoint record is at 1/FB17370
LOG:  redo record is at 1/FB17370; undo record is at 0/0; shutdown TRUE
LOG:  next transaction ID: 0/1348963; next OID: 16544
LOG:  next MultiXactId: 1; next MultiXactOffset: 0
LOG:  database system was not properly shut down; automatic recovery in
progress
LOG:  record with zero length at 1/FB173B8
LOG:  redo is not required
LOG:  database system is ready


Re: How to reindex when unable to open relation?

От
Tom Lane
Дата:
Darren Reed <darrenr+postgres@fastmail.net> writes:
> Hmmm, from the logfile, signal 11, so it core dumped...
> Of course the default built binary is -O2...
> (gdb) where
> #0  0x082114b8 in RelationCacheInitializePhase2 ()
> #1  0x0821fa67 in InitPostgres ()
> #2  0x081a7d8b in PostgresMain ()
> #3  0x08184882 in ServerLoop ()
> #4  0x0818545c in PostmasterMain ()
> #5  0x0814be1f in main ()

Hm, if you are really lucky, this is because of a corrupt
pg_internal.init file.  Fortunately for you, that's just a cache;
try removing (all of the) pg_internal.init files and see if the
DB will start.

            regards, tom lane

Re: How to reindex when unable to open relation?

От
Darren Reed
Дата:
Tom Lane wrote:
> Darren Reed <darrenr+postgres@fastmail.net> writes:
> > Hmmm, from the logfile, signal 11, so it core dumped...
> > Of course the default built binary is -O2...
> > (gdb) where
> > #0  0x082114b8 in RelationCacheInitializePhase2 ()
> > #1  0x0821fa67 in InitPostgres ()
> > #2  0x081a7d8b in PostgresMain ()
> > #3  0x08184882 in ServerLoop ()
> > #4  0x0818545c in PostmasterMain ()
> > #5  0x0814be1f in main ()
>
> Hm, if you are really lucky, this is because of a corrupt
> pg_internal.init file.  Fortunately for you, that's just a cache;
> try removing (all of the) pg_internal.init files and see if the
> DB will start.
>

How do I know which files those are?

I see nothing named that although grep suggests they did exist...
# grep pg_internal.init */*
Binary file base/1 matches
Binary file base/10818 matches
Binary file base/10819 matches

Darren


Re: How to reindex when unable to open relation?

От
Tom Lane
Дата:
Darren Reed <darrenr+postgres@fastmail.net> writes:
> Tom Lane wrote:
>> Hm, if you are really lucky, this is because of a corrupt
>> pg_internal.init file.

> How do I know which files those are?

find $PGDATA -name pg_internal.init

            regards, tom lane

Re: How to reindex when unable to open relation?

От
Darren Reed
Дата:
Tom Lane wrote:
> Darren Reed <darrenr+postgres@fastmail.net> writes:
> > Tom Lane wrote:
> >> Hm, if you are really lucky, this is because of a corrupt
> >> pg_internal.init file.
>
> > How do I know which files those are?
>
> find $PGDATA -name pg_internal.init
>

Doesn't exist.

Darren


Re: How to reindex when unable to open relation?

От
Tom Lane
Дата:
Darren Reed <darrenr+postgres@fastmail.net> writes:
>>> How do I know which files those are?
>>
>> find $PGDATA -name pg_internal.init

> Doesn't exist.

Sucks to be you, then :-(

I'm curious though exactly where the failure is, because there's not
much in RelationCacheInitializePhase2 that looks like it could crash,
other than the init-file reading (I assume load_relcache_init_file
got inlined into RelationCacheInitializePhase2).  Do you want to try
rebuilding the backend with --enable-debug so we can get a better
traceback?

            regards, tom lane

Re: How to reindex when unable to open relation?

От
Darren Reed
Дата:
Tom Lane wrote:
> Darren Reed <darrenr+postgres@fastmail.net> writes:
> >>> How do I know which files those are?
> >>
> >> find $PGDATA -name pg_internal.init
>
> > Doesn't exist.
>
> Sucks to be you, then :-(
>
> I'm curious though exactly where the failure is, because there's not
> much in RelationCacheInitializePhase2 that looks like it could crash,
> other than the init-file reading (I assume load_relcache_init_file
> got inlined into RelationCacheInitializePhase2).  Do you want to try
> rebuilding the backend with --enable-debug so we can get a better
> traceback?
>

#0  0x08298378 in RelationCacheInitializePhase2 () at relcache.c:2394
2394                    LOAD_CRIT_INDEX(AttributeRelidNumIndexId);
(gdb) where
#0  0x08298378 in RelationCacheInitializePhase2 () at relcache.c:2394
#1  0x082ac19c in InitPostgres (dbname=0x83ba450 "postgres",
    username=0x83c1428 "postgres") at postinit.c:459
#2  0x0820b533 in PostgresMain (argc=4, argv=0x83c1458,
    username=0x83c1428 "postgres") at postgres.c:3143
#3  0x081dc572 in BackendRun (port=0x83bc800) at postmaster.c:2934
#4  0x081dbb7e in BackendStartup (port=0x83bc800) at postmaster.c:2561
#5  0x081d985c in ServerLoop () at postmaster.c:1214
#6  0x081d9292 in PostmasterMain (argc=3, argv=0xbfbfeb14) at
postmaster.c:966
#7  0x08187989 in main (argc=3, argv=0xbfbfeb14) at main.c:188
(gdb) p ird
$1 = (Relation) 0x0
(gdb) p oldcxt
$2 = (MemoryContext) 0x83de258
(gdb) p needNewCacheFile
$3 = 1 '\001'
(gdb) p status
$4 = {hashp = 0x8, curBucket = 2, curEntry = 0x8360777}
(gdb) p idhentry
$5 = (RelIdCacheEnt *) 0x836b168

Darren


Re: How to reindex when unable to open relation?

От
Tom Lane
Дата:
Darren Reed <darrenr+postgres@fastmail.net> writes:
> Tom Lane wrote:
>> I'm curious though exactly where the failure is, because there's not
>> much in RelationCacheInitializePhase2 that looks like it could crash,

> #0  0x08298378 in RelationCacheInitializePhase2 () at relcache.c:2394
> 2394                    LOAD_CRIT_INDEX(AttributeRelidNumIndexId);
> (gdb) where
> #0  0x08298378 in RelationCacheInitializePhase2 () at relcache.c:2394
> #1  0x082ac19c in InitPostgres (dbname=0x83ba450 "postgres",
>     username=0x83c1428 "postgres") at postinit.c:459

Hmph.  AFAICS the only possible way it could dump core right there is
if RelationBuildDesc returned NULL or a bogus pointer.  And
RelationBuildDesc does have a NULL return defined: if it couldn't find
a pg_class row for the relation.  (So we probably ought to add an
explicit error test to LOAD_CRIT_INDEX --- surprising it hasn't been
seen before.)  Anyway it seems that the problem is you are missing
the pg_class row for pg_attribute_relid_attnum_index.  And possibly
other stuff too; this crash is mighty early in initialization and
only a couple of core tables have been touched yet.  What I fear
is that one or more whole pages of pg_class have been lost --- either
zeroed out or corrupted in a way that heapscan doesn't notice.  You
might be able to learn something from pg_filedump if you're interested
in pursuing this for forensic purposes, but I'm afraid the odds of
getting anything back from this database are mighty low.

            regards, tom lane