Обсуждение: can't start postgresql
Hi All, I've got a problem w/ one of pgsql installations. It can't start: [postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data PANIC: failed to re-find parent key in "23724" Aborted This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4. Originally it was 8.1.2, and the bug appeared on v8.1.2; I've compiled and installed v8.1.5 over it. Autovacuum is enabled. How can I fix this ? -- Best Regards, Igor Shevchenko
On Tue, 14 Nov 2006 02:23:39 +0200 Igor Shevchenko <igor@carcass.ath.cx> wrote: Try with: > [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data Bye Enrico -- If Bill Gates had a penny for everytime Windows crashed,he'd be a multi-billionaire by now .......oh look, he already is!!!! scotty@linuxtime.it - Skype:sscotty71 http://www.linuxtime.it/enricopirozzi
On Tuesday 14 November 2006 10:52, Enrico wrote: > Try with: > > [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data Thanks, but this is not working - [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data The files belonging to this database system will be owned by user "postgres". This user must also own the server process. The database cluster will be initialized with locale C. initdb: directory "/usr/pgsql/data" exists but is not empty If you want to create a new database system, either remove or empty the directory "/usr/pgsql/data" or run initdb with an argument other than "/usr/pgsql/data". -- Best Regards, Igor Shevchenko
> Thanks, but this is not working - > > [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data > The files belonging to this database system will be owned by user "postgres". > This user must also own the server process. > > The database cluster will be initialized with locale C. > > initdb: directory "/usr/pgsql/data" exists but is not empty > If you want to create a new database system, either remove or empty > the directory "/usr/pgsql/data" or run initdb > with an argument other than "/usr/pgsql/data". of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data directory /usr/pgsql/data must be empty and user postgres must be the owner, but I don't understand, what is your problem? Enrico -- If Bill Gates had a penny for everytime Windows crashed,he'd be a multi-billionaire by now .......oh look, he already is!!!! scotty@linuxtime.it - Skype:sscotty71 http://www.linuxtime.it/enricopirozzi
He's just trying to do what you told him. But the use of initdb won't correct the problem Igor got. There's something wrong with his database so that he can't start the postmaster process: > PANIC: failed to re-find parent key in "23724" Unfortunately i don't have any idea how to solve this issue :( -- Matthias > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Enrico > Sent: Tuesday, November 14, 2006 4:22 PM > To: pgsql-general@postgresql.org > Subject: Re: [GENERAL] can't start postgresql > > > > > Thanks, but this is not working - > > > > [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data > > The files belonging to this database system will be owned > by user "postgres". > > This user must also own the server process. > > > > The database cluster will be initialized with locale C. > > > > initdb: directory "/usr/pgsql/data" exists but is not empty > > If you want to create a new database system, either remove or empty > > the directory "/usr/pgsql/data" or run initdb > > with an argument other than "/usr/pgsql/data". > > of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data > directory /usr/pgsql/data must be empty and user postgres > must be the owner, > but I don't understand, what is your problem? > > Enrico >
Matthias.Pitzl@izb.de wrote: > He's just trying to do what you told him. > But the use of initdb won't correct the problem Igor got. There's something > wrong with his database so that he can't start the postmaster process: > > PANIC: failed to re-find parent key in "23724" > > Unfortunately i don't have any idea how to solve this issue :( I'd think starting a standalone backend and issuing a "reindex database" should be enough to get him started. Now, the problem is figuring _how_ the index got in that state; or even _what_ index is the problematic one. (I think it would be possible to find out by setting the "log_error_verbosity" parameter to "verbose"). If it's a hardware problem, one would think it deserves some diagnosis. It could be the btree bug Tom fixed last week, but I'm not sure if this is really a consequence of it. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 11/13/06, Igor Shevchenko <igor@carcass.ath.cx> wrote: > Hi All, > > I've got a problem w/ one of pgsql installations. It can't start: > > [postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data > PANIC: failed to re-find parent key in "23724" > Aborted > > This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4. Originally it > was 8.1.2, and the bug appeared on v8.1.2; I've compiled and installed v8.1.5 > over it. Autovacuum is enabled. is this a fresh installation? or do you have an existing database here that crashed or had some other problem? merlin
Hi Merlin, On Tuesday 14 November 2006 17:36, you wrote: > On 11/13/06, Igor Shevchenko <igor@carcass.ath.cx> wrote: > > I've got a problem w/ one of pgsql installations. It can't start: > > > > [postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data > > PANIC: failed to re-find parent key in "23724" > > Aborted > > > > This is PostgreSQL 8.1.5 compiled from sources, on Fedora Core4. > > Originally it was 8.1.2, and the bug appeared on v8.1.2; I've compiled > > and installed v8.1.5 over it. Autovacuum is enabled. > > is this a fresh installation? or do you have an existing database here > that crashed or had some other problem? Yes it was a working installation, 32g of data, on v8.1.2. Pgsql wouldn't come up, so I've upgraded it to v8.1.5 (from sources), and tried to start it in a single-process mode, with no luck - it was still failing with the same, abovementioned error message. We had a backup and had already restored from it, but I still have the crashed db saved elsewhere, for testing/debugging. -- Best Regards, Igor Shevchenko
On Tuesday 14 November 2006 17:34, Alvaro Herrera wrote:
> Matthias.Pitzl@izb.de wrote:
> > He's just trying to do what you told him.
> > But the use of initdb won't correct the problem Igor got. There's
> > something
> >
> > wrong with his database so that he can't start the postmaster process:
> > > PANIC:  failed to re-find parent key in "23724"
> >
> > Unfortunately i don't have any idea how to solve this issue :(
>
> I'd think starting a standalone backend and issuing a "reindex database"
> should be enough to get him started.  Now, the problem is figuring _how_
> the index got in that state; or even _what_ index is the problematic
> one.  (I think it would be possible to find out by setting the
> "log_error_verbosity" parameter to "verbose").
The problem is, it won't start even in the standalone mode. It prints the same
error message and dies. I was trying to start it connected to my main db,
template0, template1 and postgres, no change. The index corruption (assuming
this IS an index corruption) must've happened with system table(s). Is there
any way to diagose that ? Maybe reindex a specific table without starting
postmaster, or remove certain index files physically (e.g. for pg_class,
etc) ?
> If it's a hardware problem, one would think it deserves some diagnosis.
> It could be the btree bug Tom fixed last week, but I'm not sure if this
> is really a consequence of it.
Can't rule out an HW problem. I'd think this IS an HW problem, but we've
started using an updated version of our software on this server a week ago,
and it creates/drops alot of tables, temporary and not (kind of in-house
table partitioning), so this could be an (exceptionally rare) pgsql bug with
pg_class / pg_index / etc handling. A side question: is it safe to
create/drop tables from pl/pgsql functions, called directly or via a
trigger ?
Our previous version of the software was using a light version of what we're
trying to do now, and there was / maybe still a problem:
We're processing data in chunks, and each chunk is completely separate from
others, so using a special table per chunk seems to be a natural thing to do.
We keep these in a separate tablespace "w". There's one non-unique
(integer,varchar) index per each such table; both fields are non-null. Both
table and index are created via a pl/pgsql function. An index is created
after the data import, during the same transaction:
a) begin
b) select create_w_table(jobid,false) -- false means don't create index
c) insert into w.tableX select *....
d) select create_w_table(jobid,true) -- true means create index
e) commit
Both table and index are dropped at the same time, using another pl/pgsql
function.
create_w_table is rarely called from yet another pl/pgsql function.
Everything's working fine, except that sometimes an index can, um, "go off the
rails":
* not listed by "psql"'s \d w.tableX
* queries are not using it
* it's still listed in the pg_class (seen by "select * from pg_class where
relname='tblX_idx'")
* a corresponding "pg_index" row references a non-existing pg_class entry via
pg_index.indrelid
I'm seeing this several times a week, for over 5 months now. Here's my fix
procedure:
* fetch a list of probematic indexes:
select pc.relname,pc.oid,pc_idx.oid from pg_class pc left join pg_class pc_idx
on (pc_idx.relname=pc.relname || '_idx') where pc.relname ~ '^table\\d+$' and
(pc_idx.oid is NULL OR not exists (select 1 from pg_index where
pg_index.indexrelid=pc_idx.oid and indrelid=pc.oid))
* fix each with:
update pg_index set indrelid=${right one from pg_class} where
indexrelid=${index oid from pg_class};
update pg_class set relhasindex='t' where oid=${table oid}; -- AFAICR this is
redundant, as relhasindex is still true
now I can drop this index (usually it's out of sync), vacuum the table and
recreate index:
drop index w.tableX_idx;
vacuum full verbose w.tableX;
create index tableX_idx on w.tableX(...);
analyze w.tableX;
I've seen this on pgsql versions from v8.1.2 till v8.1.4, can't confirm with
v8.1.5 yet. Autovacuum's on; daily "vacuum verbose analyze" and "reindex" are
ran over pg_class, pg_index, a bunch of our own small tables, and recently
over pg_depend, pg_type and pg_statistic.
--
Best Regards,
Igor Shevchenko
			
		On Tue, 14 Nov 2006 21:26:02 +0200, Igor Shevchenko wrote: >> I'd think starting a standalone backend and issuing a "reindex database" >> should be enough to get him started. Now, the problem is figuring _how_ >> the index got in that state; or even _what_ index is the problematic >> one. (I think it would be possible to find out by setting the >> "log_error_verbosity" parameter to "verbose"). > > The problem is, it won't start even in the standalone mode. It prints the same > error message and dies. I was trying to start it connected to my main db, > template0, template1 and postgres, no change. The index corruption (assuming > this IS an index corruption) must've happened with system table(s). Is there > any way to diagose that ? Maybe reindex a specific table without starting > postmaster, or remove certain index files physically (e.g. for pg_class, > etc) ? I think you want to use -P to ignore the indexes on startup: postgres -D /usr/local/pgsql/data -P my_database Does that help - then you might get the chance to do a REINDEX? Regards John
On Wednesday 15 November 2006 01:41, you wrote: > On Tue, 14 Nov 2006 21:26:02 +0200, Igor Shevchenko wrote: > >> I'd think starting a standalone backend and issuing a "reindex database" > >> should be enough to get him started. Now, the problem is figuring _how_ > >> the index got in that state; or even _what_ index is the problematic > >> one. (I think it would be possible to find out by setting the > >> "log_error_verbosity" parameter to "verbose"). > > > > The problem is, it won't start even in the standalone mode. It prints the > > same error message and dies. I was trying to start it connected to my > > main db, template0, template1 and postgres, no change. The index > > corruption (assuming this IS an index corruption) must've happened with > > system table(s). Is there any way to diagose that ? Maybe reindex a > > specific table without starting postmaster, or remove certain index files > > physically (e.g. for pg_class, etc) ? > > I think you want to use -P to ignore the indexes on startup: > > postgres -D /usr/local/pgsql/data -P my_database > > Does that help - then you might get the chance to do a REINDEX? Thanks for the pointer! I thought it would help, but to my surprise, it didn't. Im still seeing this : [postgres@db ~]$ /backup/pgsql/bin/postgres -D /backup/pgsql/data -P template1 PANIC: failed to re-find parent key in "23724" Aborted -- Best Regards, Igor Shevchenko
On Tuesday 14 November 2006 17:21, Enrico wrote: > > Thanks, but this is not working - > > > > [postgres@db ~]$ /usr/pgsql/bin/initdb -D /usr/pgsql/data > > The files belonging to this database system will be owned by user > > "postgres". This user must also own the server process. > > > > The database cluster will be initialized with locale C. > > > > initdb: directory "/usr/pgsql/data" exists but is not empty > > If you want to create a new database system, either remove or empty > > the directory "/usr/pgsql/data" or run initdb > > with an argument other than "/usr/pgsql/data". > > of course if want to use /usr/pgsql/bin/initdb -D /usr/pgsql/data > directory /usr/pgsql/data must be empty and user postgres must be the > owner, but I don't understand, what is your problem? The problem is that the db was not empty (32g of data). -- Best Regards, Igor Shevchenko
Igor Shevchenko <igor@carcass.ath.cx> writes: > I've got a problem w/ one of pgsql installations. It can't start: > [postgres@db ~]$ /usr/pgsql/bin/postgres -D /usr/pgsql/data > PANIC: failed to re-find parent key in "23724" > Aborted Try applying this patch: http://archives.postgresql.org/pgsql-committers/2006-11/msg00004.php regards, tom lane