after using pg_resetxlog, db lost

From: Shea,Dan [CIS]
Subject: after using pg_resetxlog, db lost
Date:
Msg-id: 644D07D3D59D8F408CD01AC2F833D8C62B9209@cisxa.cmc.int.ec.gc.ca
Responses: Re: after using pg_resetxlog, db lost  (Tom Lane)
List: pgsql-performance


pg_resetxlog was run as root, which caused ownership problems with the
pg_control and xlog files.
We now have no access to the data through psql.  The data is still
there under /var/lib/pgsql/data/base/17347  (database name PWFPM_DEV), but
pg_class has no reference to 36 of our tables, and the 18 other tables
that are reported in this database have no data in them.
Is there any way to have the database resync, or to make it aware of the
data under /var/lib/pgsql/data/base/17347?
How can this problem be resolved?
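
The ownership problem described above can be checked mechanically. The following is a minimal sketch in Python, assuming the server runs under an account named "postgres" (an assumption, the post does not name the account); the function name find_foreign_owned is hypothetical. It only reports files, it changes nothing.

```python
import os

def find_foreign_owned(root, expected_uid):
    """Return the paths under root whose owner uid differs from expected_uid."""
    mismatched = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that a symlink is checked itself, not its target
            if os.lstat(path).st_uid != expected_uid:
                mismatched.append(path)
    return sorted(mismatched)
```

For example, find_foreign_owned("/var/lib/pgsql/data", pwd.getpwnam("postgres").pw_uid) would list any pg_control or xlog files left owned by root.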

There are actually 346 database files, adding up to 134 GB, in this database.
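
A file count and size like the one above can be reproduced, and compared with what the catalog still knows, with a short script. This is a sketch under stated assumptions: data files in a database directory are named by relfilenode number, with ".1", ".2" and so on for further segments of a large relation, and the catalog side would come from something like SELECT relfilenode FROM pg_class. The function names are hypothetical, and the directory also holds the system catalogs' own files, so not every number corresponds to a user table.

```python
import os
import re

# A data file is a number, optionally followed by ".N" for extra segments.
_DATA_FILE = re.compile(r"^(\d+)(?:\.\d+)?$")

def summarize_db_dir(db_dir):
    """Return (file_count, total_bytes, relfilenodes) for one database directory."""
    file_count = 0
    total_bytes = 0
    relfilenodes = set()
    for entry in os.scandir(db_dir):
        match = _DATA_FILE.match(entry.name)
        if match is None or not entry.is_file():
            continue  # skip PG_VERSION and anything that is not a data file
        file_count += 1
        total_bytes += entry.stat().st_size
        relfilenodes.add(int(match.group(1)))
    return file_count, total_bytes, relfilenodes

def missing_from_catalog(relfilenodes, catalog_relfilenodes):
    """Relfilenodes present on disk but absent from the catalog listing."""
    return sorted(set(relfilenodes) - set(catalog_relfilenodes))
```

Running summarize_db_dir("/var/lib/pgsql/data/base/17347") and passing the resulting set to missing_from_catalog, together with the relfilenode values that pg_class still returns, would show which on-disk relations have lost their catalog entry.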


Below are the error messages from when the database was being started.  I
am not sure when pg_resetxlog was run.  I suspect it was run to
get rid of the "invalid primary checkpoint record" error.

The PostgreSQL database reported an error while starting up.
The error was:
Jun 22 13:17:53 murphy postgres[27430]: [4-1] LOG:  invalid primary
checkpoint record
Jun 22 13:17:53 murphy postgres[27430]: [5-1] LOG:  could not open file
"/var/lib/pgsql/data/pg_xlog/0000000000000000" (log file 0, segment 0):
No such file or directory
Jun 22 13:18:49 murphy postgres[28778]: [6-1] LOG:  invalid secondary
checkpoint record
Jun 22 13:18:49 murphy postgres[28778]: [7-1] PANIC:  could not locate a
valid checkpoint record


Jun 22 13:26:01 murphy postgres[30770]: [6-1] LOG:  database system is
ready
Jun 22 13:26:02 murphy postgresql: Starting postgresql service:
succeeded
Jun 22 13:26:20 murphy postgres[30789]: [2-1] PANIC:  could not access
status of transaction 553
Jun 22 13:26:20 murphy postgres[30789]: [2-2] DETAIL:  could not open
file "/var/lib/pgsql/data/pg_clog/0000": No such file or directory
Jun 22 13:26:20 murphy postgres[30789]: [2-3] STATEMENT:  COMMIT

and
Jun 22 13:26:20 murphy postgres[30791]: [10-1] LOG:  redo starts at
0/2000050
Jun 22 13:26:20 murphy postgres[30791]: [11-1] LOG:  file
"/var/lib/pgsql/data/pg_clog/0000" doesn't exist, reading as zeroes
Jun 22 13:26:20 murphy postgres[30791]: [12-1] LOG:  record with zero
length at 0/2000E84
Jun 22 13:26:20 murphy postgres[30791]: [13-1] LOG:  redo done at
0/2000E60
Jun 22 13:26:20 murphy postgres[30791]: [14-1] WARNING:  xlog flush
request 213/7363F354 is not satisfied --- flushed only to 0/2000E84
Jun 22 13:26:20 murphy postgres[30791]: [14-2] CONTEXT:  writing block
840074 of relation 17347/356768772
Jun 22 13:26:20 murphy postgres[30791]: [15-1] WARNING:  xlog flush
request 213/58426648 is not satisfied --- flushed only to 0/2000E84

and
Jun 22 13:38:23 murphy postgres[1460]: [2-1] ERROR:  xlog flush request
210/E757F150 is not satisfied --- flushed only to 0/2074CA0
Jun 22 13:38:23 murphy postgres[1460]: [2-2] CONTEXT:  writing block
824605 of relation 17347/356768772
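
The "xlog flush request" messages above each compare two WAL positions written as two hexadecimal numbers separated by a slash. A minimal sketch in Python, assuming positions order first by the number before the slash and then by the number after it, makes the gap explicit: every requested position in these messages lies far beyond the flushed one. The function names are hypothetical.

```python
def parse_lsn(text):
    """Split 'logid/offset' (both hexadecimal) into a pair of integers."""
    log_id, offset = text.split("/")
    return int(log_id, 16), int(offset, 16)

def is_ahead(requested, flushed):
    """True when the requested position lies beyond the flushed position."""
    return parse_lsn(requested) > parse_lsn(flushed)

# Values copied from the WARNING and ERROR lines above.
pairs = [
    ("213/7363F354", "0/2000E84"),
    ("213/58426648", "0/2000E84"),
    ("210/E757F150", "0/2074CA0"),
]
for requested, flushed in pairs:
    print(requested, flushed, is_ahead(requested, flushed))
```

Data pages stamped with positions in log 0x210 and 0x213 while the server has only written up to log 0 is what one would expect to see after the WAL has been reset to start from the beginning.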

We are using a SAN for our storage device.

