Обсуждение: Corrupted DB? could not open file pg_clog/####

Поиск
Список
Период
Сортировка

Corrupted DB? could not open file pg_clog/####

От
Francisco Reyes
Дата:
Looking at archives seem to indicate missing pg_clog files is some form
of row or page corruption.

In an old thread from back in 2003 Tom Lane recommended
(http://tinyurl.com/jushf):

>>If you want to try to narrow down where the corruption is, you can
>>experiment with commands like
>>select ctid,* from big_table offset N limit 1;

Is that still a valid suggestion?
How do I know the possible maximun value for offset to try for each table?

If I have logs turned on.. at which level will the eror show? I am only
aware of the problem, because an application connected to postgrseql had the
errors in it's logs, but not seeing anything in the postgresql logs
themselves.

Just tried a pg_dump and got the
could not open file "pg_clog/0000"
error.

The file pg_clog/0000 is missing.
Looking at another thread (http://tinyurl.com/feyye) I see that the file can
be created as 256K worth of zeroes. If I do this.. will operations resume
normally? Is there a way to tell if any data was lost?

Re: Corrupted DB? could not open file pg_clog/####

От
Martijn van Oosterhout
Дата:
On Sun, Jul 30, 2006 at 01:31:14AM -0400, Francisco Reyes wrote:
> Looking at archives seem to indicate missing pg_clog files is some form
> of row or page corruption.
>
> In an old thread from back in 2003 Tom Lane recommended
> (http://tinyurl.com/jushf):
>
> >>If you want to try to narrow down where the corruption is, you can
> >>experiment with commands like
> >>select ctid,* from big_table offset N limit 1;
>
> Is that still a valid suggestion?
> How do I know the possible maximun value for offset to try for each table?

It's still a reasonable suggestion. The maximum offset is the number of
rows in the table. You'll notice when the output is empty. Do you have
an idea how much data it contains?

> If I have logs turned on.. at which level will the eror show? I am only
> aware of the problem, because an application connected to postgrseql had
> the errors in it's logs, but not seeing anything in the postgresql logs
> themselves.

If you connect via psql, you can set the client_min_message to DEBUG to
get some more stuff.

> Just tried a pg_dump and got the
> could not open file "pg_clog/0000"
> error.
>
> The file pg_clog/0000 is missing.
> Looking at another thread (http://tinyurl.com/feyye) I see that the file
> can be created as 256K worth of zeroes. If I do this.. will operations
> resume normally? Is there a way to tell if any data was lost?

Well, it will stop complaining. What will happen is that any
transactions involving those transaction IDs will be assumed to have
been committed. This may be ok, but in extreme cases it could lead to
broken constraints.

However, pg_clog/0000 is the very first transaction file created. Is it
in the range of the files that do do exist? Are you sure some other
process didn't remove it accedently?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения

Re: Corrupted DB? could not open file pg_clog/####

От
Francisco Reyes
Дата:
Martijn van Oosterhout writes:

> It's still a reasonable suggestion. The maximum offset is the number of
> rows in the table. You'll notice when the output is empty.

Once I find the point where the output is empty then what?

> Do you have
> an idea how much data it contains?

Yes. Around 87 million rows.


> If you connect via psql, you can set the client_min_message to DEBUG to
> get some more stuff.

How?
Tried set client_min_message='DEBUG';

> Well, it will stop complaining. What will happen is that any
> transactions involving those transaction IDs will be assumed to have
> been committed.

If the pg_clog files are to keep track of transactions, shouldn't a "pg_ctl
restart"  rollback all pending transactions.. so there are no pending
transactions upon the restart and this error should not appear again?
Using 8.1.4

> This may be ok, but in extreme cases it could lead to
> broken constraints.

That exteme case sounds pretty bad. :-(
Will it be safe to do after a restart? After all there should not be any
transactions..

> However, pg_clog/0000 is the very first transaction file created. Is it
> in the range of the files that do do exist?

There are 228 files in the directory and the oldest one is "016E" from about
a month ago.

>  Are you sure some other
> process didn't remove it accedently?

Not that I can think off. I didn't even know what this diretory was until
this problem showed up.. much less delete anything from it.

Re: Corrupted DB? could not open file pg_clog/####

От
Martijn van Oosterhout
Дата:
On Sun, Jul 30, 2006 at 04:58:34PM -0400, Francisco Reyes wrote:
> Martijn van Oosterhout writes:
>
> >It's still a reasonable suggestion. The maximum offset is the number of
> >rows in the table. You'll notice when the output is empty.
>
> Once I find the point where the output is empty then what?

That's when you've reached the end of the table. The point is that
before then you'll have found the value of N that produces the error.

> How?
> Tried set client_min_message='DEBUG';

That should do it.

> If the pg_clog files are to keep track of transactions, shouldn't a "pg_ctl
> restart"  rollback all pending transactions.. so there are no pending
> transactions upon the restart and this error should not appear again?
> Using 8.1.4

It will rollback all pending transactions. The point is that it's
looking for information about transactions that were committed. This is
usually a memory or disk error.

> >However, pg_clog/0000 is the very first transaction file created. Is it
> >in the range of the files that do do exist?
>
> There are 228 files in the directory and the oldest one is "016E" from
> about a month ago.

Sounds like some corrupt data. Once you've located the invalid data,
dump the block with pgfiledump, that should give you more info about
what happened.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения

Re: Corrupted DB? could not open file pg_clog/####

От
Francisco Reyes
Дата:
Martijn van Oosterhout writes:

> That's when you've reached the end of the table. The point is that
> before then you'll have found the value of N that produces the error.

Will be a while.. my little python script is doing under 10 selects/sec...
and there are nearly 67 million records. :-(

>> How?
>> Tried set client_min_message='DEBUG';
>
> That should do it.

The right one (for the archives) was actually:
set client_min_messages=DEBUG;

> It will rollback all pending transactions. The point is that it's
> looking for information about transactions that were committed. This is
> usually a memory or disk error.

So, should it be safe to create the file and fill it up with 256K zeros?

> Sounds like some corrupt data. Once you've located the invalid data,
> dump the block with pgfiledump, that should give you more info about
> what happened.

At the rate my script is going.. it's going to take a very long time to
find out where the problem is. If I have a dump.. any usefull info I can
take from the point the dump stopped?

Re: Corrupted DB? could not open file pg_clog/####

От
Martijn van Oosterhout
Дата:
On Mon, Jul 31, 2006 at 06:09:33PM -0400, Francisco Reyes wrote:
> Martijn van Oosterhout writes:
>
> >That's when you've reached the end of the table. The point is that
> >before then you'll have found the value of N that produces the error.
>
> Will be a while.. my little python script is doing under 10 selects/sec...
> and there are nearly 67 million records. :-(

Naturally you'd do a binary search. That'd take a maximum of about
log2(67 million) = about 26 scans.

Once you find it you should be able to identify the ctid. You may be
able to delete it.

> >It will rollback all pending transactions. The point is that it's
> >looking for information about transactions that were committed. This is
> >usually a memory or disk error.
>
> So, should it be safe to create the file and fill it up with 256K zeros?

For a certain value of "safe". It get the system running, but there's
no guarentees about the data in it...

> At the rate my script is going.. it's going to take a very long time to
> find out where the problem is. If I have a dump.. any usefull info I can
> take from the point the dump stopped?

That gives you the place in the file where it broke...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Вложения