Обсуждение: offline consistency check and info on attributes

Поиск
Список
Период
Сортировка

offline consistency check and info on attributes

От
Tomas Vondra
Дата:
Hi,

a recent discussion about possible data corruption (in the general list)
suggests that a tool for offline integrity check might be useful.
Something that might be run when the database is shut down, to check
that the data files are OK.

The idea is to read all heap blocks and check for various 'suspicious'
things like 'pd_lower > pd_upper', invalid lengths (negative lengths,
resulting in in alloc errors) etc. It might list blocks that are somehow
corrupted, dump them in a separate file etc.

Right now I do have a very simple tool that reads a given file and
performs a lot of checks at the block level (as described in bufpage.h),
and the next step should be validating basic structure of the tuples
(lengths). And that's the point where I'm stuck right now - I'm thinking
what might be the most elegant way to get info about attributes, without
access to the pg_attribute catalog (the tool is intended for offline
checks).

I've figured out the catalog-to-file mapping (in relmapper.c), but now
I'm wondering - it's just another relation, so I'd have to read the
block, parse the items and interpret them (not sure how to do that
without the pg_attribute data itself). So I wonder - what would be an
elegant solution?

regards
Tomas


Re: offline consistency check and info on attributes

От
Alvaro Herrera
Дата:
Excerpts from Tomas Vondra's message of dom abr 24 13:49:31 -0300 2011:

> Right now I do have a very simple tool that reads a given file and
> performs a lot of checks at the block level (as described in bufpage.h),
> and the next step should be validating basic structure of the tuples
> (lengths). And that's the point where I'm stuck right now - I'm thinking
> what might be the most elegant way to get info about attributes, without
> access to the pg_attribute catalog (the tool is intended for offline
> checks).

Each tuple declares its length.  You don't need to know each attribute's
length to check that.  Doing attribute-level checks is probably
pointless without catalog access.

> I've figured out the catalog-to-file mapping (in relmapper.c), but now
> I'm wondering - it's just another relation, so I'd have to read the
> block, parse the items and interpret them (not sure how to do that
> without the pg_attribute data itself). So I wonder - what would be an
> elegant solution?

This reminds me -- we need to have pg_filedump be able to dump the
relmapper stuff.  I was going to write a patch for it but then I forgot.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: offline consistency check and info on attributes

От
Tomas Vondra
Дата:
Dne 25.4.2011 18:16, Alvaro Herrera napsal(a):
> Excerpts from Tomas Vondra's message of dom abr 24 13:49:31 -0300 2011:
> 
>> Right now I do have a very simple tool that reads a given file and
>> performs a lot of checks at the block level (as described in bufpage.h),
>> and the next step should be validating basic structure of the tuples
>> (lengths). And that's the point where I'm stuck right now - I'm thinking
>> what might be the most elegant way to get info about attributes, without
>> access to the pg_attribute catalog (the tool is intended for offline
>> checks).
> 
> Each tuple declares its length.  You don't need to know each attribute's
> length to check that.  Doing attribute-level checks is probably
> pointless without catalog access.

Yes, I know the tuple length is in HeapTupleHeader (and I'm already
checking that), but that does not allow to check lengths of the
individual columns, especially those with varlena types.

That's a very annoying type of corruption, because the queries that do
not touch such columns seem to work fine, but once you attempt to access
the corrupted column you'll get something like this:

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR:  invalid memory alloc
request size 4294967293

So the ability to check where a the column lengths do not make sense (in
this case it's a negative value) would be a nice thing. But without the
access to pg_attribute this seems to be very difficult.

Hmmm, maybe the idea to build it as an offline tool (to use it when the
DB is not running) is not a good idea ...

Tomas


Re: offline consistency check and info on attributes

От
Alvaro Herrera
Дата:
Excerpts from Tomas Vondra's message of lun abr 25 14:50:18 -0300 2011:

> Yes, I know the tuple length is in HeapTupleHeader (and I'm already
> checking that), but that does not allow to check lengths of the
> individual columns, especially those with varlena types.
> 
> That's a very annoying type of corruption, because the queries that do
> not touch such columns seem to work fine, but once you attempt to access
> the corrupted column you'll get something like this:
> 
> pg_dump: SQL command failed
> pg_dump: Error message from server: ERROR:  invalid memory alloc
> request size 4294967293

Yeah, I agree with this being less than ideal.  However, as you
conclude, I don't think it's really workable to check this without
support from the running system.  I wrote a dumb tool to attempt to
detoast all varlena columns, capture exceptions and report them; see the
code here:
http://alvherre.livejournal.com/4404.html
(You need to pass it a table name as a text parameter; that bit is
crap, as it fails for funny names).  Note that this assumes that there
is a function length() for every varlena datatype in the table, which
may not be true for some of them.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: offline consistency check and info on attributes

От
Tomas Vondra
Дата:
Dne 25.4.2011 18:16, Alvaro Herrera napsal(a):
> Excerpts from Tomas Vondra's message of dom abr 24 13:49:31 -0300 2011:
> 
>> I've figured out the catalog-to-file mapping (in relmapper.c), but now
>> I'm wondering - it's just another relation, so I'd have to read the
>> block, parse the items and interpret them (not sure how to do that
>> without the pg_attribute data itself). So I wonder - what would be an
>> elegant solution?
> 
> This reminds me -- we need to have pg_filedump be able to dump the
> relmapper stuff.  I was going to write a patch for it but then I forgot.

Was this a polite question whether I volunteer to write that patch? ;-)

I've never used pg_filedump and I'm not quite sure what exactly is
needed, but it seem simple enough to do it. OK, I know the iceberg that
sank Titanic seemed small too ...

I think I'll move the integrity check to the db, so that it's possible
to check the column lengths etc. (pageinspect seems like a good module
to mutilate in this direction) but I still believe it'd be useful to
have an offline tool for basic checks. Would pg_filedump be a resonable
tool to do that?

regards
Tomas


Re: offline consistency check and info on attributes

От
Alvaro Herrera
Дата:
Excerpts from Tomas Vondra's message of mar abr 26 17:39:19 -0300 2011:
> Dne 25.4.2011 18:16, Alvaro Herrera napsal(a):

> > This reminds me -- we need to have pg_filedump be able to dump the
> > relmapper stuff.  I was going to write a patch for it but then I forgot.
> 
> Was this a polite question whether I volunteer to write that patch? ;-)
>
> I've never used pg_filedump and I'm not quite sure what exactly is
> needed, but it seem simple enough to do it. OK, I know the iceberg that
> sank Titanic seemed small too ...

Err, no, sorry if you thought it was.  If you still want to volunteer
I'm sure it'd be more than welcome.

> I think I'll move the integrity check to the db, so that it's possible
> to check the column lengths etc. (pageinspect seems like a good module
> to mutilate in this direction) but I still believe it'd be useful to
> have an offline tool for basic checks. Would pg_filedump be a resonable
> tool to do that?

No, I don't think pg_filedump is a good host for such checks.  If we're
going to have a tool to do that it'd be better to be able to include it in
core (or at least contrib), and we can't have pg_filedump in there for
licensing reasons.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: offline consistency check and info on attributes

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tomas Vondra's message of mar abr 26 17:39:19 -0300 2011:
>> Dne 25.4.2011 18:16, Alvaro Herrera napsal(a):
>> I think I'll move the integrity check to the db, so that it's possible
>> to check the column lengths etc. (pageinspect seems like a good module
>> to mutilate in this direction) but I still believe it'd be useful to
>> have an offline tool for basic checks. Would pg_filedump be a resonable
>> tool to do that?

> No, I don't think pg_filedump is a good host for such checks.  If we're
> going to have a tool to do that it'd be better to be able to include it in
> core (or at least contrib), and we can't have pg_filedump in there for
> licensing reasons.

Quite aside from licensing reasons, pg_filedump is only meant to print
out a very low-level representation of file contents; it has little
real understanding of what it's printing.  So I think it's a bad basis
for a verification utility on technical grounds too.
        regards, tom lane