[GSOC] - Integrity check algorithm for data files

From: Robert Mach
Subject: [GSOC] - Integrity check algorithm for data files
Date:
Msg-id: 464D8564.7020901@gmail.com
Responses: Re: [GSOC] - Integrity check algorithm for data files (Martijn van Oosterhout <kleptog@svana.org>)
Re: [GSOC] - Integrity check algorithm for data files (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
List: pgsql-hackers

Hello,
my name is Robert Mach and I am happy to be working on a GSoC project for
PostgreSQL. The name of the project is "Integrity check algorithm for
data files in PostgreSQL".

So far, I have put together a list of possible checks for data failure.
I would like to hear as many opinions on this list as possible or, even
better, recommendations on what should be added to it or removed from it.

Before presenting the possible errors, I divided them into physical and
logical errors. Physical errors are errors in the structure of pages
and tuples, whereas logical errors cause incorrect behavior of
PostgreSQL even though the data files are structurally valid.

In order to find PHYSICAL errors:
- check whether the total size of all TOAST table chunks is the same as
the size recorded in the TOASTed table
- for variable-length data (attlen = -1), compare the real size of the
stored data with the size recorded in the varlena length word
- count the number of rows in a table and compare it with
pg_class.reltuples of the corresponding record in pg_class
- check that the format of the data matches the flags that determine its
representation, e.g. in the case of TOASTed values, is the size of the
pointer datum really 20 bytes?
- check the fields whose layout is fixed by the structure for odd data
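
As a rough illustration of the first check in the list above, the
comparison for a single TOASTed value could look like the sketch below.
The field names chunk_id, chunk_seq and chunk_data mirror the columns of
a TOAST table; the ToastChunk class, the check_toast_value function and
the expected_size parameter (standing for the stored size recorded in the
TOAST pointer) are hypothetical names used only for this example, and the
chunks are assumed to have been read into memory already.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToastChunk:
    """One row of a TOAST table, reduced to what the check needs."""
    chunk_id: int      # identifies the toasted value
    chunk_seq: int     # position of the chunk within the value, from 0
    chunk_data: bytes  # the stored bytes of this chunk


def check_toast_value(chunks: Iterable[ToastChunk],
                      expected_size: int) -> List[str]:
    """Return the problems found for one toasted value (empty if none).

    expected_size is an assumption of this sketch: the stored size
    that the pointer in the main table records for the value.
    """
    problems: List[str] = []
    ordered = sorted(chunks, key=lambda c: c.chunk_seq)

    # Chunk sequence numbers should run 0, 1, 2, ... without gaps
    # or repeats; report the first deviation only.
    for position, chunk in enumerate(ordered):
        if chunk.chunk_seq != position:
            problems.append(
                f"expected chunk_seq {position}, found {chunk.chunk_seq}"
            )
            break

    # Together the chunks should hold exactly the recorded size.
    total = sum(len(c.chunk_data) for c in ordered)
    if total != expected_size:
        problems.append(
            f"chunks hold {total} bytes, pointer says {expected_size}"
        )
    return problems
```

An empty result means the value passed both tests; a real checker would
run this once per chunk_id referenced from the main table.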

In order to find LOGICAL errors:
- check the validity of all items in an index (e.g. a concurrent update
and an index scan with a constraint could cause an inconsistent snapshot
of the database to be used for creating the index)
- after creating an index or running an index scan, check whether all
tuples that should be indexed really are indexed
- check the validity and effectiveness of the free space map (whether it
treats valid data as free, whether its size fits the needs of the
database, etc.)
- check whether all user-defined functions in the database are
visible/usable for the users (maybe also verify privileges)
- check whether the constraints applied to items are fulfilled:
    - uniqueness
    - range of values
    - correctness of foreign key values
- in the case of very large databases, check whether transaction ID
wraparound has occurred (transaction IDs are 32 bits wide, so there are
2 to the power of 32 of them)
- check the integrity of the catalogs
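
The transaction ID item in the list above rests on modular arithmetic,
which a short sketch may make clearer. Because the counter has only
2 to the power of 32 values, "older" and "newer" have to be decided on a
circle: one ID precedes another if the difference, read as a signed
32-bit number, is negative. The function names and the safety_margin
parameter below are hypothetical and chosen for this example; special
reserved IDs are ignored.

```python
XID_MODULUS = 2 ** 32   # transaction IDs are 32-bit counters
HALF_RANGE = 2 ** 31    # half of the circle of IDs


def xid_precedes(a: int, b: int) -> bool:
    """True if transaction ID a is logically older than b.

    The difference is taken modulo 2**32; a value in the upper half
    corresponds to a negative signed 32-bit difference.
    """
    return (a - b) % XID_MODULUS >= HALF_RANGE


def xid_age(current: int, xid: int) -> int:
    """Number of transaction IDs handed out between xid and current."""
    return (current - xid) % XID_MODULUS


def wraparound_risk(current: int, oldest: int,
                    safety_margin: int = 10_000_000) -> bool:
    """True if the oldest ID still in use is close to half a cycle old.

    Beyond half a cycle, xid_precedes() would start to report the old
    ID as being in the future. safety_margin is an assumed value.
    """
    return xid_age(current, oldest) > HALF_RANGE - safety_margin
```

For example, xid_precedes(4_294_967_000, 50) is true: the counter has
wrapped in between, so the numerically larger ID is the older one.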

I see different ways of delivering this functionality to PostgreSQL. The
best, of course, would be for it to become part of a PostgreSQL release,
either as a PostgreSQL command (like UPDATE) or as a PostgreSQL
application like vacuumdb.
Another possibility is to create a freestanding program that would be
called with the location of the data files as arguments.
The last possibility is to create administrative console access (single
user mode) to the database, from which this integrity check could be
started.
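
To make the freestanding variant more concrete, here is a minimal sketch
of a program that takes data file paths and checks each page header for
implausible values. It assumes the 24-byte, little-endian page header
layout of recent PostgreSQL releases and the default 8192-byte block
size; both differ between versions, builds and platforms, so the format
string is an assumption, not a specification. The function names are
hypothetical, and only the header pointers are inspected, not the tuples.

```python
import struct
from typing import List

BLOCK_SIZE = 8192  # default block size; a build-time option

# Assumed header layout: pd_lsn (2 x uint32), pd_checksum, pd_flags,
# pd_lower, pd_upper, pd_special, pd_pagesize_version (uint16 each),
# pd_prune_xid (uint32). "<" means little-endian without padding.
HEADER_FORMAT = "<IIHHHHHHI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def check_page(page: bytes) -> List[str]:
    """Return the problems found in one page image (empty if none)."""
    if len(page) != BLOCK_SIZE:
        return [f"short page of {len(page)} bytes"]
    if not any(page):
        return []  # an all-zero page is treated as unused, not corrupt

    (_lsn_hi, _lsn_lo, _checksum, _flags, lower, upper, special,
     size_version, _prune_xid) = struct.unpack_from(HEADER_FORMAT, page)

    problems: List[str] = []
    # The high byte of pd_pagesize_version carries the page size.
    if (size_version & 0xFF00) != BLOCK_SIZE:
        problems.append(f"page size field is {size_version & 0xFF00}")
    # Line pointers end at lower, tuples start at upper, and the
    # special space starts at special; they must not cross.
    if not HEADER_SIZE <= lower <= upper <= special <= BLOCK_SIZE:
        problems.append(
            f"pointers out of order: lower={lower}, upper={upper}, "
            f"special={special}"
        )
    return problems


def check_file(path: str) -> List[str]:
    """Check every block of one data file."""
    problems: List[str] = []
    with open(path, "rb") as f:
        block_no = 0
        while True:
            page = f.read(BLOCK_SIZE)
            if not page:
                break
            for problem in check_page(page):
                problems.append(f"{path} block {block_no}: {problem}")
            block_no += 1
    return problems


def main(paths: List[str]) -> int:
    """Print all problems; return 1 if any were found, else 0."""
    problems = [p for path in paths for p in check_file(path)]
    for problem in problems:
        print(problem)
    return 1 if problems else 0
```

Run as a script, it would be started with something like
main(sys.argv[1:]), passing the data file locations on the command line.
Reading the files directly like this is only safe while the server is
not modifying them.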

I hope to get lots of thoughts on this proposal, as well as lots of
other ideas on what should be checked in order to verify the integrity
of data in PostgreSQL.

Cheers,

Robert


