pg v. 8.4.5 misses objects and data after restoring from backup using wal

Поиск
Список
Период
Сортировка
От Imre Oolberg
Тема pg v. 8.4.5 misses objects and data after restoring from backup using wal
Дата
Msg-id 4D1C670E.1090408@auul.pri.ee
обсуждение исходный текст
Ответы Re: pg v. 8.4.5 misses objects and data after restoring from backup using wal  (Scott Ribe <scott_ribe@elevated-dev.com>)
Re: pg v. 8.4.5 misses objects and data after restoring from backup using wal  (Imre Oolberg <imre@auul.pri.ee>)
Список pgsql-admin
Hi!

I did my usual test restoration from backup which leaded me to suspect
that some data is missing although database starts up. I did some more
testing to reproduce the problem as generally as i could and i wish to
present it here and hope that somebody could comment on this. It may
well be my bad as i am doing some systematic error or something else.

I use debian lenny amd64 with postgresql from backports which is

# dpkg -l postgre\* | grep ii
ii  postgresql-8.4                      8.4.5-2~bpo50+1
object-relational SQL database, version 8.4
ii  postgresql-client-8.4               8.4.5-2~bpo50+1
front-end programs for PostgreSQL 8.4
ii  postgresql-client-common            111~bpo50+1
manager for multiple PostgreSQL client versi
ii  postgresql-common                   111~bpo50+1
PostgreSQL database-cluster manager
ii  postgresql-contrib-8.4              8.4.5-2~bpo50+1
additional facilities for PostgreSQL
ii  postgresql-plperl-8.4               8.4.5-2~bpo50+1
PL/Perl procedural language for PostgreSQL 8
ii  postgresql-server-dev-8.4           8.4.5-2~bpo50+1
development files for PostgreSQL 8.4 server-

In addition i have compiled and installed pljava, plproxy and skytools
libraries but in this test i do not use them, i.e. i guess it doest
really matter that the are in /usr/lib/postgresql/8.4/lib directory
since no languages/functions are defined to use them.

PostgreSQL config is default except wal logging is turned on.

My procedure consists of the following steps

1. start up database
2. created schema test1 and into it table test like

CREATE TABLE test1.test (
   test_id serial NOT NULL,
   pilt bytea,
   CONSTRAINT test_id_pkey PRIMARY KEY (test_id)
)

and started inserting data with this script

import psycopg2, cPickle
conn = psycopg2.connect("dbname=test host=127.0.0.1 user=postgres
password=xxx port=5432")
cur = conn.cursor()

f = open('/boot/initrd.img-2.6.26-2-amd64', 'rb')
binary = f.read()
sql = "insert into test1.test (pilt) values (%s)";
cur.execute(sql, (psycopg2.Binary(binary),))
cur.close()
conn.commit()
conn.close()

like

$ for i in `seq 1 200`; do echo $i; python pi1.py; sleep 10; done

3. issued pg_start_backup('test') while script is still running
4. created schemas and tables test2.test and test3.test and started
insterting data similarly
5. while three scripts are running issued rsync -avH --bwlimit=1024
root@localhost:/var/lib/postgresql/ /mnt/backup/
6. while rsync is running i created schemas and tables test4.test and
test5.test and started insterting data similarly
7. when scripts and rsync have been completed i said pg_stop_backup
8. created schema and table test6.test and started insterting data similarly

When everything was finished i had approx 5g used as data cluster
filesystem and about 400 wal logs by 16 MB each.

Restore went like this

1. mounted /mnt/backup/ under /var/lib/postgresql
2. removed backup_label
3. created recovery.conf with a line

restore_command = 'cp /data/backup/postgresql/archive-logs/%f %p'

4. emptied pg_xlog directory while having directory itself
5. started postgresql and it finishes accepting connections

And now observations

1. while i have 400 wal log files during recovery it used only about
last 20 of them
2. while original working postgresql data cluster filesystem's used size
was about 5 gb after restore used space is 2.6 gb
3. looking thru tables, most of them have quarter of half less entries
than was actually insterted and is present in original database
4. some tables are missing (test4.test) and some give errors like
(test5.test) when i say select count(*) from test5.test;

2010-12-30 12:41:23 EET ERROR:  catalog is missing 2 attribute(s) for
relid 16586 at character 22
2010-12-30 12:41:23 EET STATEMENT:  select count(*) from test5.test;

tried it also with compiled version of 8.4.5 but i chose to use lenny +
backports because it may be this way more reproducable that my compiled
instance. Actually my data size is about 160 gb, and i tested also with
sizes about 1g and 4 tables but successful reproduction seems to need
database size in filesystem around 5 g and 6 tables.

I undestand that this report is not very precise, i.e. i have not fixed
how much data was interted into database at the moment rsyncing started
but i hope it has some useful points and someone could help me further,
i am also ready to do some more testing if it is helpful.


Imre


В списке pgsql-admin по дате отправления:

Предыдущее
От: David Rericha
Дата:
Сообщение: Re: date out of range for timestamp
Следующее
От: Lukasz Brodziak
Дата:
Сообщение: Finding missing values in sequence column