warm standby, pg_standby, invalid checkpoint record

From	Brad Wiemerslage
Subject	warm standby, pg_standby, invalid checkpoint record
Date	27 February 2009 03:47:58
Msg-id	818050.20576.qm@web56001.mail.re3.yahoo.com
Replies	Re: warm standby, pg_standby, invalid checkpoint record
List	pgsql-admin

I'm attempting to get warm standby up and running on a pair of servers running Ubuntu 8.04 and PostgreSQL 8.3. I've been following the docs:

http://www.postgresql.org/docs/8.3/static/warm-standby.html
http://www.postgresql.org/docs/current/static/pgstandby.html

I'm also basically following the ideas in this blog post:

http://scale-out-blog.blogspot.com/2009/02/simple-ha-with-postgresql-point-in-time.html

I've customized the original script the article refers to; here it is in its entirety for reference:

https://s3.amazonaws.com/extras.continuent.com/standby.sh

Here is the meat of my customized script, which runs on the standby. The PostgreSQL server on the standby is stopped first.

# Put the primary into backup mode
start_backup="SELECT pg_start_backup('my_backup');"
stop_backup="SELECT pg_stop_backup();"
echo "$start_backup" | $psql -h $PRIMARY -U myuser -d mydb -e
# Copy the primary's data directory onto the standby
rsync --delete -avz -e "ssh -i /path/to/key" myuser@$PRIMARY:$PG_DATA/ $PG_DATA
# Take the primary out of backup mode
echo "$stop_backup" | $psql -h $PRIMARY -U myuser -d mydb -e

The files seem to be copied over to the standby machine just fine. Success is reported for the backup commands. Permissions seem fine.

Next, there are some steps which blow out some files. As I understand it, the standby no longer needs the files that were in pg_xlog on the primary.

rm -f $PG_DATA/recovery.*
rm -f $PG_DATA/8.3/main/logfile
rm -f $PG_DATA/8.3/main/postmaster.pid
rm -f $PG_DATA/8.3/main/pg_xlog/0*
rm -f $PG_DATA/8.3/main/pg_xlog/archive_status/0*

This step seems to work fine.

Then, the archives are pulled.  They are pulled to /mnt/postgresql_archives with this command:

rsync --delete -avz -e "ssh -i /path/to/key" myuser@$PRIMARY:$PG_ARCHIVES/ $PG_ARCHIVES

Everything looks good. I end up with an up-to-date list of WAL files in /mnt/postgresql_archives on the standby. Here is a listing:

root@standby:/mnt/postgresql_archives# ls
total 688996
drwxr-xr-x  2 postgres postgres     4096 2009-02-27 01:13 .
drwxr-xr-x 14 root     root         4096 2009-02-27 01:17 ..
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:19 0000000100000000000000CB
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:29 0000000100000000000000CC
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:38 0000000100000000000000CD
-rw-rw----  1 postgres postgres      245 2009-02-27 00:38 0000000100000000000000CD.00000020.backup
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:48 0000000100000000000000CE
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:54 0000000100000000000000CF
-rw-rw----  1 postgres postgres 16777216 2009-02-27 00:58 0000000100000000000000D0
-rw-rw----  1 postgres postgres 16777216 2009-02-27 01:01 0000000100000000000000D1
-rw-rw----  1 postgres postgres 16777216 2009-02-27 01:03 0000000100000000000000D2
-rw-rw----  1 postgres postgres 16777216 2009-02-27 01:13 0000000100000000000000D3
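The 0000000100000000000000CD.00000020.backup entry in this listing is a backup history file, written by pg_stop_backup() and archived like a WAL segment. Its name is the WAL segment in which the base backup started, followed by the byte offset within that segment in hex, so replaying this base backup starts at segment 0000000100000000000000CD; segments CB and CC predate it. A minimal bash sketch that reads that starting segment out of an archive directory follows. The helper name first_required_segment is hypothetical, and it assumes the file naming shown in the listing.

```shell
#!/usr/bin/env bash
# Sketch: report the oldest WAL segment needed to replay the newest base
# backup in an archive directory, based on its backup history file.
# first_required_segment is a hypothetical helper, not a PostgreSQL tool.
# Usage: first_required_segment /mnt/postgresql_archives
first_required_segment() {
    local archive_dir=$1 latest
    # Backup history files are named <24 hex digits>.<8 hex digits>.backup
    local re='^[0-9A-F]{24}\.[0-9A-F]{8}\.backup$'
    # Under the C locale these names sort in WAL order, so the last is newest.
    latest=$(cd "$archive_dir" && printf '%s\n' *.backup | LC_ALL=C sort | tail -n 1)
    if [[ ! $latest =~ $re ]]; then
        echo "no backup history file in $archive_dir" >&2
        return 1
    fi
    # Drop ".<offset>.backup" and keep the 24-digit segment name
    echo "${latest%%.*}"
}
```

For the directory listed here, first_required_segment /mnt/postgresql_archives would print 0000000100000000000000CD.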

Then, recovery.conf is put in place. I've tried two different versions, which both give me the same error. Here are the two versions:

#1: restore_command = '/usr/lib/postgresql/8.3/bin/pg_standby -c -d -s 2 -t /mnt/postgresql_archives/pgsql.trigger /mnt/postgresql_archives%f %p >> /mnt/postgresql_archives/standby.log 1>&2'

#2: restore_command = 'cp /mnt/server/archivedir/%f "%p"'
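In restore_command #1, pg_standby's -d option writes its debug output to stderr, and the command line ends in >> /mnt/postgresql_archives/standby.log 1>&2. The server runs restore_command through the shell, which applies redirections left to right: that suffix first points stdout at the log and then re-points stdout at stderr, so stderr itself is never sent to the log. The bash sketch below shows where the output of a stand-in program lands under three orderings. noisy_tool and try_variant are hypothetical names; the sketch only exercises shell semantics and does not run pg_standby.

```shell
#!/usr/bin/env bash
# noisy_tool is a hypothetical stand-in for a program that, like
# pg_standby -d, writes its diagnostics to stderr.
noisy_tool() {
    echo "stdout line"
    echo "stderr line" >&2
}

# Run one redirection variant and report how many lines reached the log file.
# The outer >/dev/null 2>&1 stands in for the server's own stdout/stderr.
try_variant() {
    local label=$1 log
    log=$(mktemp)
    case $label in
        append_then_dup) { noisy_tool >> "$log" 1>&2; } >/dev/null 2>&1 ;;
        stderr_only)     { noisy_tool 2>> "$log"; } >/dev/null 2>&1 ;;
        both)            { noisy_tool >> "$log" 2>&1; } >/dev/null 2>&1 ;;
    esac
    printf '%s: %s line(s) in log\n' "$label" "$(wc -l < "$log" | tr -d ' ')"
    rm -f "$log"
}

try_variant append_then_dup   # prints: append_then_dup: 0 line(s) in log
try_variant stderr_only       # prints: stderr_only: 1 line(s) in log
try_variant both              # prints: both: 2 line(s) in log
```

Only the orderings that redirect stderr (2>> file, or >> file 2>&1) capture diagnostics written to stderr.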

I don't believe that #2 is suitable for warm standby, but I tried it to debug after #1 wouldn't work. Now I try to start up the server. For it to work in standby mode, additional archive files will be pulled from the primary machine on a periodic basis. I'm using this command, which deletes them on the primary when they are no longer necessary. It also seems to work fine.

rsync -avz -e "ssh -i /path/to/key" myuser@$PRIMARY:$PG_ARCHIVES/ $PG_ARCHIVES

I guess I'm a little confused about exactly what is happening when the server comes up, but here is the error message I'm getting. It seems to be looking for the files in pg_xlog, which has been cleared out, so the error makes sense. But isn't it supposed to be looking in /mnt/postgresql_archives per the restore_command(s)? The files are available there.

2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,2,,2009-02-27 01:26:52 EST,,0,LOG,58P01,"could not open file ""pg_xlog/0000000100000000000000CD"" (log file 0, segment 205): No such file or directory",,,,,,,,
2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,3,,2009-02-27 01:26:52 EST,,0,LOG,00000,"invalid checkpoint record",,,,,,,,
2009-02-27 01:26:52.867 EST,,,7422,,49a787ac.1cfe,4,,2009-02-27 01:26:52 EST,,0,PANIC,XX000,"could not locate required checkpoint record",,"If you are not restoring from a backup, try removing the file ""/var/lib/postgresql/8.3/main/backup_label"".",,,,,,
2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,1,,2009-02-27 01:26:51 EST,,0,LOG,00000,"startup process (PID 7422) was terminated by signal 6: Aborted",,,,,,,,
2009-02-27 01:26:52.868 EST,,,7419,,49a787ab.1cfb,2,,2009-02-27 01:26:51 EST,,0,LOG,00000,"aborting startup due to startup process failure",,,,,,,,
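A WAL segment file name is 24 hex digits: 8 each for the timeline ID, the log file number, and the segment number. Decoding 0000000100000000000000CD gives timeline 1, log file 0, segment 205 (0xCD), which are the log file and segment numbers reported in the first log line; it is also the segment named by the .backup file in the archive listing. A small bash sketch of that decoding follows; decode_wal_name is a hypothetical helper, not a PostgreSQL utility.

```shell
#!/usr/bin/env bash
# Split a 24-hex-digit WAL segment file name into its three fields.
# decode_wal_name is a hypothetical helper, not a PostgreSQL tool.
decode_wal_name() {
    local name=$1
    local re='^[0-9A-Fa-f]{24}$'
    if [[ ! $name =~ $re ]]; then
        echo "not a WAL segment file name: $name" >&2
        return 1
    fi
    # Each 8-digit field is hex; 16# makes bash parse it in base 16
    local tli=$((16#${name:0:8}))
    local log=$((16#${name:8:8}))
    local seg=$((16#${name:16:8}))
    echo "timeline $tli, log file $log, segment $seg"
}

decode_wal_name 0000000100000000000000CD   # prints: timeline 1, log file 0, segment 205
```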

So I tried copying the files in /mnt/postgresql_archives over to pg_xlog. This seemed to work, and the updates were applied. At no point did I get a recovery.done file. Also, for whatever reason, I was never able to get any debug info from pg_standby in standby.log.

Anyhow, I've burned up a couple days trying to figure this out.  Any help would be much appreciated.

Thanks,
Brad
