Обсуждение: problem with pg_standby
Hello, we currently setup a standby database with archive_command sending the WALs from master to standby. This works as expected, but the standby database doesn't restore the WALs from the given directory in recovery.conf and I have no idea why... recovery.conf: ---------------- restore_command = '/usr/bin/pg_standby -d -s 20 -t /tmp/pgsql.trigger.5432 /var/lib/pgsql/wal_exchange %f %p %r 2>> /var/lib/pgsql/data/pg_log/standby.log' The file standby.log contains: -------------------------------- Trigger file : /tmp/pgsql.trigger.5432 Waiting for WAL file : 00000001.history WAL file path : /var/lib/pgsql/wal_exchange/00000001.history Restoring to... : pg_xlog/RECOVERYHISTORY Sleep interval : 20 seconds Max wait interval : 0 forever Command for restore : cp "/var/lib/pgsql/wal_exchange/00000001.history" "pg_xlog/RECOVERYHISTORY" Keep archive history : 000000000000000000000000 and later running restore :cp: cannot stat `/var/lib/pgsql/wal_exchange/00000001.history': No such file or directory cp: cannot stat `/var/lib/pgsql/wal_exchange/00000001.history': No such file or directory cp: cannot stat `/var/lib/pgsql/wal_exchange/00000001.history': No such file or directory cp: cannot stat `/var/lib/pgsql/wal_exchange/00000001.history': No such file or directory not restored : history file not found Trigger file : /tmp/pgsql.trigger.5432 Waiting for WAL file : 000000010000001E0000001D.00000020.backup WAL file path : /var/lib/pgsql/wal_exchange/000000010000001E0000001D.00000020.backup Restoring to... : pg_xlog/RECOVERYHISTORY Sleep interval : 20 seconds Max wait interval : 0 forever Command for restore : cp "/var/lib/pgsql/wal_exchange/000000010000001E0000001D.00000020.backup" "pg_xlog/RECOVERYHISTORY" Keep archive history : 000000000000000000000000 and later WAL file not present yet. Checking for trigger file... WAL file not present yet. Checking for trigger file... WAL file not present yet. Checking for trigger file... WAL file not present yet. Checking for trigger file... WAL file not present yet. Checking for trigger file... ------------------------------- And in the directory "/var/lib/pgsql/wal_exchange" I have dozens of WALs in the meantime, but they didn't get restored, and in the logs I receive "WAL file not present...." The file "pg_xlog/RECOVERYHISTORY" doesn't exist, btw..., and the file xx00000020.backup contains: "" START WAL LOCATION: 1E/1D000020 (file 000000010000001E0000001D) STOP WAL LOCATION: 1E/1F000000 (file 000000010000001E0000001E) CHECKPOINT LOCATION: 1E/1D000020 START TIME: 2010-08-03 21:11:23 CEST LABEL: initial_backup STOP TIME: 2010-08-03 21:44:06 CEST "" I can stop and restart the standby db as often as I want ending up in a working sync-mechanism, as long as the WALs being copied from master to standby, right ? Any help appreciated, many thanks .....GERD.....
On Tue, 2010-08-03 at 22:37 +0200, Gerd Koenig wrote: > Hello, > > we currently setup a standby database with archive_command sending the WALs > from master to standby. > This works as expected, but the standby database doesn't restore the WALs from > the given directory in recovery.conf and I have no idea why... What are you using to manage this? I would strongly suggest Walmgr or PITRTools. They take a lot of the guess work out of all this stuff. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Hi again, I just want to drop you an additional note. After several attempts of debugging pg_standby is restoring the WAL files as expected, but I cannot explain why... First startup of postgres was with init-script provided by the rpm installation (/etc/init.d/postgresql start as user root). The pg processes have been started thereby as user postgres. An additional stop/start by init-script ended up in the same problem situation, where pg_standby obviously wasn't able to get the file 000000010000001E0000001D.00000020.backup, even if it was there (and readable by user postgres ;-) ). Afterwards (3rd attempt) I started postgres not by init-script, but by pg_ctl as user postgres, and......hej....heureka......pg_standby finds all needed files and restores them. Since even the init-script starts pg as user postgres I have no idea what differs from init-script to direct call of pg_ctl as user postgres...?!?! Somebody an explanation ? kind regards....GERD.... On Tuesday, August 03, 2010 10:37:12 pm Gerd Koenig wrote: > Hello, > > we currently setup a standby database with archive_command sending the WALs > from master to standby. > This works as expected, but the standby database doesn't restore the WALs > from the given directory in recovery.conf and I have no idea why... > > recovery.conf: > ---------------- > restore_command = '/usr/bin/pg_standby -d -s 20 -t /tmp/pgsql.trigger.5432 > /var/lib/pgsql/wal_exchange %f %p %r 2>> > /var/lib/pgsql/data/pg_log/standby.log' > > The file standby.log contains: > -------------------------------- > Trigger file : /tmp/pgsql.trigger.5432 > Waiting for WAL file : 00000001.history > WAL file path : /var/lib/pgsql/wal_exchange/00000001.history > Restoring to... : pg_xlog/RECOVERYHISTORY > Sleep interval : 20 seconds > Max wait interval : 0 forever > Command for restore : cp "/var/lib/pgsql/wal_exchange/00000001.history" > "pg_xlog/RECOVERYHISTORY" > Keep archive history : 000000000000000000000000 and later > running restore :cp: cannot stat > `/var/lib/pgsql/wal_exchange/00000001.history': No such file or directory > cp: cannot stat `/var/lib/pgsql/wal_exchange/00000001.history': No such > file or directory > cp: cannot stat `/var/lib/pgsql/wal_exchange/00000001.history': No such > file or directory > cp: cannot stat `/var/lib/pgsql/wal_exchange/00000001.history': No such > file or directory > not restored : history file not found > > Trigger file : /tmp/pgsql.trigger.5432 > Waiting for WAL file : 000000010000001E0000001D.00000020.backup > WAL file path : > /var/lib/pgsql/wal_exchange/000000010000001E0000001D.00000020.backup > Restoring to... : pg_xlog/RECOVERYHISTORY > Sleep interval : 20 seconds > Max wait interval : 0 forever > Command for restore : cp > "/var/lib/pgsql/wal_exchange/000000010000001E0000001D.00000020.backup" > "pg_xlog/RECOVERYHISTORY" > Keep archive history : 000000000000000000000000 and later > WAL file not present yet. Checking for trigger file... > WAL file not present yet. Checking for trigger file... > WAL file not present yet. Checking for trigger file... > WAL file not present yet. Checking for trigger file... > WAL file not present yet. Checking for trigger file... > ------------------------------- > > And in the directory "/var/lib/pgsql/wal_exchange" I have dozens of WALs in > the meantime, but they didn't get restored, and in the logs I receive "WAL > file not present...." > The file "pg_xlog/RECOVERYHISTORY" doesn't exist, btw..., and the file > xx00000020.backup contains: > "" > START WAL LOCATION: 1E/1D000020 (file 000000010000001E0000001D) > STOP WAL LOCATION: 1E/1F000000 (file 000000010000001E0000001E) > CHECKPOINT LOCATION: 1E/1D000020 > START TIME: 2010-08-03 21:11:23 CEST > LABEL: initial_backup > STOP TIME: 2010-08-03 21:44:06 CEST > "" > > I can stop and restart the standby db as often as I want ending up in a > working sync-mechanism, as long as the WALs being copied from master to > standby, right ? > > Any help appreciated, many thanks .....GERD.....
Gerd Koenig wrote: > Since even the init-script starts pg as user postgres I have no idea what > differs from init-script to direct call of pg_ctl as user postgres...?!?! > Do you have SELinux turned on? That can do weird stuff like this--the init script will be running with restrictions the manual pg_ctl launch will not have. Should be information in /var/log/ somewhere if that's the case. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
On Tue, 2010-08-03 at 22:37 +0200, Gerd Koenig wrote: > Hello, > > we currently setup a standby database with archive_command sending the WALs > from master to standby. > This works as expected, but the standby database doesn't restore the WALs from > the given directory in recovery.conf and I have no idea why... What are you using to manage this? I would strongly suggest Walmgr or PITRTools. They take a lot of the guess work out of all this stuff. Joshua D. Drake -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579 Consulting, Training, Support, Custom Development, Engineering http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Hello Greg, thanks for the hint, yes, SELinux caused the troubles. It complained about wrong filecontext while starting postgres via init-script. Filecontext was: var_lib_t and it should be: postgresql_t regards...GERD.... On Tuesday, August 03, 2010 11:54:45 pm Greg Smith wrote: > Gerd Koenig wrote: > > Since even the init-script starts pg as user postgres I have no idea what > > differs from init-script to direct call of pg_ctl as user postgres...?!?! > > Do you have SELinux turned on? That can do weird stuff like this--the > init script will be running with restrictions the manual pg_ctl launch > will not have. Should be information in /var/log/ somewhere if that's > the case.
Gerd Koenig wrote: > thanks for the hint, yes, SELinux caused the troubles. It complained about > wrong filecontext while starting postgres via init-script. > > Filecontext was: var_lib_t and it should be: postgresql_t > If you want to keep SELinux on, basically you have to relabel the directory you are putting those into so it can access them. You might find some useful hints on that topic at http://serverfault.com/questions/32333/how-does-selinux-affect-the-home-directory (including the observation that /home is a bad place for them). You might think you can just run the chcon command to reset the labels, but it doesn't quite work like that; you have to change the policy and then use restorecon to correct them. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
thanks Greg, yes, we want SELinux in enforcing mode. Thereby (and to ensure persistence) just chcon is the wrong way and * semanage fcontext -a -t postgresql_t '<dir>' * restorecon -vvFR <dir> is much better ;-) regards--GERD-- On Wednesday, August 04, 2010 07:56:56 am Greg Smith wrote: > Gerd Koenig wrote: > > thanks for the hint, yes, SELinux caused the troubles. It complained > > about wrong filecontext while starting postgres via init-script. > > > > Filecontext was: var_lib_t and it should be: postgresql_t > > If you want to keep SELinux on, basically you have to relabel the > directory you are putting those into so it can access them. You might > find some useful hints on that topic at > http://serverfault.com/questions/32333/how-does-selinux-affect-the-home-dir > ectory (including the observation that /home is a bad place for them). You > might think you can just run the chcon command to reset the labels, but it > doesn't quite work like that; you have to change the policy and then use > restorecon to correct them.