Обсуждение: Help setting up warm standby replication
Hi, after having read the documentation and the few posts in this list I tried to set up a warm standby replication between two instances of postgres running on my pc in 2 different base directories and 2 different ports. (The second one was a cp -a of the first one after having created a backup point) . I don't need a realtime update: even few minutes of lag are good enough for my needs. To begin with I followed the example of the docs: in the recovery.conf file of the slave instance I set restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p' but at restart I got these messages in the logs (some translation from italian to english by me) : LOG: system shutdown at 2007-04-13 13:02:51 CEST LOG: starting archive recovery LOG: restore_command = "cp -av /var/lib/pgsql/data/pg_xlog/%f %p" cp: impossibile fare stat di `/var/lib/pgsql/data/pg_xlog/00000001.history': No such file or directory `/var/lib/pgsql/data/pg_xlog/000000010000000000000007' -> `pg_xlog/RECOVERYXLOG' LOG: restored file "000000010000000000000007" from archive LOG: invalid record length at 0/7000218 LOG: invalid primary record checkpoint `/var/lib/pgsql/data/pg_xlog/000000010000000000000007' -> `pg_xlog/RECOVERYXLOG' LOG: restored file "000000010000000000000007" from archive LOG: invalid record length at 0/70001D0 LOG: invalid primary record checkpoint PANIC: impossible locating a valid record checkpoint LOG: startup process (PID 20250) was terminated by signal 6 LOG: aborting startup due to startup process failure same thing if I empty the pg_xlog/ dir (except archive_status/) I hope someone can explain what I did wrong. Another thing I didn't understand is the continous replication mechanism: will the slave server periodically call the recovery_command or do I have to setup a cronjob to instruct it to search updates? Sorry if these questions are stupid :) and thanks in advance. Nico
Nico Sabbi <nsabbi@officinedigitali.it> writes: > To begin with I followed the example of the docs: in the recovery.conf > file of the > slave instance I set > restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p' Hm, it looks like you are trying to copy xlog segments straight from the pg_xlog directory of the master? That's not going to work, because the master will rename/overwrite those files as soon as it thinks it's done with them. You need to have an archive_command on the master that is really truly copying the data to somewhere else, and then the restore_command should copy from the somewhere else. regards, tom lane
Tom Lane wrote: >Nico Sabbi <nsabbi@officinedigitali.it> writes: > > >>To begin with I followed the example of the docs: in the recovery.conf >>file of the >>slave instance I set >> >> > > > >>restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p' >> >> > >Hm, it looks like you are trying to copy xlog segments straight from the >pg_xlog directory of the master? That's not going to work, because the >master will rename/overwrite those files as soon as it thinks it's done >with them. You need to have an archive_command on the master that is >really truly copying the data to somewhere else, and then the >restore_command should copy from the somewhere else. > > > after having set up shared directory (one for archive_command and one for recover_command) I could successfully start and sync the slave dbms, but after the initial recovery the file recovery.conf is renamed to recovery.done and no more updates/WAL are asked (the slave's log reads: LOG: archive recovery complete LOG: system database ready ) Is there any parameter that I have to pass to the second server to keep on requesting WALs? I still don't understand what instructs the server to continously request the master's logs.
On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote: > Is there any parameter that I have to pass to the second server to keep > on requesting WALs? > I still don't understand what instructs the server to continously > request the master's logs. google pg_standby. I've set it up and it works. merlin
Merlin Moncure wrote: > On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote: > >> Is there any parameter that I have to pass to the second server to keep >> on requesting WALs? >> I still don't understand what instructs the server to continously >> request the master's logs. > > > google pg_standby. I've set it up and it works. > > merlin > Hi, I installed the second version of pg_standby.tar that you posted here. In recovery.conf I set restore_command = '/usr/local/bin/pg_standby -d -c -s 5 -w 0 /tmp/pg/ %f %p' and I see in the log file that something goes on after every update: Trigger file : (null) Waiting for WAL file : /tmp/pg//000000010000000000000021 WAL file path : 000000010000000000000021 Restoring to... : pg_xlog/RECOVERYXLOG Sleep interval : 5 seconds Max wait interval : 0 forever Command for restore : cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG running restore : success LOG: restored file "000000010000000000000021" from archivio but "cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG" looks suspicious to me. Is it expected? should WAL files overwrite pg_xlog/RECOVERYXLOG ? If I stop the replication and start the slave after having removed the recovery.conf the server doesn't come up: LOG: aborting startup due to startup process failure LOG: database system was interrupted while in recovery at log time 2007-04-16 15:29:42 CEST HINT: If this has occurred more than once some data may be corrupted and you may need to choose an earlier recovery target. LOG: impossibile aprire il file "pg_xlog/000000010000000000000009" (file registro 0, segmento 9): No such file or directory LOG: invalid primary checkpoint record LOG: impossibile aprire il file "pg_xlog/000000010000000000000021" (file registro 0, segmento 33): No such file or directory LOG: invalid secondary checkpoint record PANIC: impossibile localizzare un checkpoint record valido LOG: startup process (PID 10824) was terminated by signal 6 LOG: aborting startup due to startup process failure infact pg_xlog/ of the slave contains only RECOVERHISTORY: cat pg_xlog/RECOVERYHISTORY START WAL LOCATION: 0/9000020 (file 000000010000000000000009) STOP WAL LOCATION: 0/A000000 (file 00000001000000000000000A) CHECKPOINT LOCATION: 0/9000020 START TIME: 2007-04-16 15:29:42 CEST LABEL: label STOP TIME: 2007-04-16 15:31:39 CEST Did I do something wrong? Thanks for your help, Nico
On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote: > Merlin Moncure wrote: > > > On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote: > > > >> Is there any parameter that I have to pass to the second server to keep > >> on requesting WALs? > >> I still don't understand what instructs the server to continously > >> request the master's logs. > > > > > > google pg_standby. I've set it up and it works. > > > > merlin > > > > Hi, > I installed the second version of pg_standby.tar that you posted here. > > In recovery.conf I set > restore_command = '/usr/local/bin/pg_standby -d -c -s 5 -w 0 /tmp/pg/ %f %p' > > and I see in the log file that something goes on after every update: > > Trigger file : (null) > Waiting for WAL file : /tmp/pg//000000010000000000000021 > WAL file path : 000000010000000000000021 > Restoring to... : pg_xlog/RECOVERYXLOG > Sleep interval : 5 seconds > Max wait interval : 0 forever > Command for restore : cp /tmp/pg//000000010000000000000021 > pg_xlog/RECOVERYXLOG > running restore : success > LOG: restored file "000000010000000000000021" from archivio > > > but "cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG" looks > suspicious to me. > > Is it expected? should WAL files overwrite pg_xlog/RECOVERYXLOG ? > If I stop the replication and start the slave after having removed the > recovery.conf > the server doesn't come up: > > LOG: aborting startup due to startup process failure > LOG: database system was interrupted while in recovery at log time > 2007-04-16 15:29:42 CEST > HINT: If this has occurred more than once some data may be corrupted > and you may need to choose an earlier recovery target. > LOG: impossibile aprire il file "pg_xlog/000000010000000000000009" > (file registro 0, segmento 9): No such file or directory > LOG: invalid primary checkpoint record > LOG: impossibile aprire il file "pg_xlog/000000010000000000000021" > (file registro 0, segmento 33): No such file or directory > LOG: invalid secondary checkpoint record > PANIC: impossibile localizzare un checkpoint record valido > LOG: startup process (PID 10824) was terminated by signal 6 > LOG: aborting startup due to startup process failure > > > infact pg_xlog/ of the slave contains only RECOVERHISTORY: > > cat pg_xlog/RECOVERYHISTORY > START WAL LOCATION: 0/9000020 (file 000000010000000000000009) > STOP WAL LOCATION: 0/A000000 (file 00000001000000000000000A) > CHECKPOINT LOCATION: 0/9000020 > START TIME: 2007-04-16 15:29:42 CEST > LABEL: label > STOP TIME: 2007-04-16 15:31:39 CEST > > Did I do something wrong? > Thanks for your help, try link mode, not copy mode (-l). make sure you read about the -k switch. merlin
Merlin Moncure wrote: > > > try link mode, not copy mode (-l). make sure you read about the -k > switch. > > merlin > replaced -c with -l, but the result was the same. I assume that with -k you mean the socket dir for postgres - since pg_standby doesn't seek to recognize that switch - but I didn't need it: I connected to -p 5433. Anyway, after having stopped the replication I had the same problem; in order to make the slave start I had to manually copy the log files from the archive to pg_xlog/ . Is pg_standby supposed to work with Postgres 8.2 ? or only with 8.1.?
Nico Sabbi wrote: > Merlin Moncure wrote: > >> >> >> try link mode, not copy mode (-l). make sure you read about the -k >> switch. >> >> merlin >> > replaced -c with -l, but the result was the same. > I assume that with -k you mean the socket dir for postgres - since > pg_standby doesn't > seek to recognize that switch - but I didn't need it: I connected to > -p 5433. > > Anyway, after having stopped the replication I had the same problem; > in order to make the slave start I had to manually copy the log files > from the > archive to pg_xlog/ . > > Is pg_standby supposed to work with Postgres 8.2 ? or only with 8.1.? > > update: touching the right trigger makes the replication stop, thus I can connect to the server and see the updated data. Thanks a lot, Nico