Обсуждение: Help setting up warm standby replication
Hi,
after having read the documentation and the few posts in this list I
tried to set up
a warm standby replication between two instances of postgres running on
my pc
in 2 different base directories and 2 different ports.
(The second one was a cp -a of the first one after having created a
backup point) .
I don't need a realtime update: even few minutes of lag are good enough
for my needs.
To begin with I followed the example of the docs: in the recovery.conf
file of the
slave instance I set
restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p'
but at restart I got these messages in the logs (some translation from
italian to english by me) :
LOG: system shutdown at 2007-04-13 13:02:51 CEST
LOG: starting archive recovery
LOG: restore_command = "cp -av /var/lib/pgsql/data/pg_xlog/%f %p"
cp: impossibile fare stat di
`/var/lib/pgsql/data/pg_xlog/00000001.history': No such file or directory
`/var/lib/pgsql/data/pg_xlog/000000010000000000000007' ->
`pg_xlog/RECOVERYXLOG'
LOG: restored file "000000010000000000000007" from archive
LOG: invalid record length at 0/7000218
LOG: invalid primary record checkpoint
`/var/lib/pgsql/data/pg_xlog/000000010000000000000007' ->
`pg_xlog/RECOVERYXLOG'
LOG: restored file "000000010000000000000007" from archive
LOG: invalid record length at 0/70001D0
LOG: invalid primary record checkpoint
PANIC: impossible locating a valid record checkpoint
LOG: startup process (PID 20250) was terminated by signal 6
LOG: aborting startup due to startup process failure
same thing if I empty the pg_xlog/ dir (except archive_status/)
I hope someone can explain what I did wrong.
Another thing I didn't understand is the continous replication mechanism:
will the slave server periodically call the recovery_command or do I
have to setup a cronjob to instruct it to search updates?
Sorry if these questions are stupid :) and thanks in advance.
Nico
Nico Sabbi <nsabbi@officinedigitali.it> writes:
> To begin with I followed the example of the docs: in the recovery.conf
> file of the
> slave instance I set
> restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p'
Hm, it looks like you are trying to copy xlog segments straight from the
pg_xlog directory of the master? That's not going to work, because the
master will rename/overwrite those files as soon as it thinks it's done
with them. You need to have an archive_command on the master that is
really truly copying the data to somewhere else, and then the
restore_command should copy from the somewhere else.
regards, tom lane
Tom Lane wrote: >Nico Sabbi <nsabbi@officinedigitali.it> writes: > > >>To begin with I followed the example of the docs: in the recovery.conf >>file of the >>slave instance I set >> >> > > > >>restore_command = 'cp -av /var/lib/pgsql/data/pg_xlog/%f %p' >> >> > >Hm, it looks like you are trying to copy xlog segments straight from the >pg_xlog directory of the master? That's not going to work, because the >master will rename/overwrite those files as soon as it thinks it's done >with them. You need to have an archive_command on the master that is >really truly copying the data to somewhere else, and then the >restore_command should copy from the somewhere else. > > > after having set up shared directory (one for archive_command and one for recover_command) I could successfully start and sync the slave dbms, but after the initial recovery the file recovery.conf is renamed to recovery.done and no more updates/WAL are asked (the slave's log reads: LOG: archive recovery complete LOG: system database ready ) Is there any parameter that I have to pass to the second server to keep on requesting WALs? I still don't understand what instructs the server to continously request the master's logs.
On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote: > Is there any parameter that I have to pass to the second server to keep > on requesting WALs? > I still don't understand what instructs the server to continously > request the master's logs. google pg_standby. I've set it up and it works. merlin
Merlin Moncure wrote:
> On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote:
>
>> Is there any parameter that I have to pass to the second server to keep
>> on requesting WALs?
>> I still don't understand what instructs the server to continously
>> request the master's logs.
>
>
> google pg_standby. I've set it up and it works.
>
> merlin
>
Hi,
I installed the second version of pg_standby.tar that you posted here.
In recovery.conf I set
restore_command = '/usr/local/bin/pg_standby -d -c -s 5 -w 0 /tmp/pg/ %f %p'
and I see in the log file that something goes on after every update:
Trigger file : (null)
Waiting for WAL file : /tmp/pg//000000010000000000000021
WAL file path : 000000010000000000000021
Restoring to... : pg_xlog/RECOVERYXLOG
Sleep interval : 5 seconds
Max wait interval : 0 forever
Command for restore : cp /tmp/pg//000000010000000000000021
pg_xlog/RECOVERYXLOG
running restore : success
LOG: restored file "000000010000000000000021" from archivio
but "cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG" looks
suspicious to me.
Is it expected? should WAL files overwrite pg_xlog/RECOVERYXLOG ?
If I stop the replication and start the slave after having removed the
recovery.conf
the server doesn't come up:
LOG: aborting startup due to startup process failure
LOG: database system was interrupted while in recovery at log time
2007-04-16 15:29:42 CEST
HINT: If this has occurred more than once some data may be corrupted
and you may need to choose an earlier recovery target.
LOG: impossibile aprire il file "pg_xlog/000000010000000000000009"
(file registro 0, segmento 9): No such file or directory
LOG: invalid primary checkpoint record
LOG: impossibile aprire il file "pg_xlog/000000010000000000000021"
(file registro 0, segmento 33): No such file or directory
LOG: invalid secondary checkpoint record
PANIC: impossibile localizzare un checkpoint record valido
LOG: startup process (PID 10824) was terminated by signal 6
LOG: aborting startup due to startup process failure
infact pg_xlog/ of the slave contains only RECOVERHISTORY:
cat pg_xlog/RECOVERYHISTORY
START WAL LOCATION: 0/9000020 (file 000000010000000000000009)
STOP WAL LOCATION: 0/A000000 (file 00000001000000000000000A)
CHECKPOINT LOCATION: 0/9000020
START TIME: 2007-04-16 15:29:42 CEST
LABEL: label
STOP TIME: 2007-04-16 15:31:39 CEST
Did I do something wrong?
Thanks for your help,
Nico
On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote: > Merlin Moncure wrote: > > > On 4/16/07, Nico Sabbi <nsabbi@officinedigitali.it> wrote: > > > >> Is there any parameter that I have to pass to the second server to keep > >> on requesting WALs? > >> I still don't understand what instructs the server to continously > >> request the master's logs. > > > > > > google pg_standby. I've set it up and it works. > > > > merlin > > > > Hi, > I installed the second version of pg_standby.tar that you posted here. > > In recovery.conf I set > restore_command = '/usr/local/bin/pg_standby -d -c -s 5 -w 0 /tmp/pg/ %f %p' > > and I see in the log file that something goes on after every update: > > Trigger file : (null) > Waiting for WAL file : /tmp/pg//000000010000000000000021 > WAL file path : 000000010000000000000021 > Restoring to... : pg_xlog/RECOVERYXLOG > Sleep interval : 5 seconds > Max wait interval : 0 forever > Command for restore : cp /tmp/pg//000000010000000000000021 > pg_xlog/RECOVERYXLOG > running restore : success > LOG: restored file "000000010000000000000021" from archivio > > > but "cp /tmp/pg//000000010000000000000021 pg_xlog/RECOVERYXLOG" looks > suspicious to me. > > Is it expected? should WAL files overwrite pg_xlog/RECOVERYXLOG ? > If I stop the replication and start the slave after having removed the > recovery.conf > the server doesn't come up: > > LOG: aborting startup due to startup process failure > LOG: database system was interrupted while in recovery at log time > 2007-04-16 15:29:42 CEST > HINT: If this has occurred more than once some data may be corrupted > and you may need to choose an earlier recovery target. > LOG: impossibile aprire il file "pg_xlog/000000010000000000000009" > (file registro 0, segmento 9): No such file or directory > LOG: invalid primary checkpoint record > LOG: impossibile aprire il file "pg_xlog/000000010000000000000021" > (file registro 0, segmento 33): No such file or directory > LOG: invalid secondary checkpoint record > PANIC: impossibile localizzare un checkpoint record valido > LOG: startup process (PID 10824) was terminated by signal 6 > LOG: aborting startup due to startup process failure > > > infact pg_xlog/ of the slave contains only RECOVERHISTORY: > > cat pg_xlog/RECOVERYHISTORY > START WAL LOCATION: 0/9000020 (file 000000010000000000000009) > STOP WAL LOCATION: 0/A000000 (file 00000001000000000000000A) > CHECKPOINT LOCATION: 0/9000020 > START TIME: 2007-04-16 15:29:42 CEST > LABEL: label > STOP TIME: 2007-04-16 15:31:39 CEST > > Did I do something wrong? > Thanks for your help, try link mode, not copy mode (-l). make sure you read about the -k switch. merlin
Merlin Moncure wrote: > > > try link mode, not copy mode (-l). make sure you read about the -k > switch. > > merlin > replaced -c with -l, but the result was the same. I assume that with -k you mean the socket dir for postgres - since pg_standby doesn't seek to recognize that switch - but I didn't need it: I connected to -p 5433. Anyway, after having stopped the replication I had the same problem; in order to make the slave start I had to manually copy the log files from the archive to pg_xlog/ . Is pg_standby supposed to work with Postgres 8.2 ? or only with 8.1.?
Nico Sabbi wrote:
> Merlin Moncure wrote:
>
>>
>>
>> try link mode, not copy mode (-l). make sure you read about the -k
>> switch.
>>
>> merlin
>>
> replaced -c with -l, but the result was the same.
> I assume that with -k you mean the socket dir for postgres - since
> pg_standby doesn't
> seek to recognize that switch - but I didn't need it: I connected to
> -p 5433.
>
> Anyway, after having stopped the replication I had the same problem;
> in order to make the slave start I had to manually copy the log files
> from the
> archive to pg_xlog/ .
>
> Is pg_standby supposed to work with Postgres 8.2 ? or only with 8.1.?
>
>
update: touching the right trigger makes the replication stop, thus I
can connect to the server
and see the updated data.
Thanks a lot,
Nico