Streaming replication

Поиск
Список
Период
Сортировка
От Mike Howland
Тема Streaming replication
Дата
Msg-id CAGAqBGL2CysUbrKhoPWBMppmRPvVV+NUQNjnikr2o5qbGP8Etg@mail.gmail.com
обсуждение исходный текст
Ответы Re: Streaming replication  (Laurenz Albe <laurenz.albe@cybertec.at>)
Список pgsql-novice
Hello all, I have a question that perhaps somebody can help me with. I'm using streaming replication with 9.2 (I know it's quite old now, we plan to move to 9.6 shortly but I'd like to get these older machines working if I can). I have two machines where the secondary is configured as a backup of the primary machine. I used the instructions here: https://wiki.postgresql.org/wiki/Streaming_Replication to establish streaming replication... BTW, I'm using the 7.1 path (using backup/rsync/!backup) to prime the files on the secondary machine. Most of the time it comes up, connects and is quite happy. For my own sanity I have a bit of monitoring code that occasionally injects a change on the primary and watches for it to arrive on the secondary to ensure data is flowing. I also have some code that checks to see how close things are using pg_current_xlog_location() and pg_last_xlog_receive_location(). When things are working, these checks are always fine. Once in a while, this nonce value does not seem to be updated on the secondary server (although I can see it change on the primary). During this period of time, my watcher code also detects that pg_last_xlog_receive_location() is ahead of pg_current_xlog_location(). From what I can tell, the secondary has connected to the primary, it's just data does not seem to be flowing. I'll attach a bit of detal below. I'm sure I must be doing something wrong when I prime the secondary database. It works most of the time but that's simply not good enough. I'd buy a virtual coffee for anybody that can show me the error of my ways here... Any ideas? A bit of output/background that may be useful: Output from watch dog code output (running on the secondary). It queries the local (secondary) cluster and also connects to the primary cluster to insert a nonce and gather some information. The nonce is just a counting int. When the secondary is started, it starts counting at 0 and increases every few minutes. As you can see, the data in the secondary is still at 18 (from a previous switch). This is very odd because the primary cluster was rsync'ed shortly before this. last xlog receive 0/5000000 current xlog location: 0/41A3008 WARNING: Secondary appears ahead of primary: WARNING!! nonce match comparison failed: 2 != 18 dog: nonce did not compare, but it's not been very long yet. The sequence I'm using to prime the secondary: 0) Request that the secondary be shutdown and internally backed up. 1) Without backup being set (just to move the bulk of the files) rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete --exclude pg_log --exclude recovery.conf --exclude backup_label --exclude pg_hba.conf --exclude postmaster.pid --exclude postmaster.opts --exclude postgresql.conf /db/pg/general 77.66.27.224:/db/pg 2) Enter backup mode SELECT pg_start_backup('sync-backup', true); 3) Rsync again, hopefully quicker than #1. I wanted to minimize the time it spent in backup. rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete --exclude pg_log --exclude recovery.conf --exclude backup_label --exclude pg_hba.conf --exclude postmaster.pid --exclude postmaster.opts --exclude postgresql.conf /db/pg/general 77.66.27.224:/db/pg 4) Exit backup mode SELECT pg_stop_backup(); 5) Rsync pg_xlog files rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete /db/pg/general/pg_xlog 77.66.27.224:/db/pg/general 6) Rsync archive files to secondary rsync -a -v --rsh ssh -o StrictHostKeyChecking=no -o BatchMode=yes --delete --exclude syncInProgress --exclude archiveScript /db/pg/general.archive 77.66.27.224:/db/pg 7) Tell the secondary server to start up Output from postgres on the secondary server when it starts. I've tried looking into the cannot stat issues without much success. Note that in situations where it does work, it looks just like this as well (complete with the zero length warning, etc). LOG: database system was shut down in recovery at 2017-11-29 06:43:13 GMT cp: cannot stat '{$0}{$8}/db/pg/general.archive/00000002.history'{$0}{$9}: No such file or directory LOG: entering standby mode cp: cannot stat '{$0}{$8}/pg/general.archive/000000010000000000000005'{$0}{$9}: No such file or directory LOG: consistent recovery state reached at 0/5000080 LOG: record with zero length at 0/5000080 LOG: database system is ready to accept read only connections cp: cannot stat '{$0}{$8}/db/pg/general.archive/000000010000000000000005'{$0}{$9}: No such file or directory cp: cannot stat '{$0}{$8}/db/pg/general.archive/00000002.history'{$0}{$9}: No such file or directory LOG: streaming replication successfully connected to primary Any thoughts at all are appreciated! Thanks, -mike

В списке pgsql-novice по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Novice question about users and...rights?
Следующее
От: Laurenz Albe
Дата:
Сообщение: Re: Streaming replication