Discussion: replication stops working
I have a 9.2 hot standby setup with replication via rsync. For the second time, it has stopped working with no apparent error on the primary or standby. Last time this happened I fixed it by restarting the primary. Yesterday I started a new base backup around noon and it replicated without any problems for about 12 hours. Then it just stopped and I don't see any errors in the Postgres log (primary or standby). I looked at other system logs and still don't see any problems.

I'm running Postgres 9.2.4 on CentOS 6.4. Thanks for any ideas or debug suggestions.

John DeSoi, Ph.D.

=====

wal_level = hot_standby
wal_keep_segments = 48
max_wal_senders = 2

archive_mode = on
archive_command = 'rsync --whole-file --ignore-existing --delete-after -a %p bak-postgres:/pgbackup/%f'
archive_timeout = 300
John DeSoi wrote:
> I have a 9.2 hot standby setup with replication via rsync. For the second time, it has stopped working with no apparent error on the primary or standby. Last time this happened I fixed it by restarting the primary. Yesterday I started a new base backup around noon and it replicated without any problems for about 12 hours. Then it just stopped and I don't see any errors in the Postgres log (primary or standby). I looked at other system logs and still don't see any problems.
>
> I'm running Postgres 9.2.4 on CentOS 6.4. Thanks for any ideas or debug suggestions.
>
> John DeSoi, Ph.D.
>
> =====
>
> wal_level = hot_standby
> wal_keep_segments = 48
> max_wal_senders = 2
>
> archive_mode = on
> archive_command = 'rsync --whole-file --ignore-existing --delete-after -a %p bak-postgres:/pgbackup/%f'
> archive_timeout = 300
>

If there are no errors in the log, how did you conclude that replication has stopped working? Since you're using a hot standby, you've also set up streaming replication in addition to the WAL archiving, correct?

Regards,
Daniel Serodio
On Jul 8, 2013, at 5:41 PM, Daniel Serodio (lists) <daniel.lists@mandic.com.br> wrote:
> If there are no errors in the log, how did you conclude that replication has stopped working? Since you're using a hot standby, you've also set up streaming replication in addition to the WAL archiving, correct?

I have an external process that calls pg_last_xact_replay_timestamp and sends an alert if the standby is more than 20 minutes out of sync. I'm not using streaming replication, just WAL archiving at 5-minute intervals.

I just tried to restart the primary to fix it and it would not shut down. There should not have been any active connections. I finally had to power off the VM.

I think what might be happening is that rsync is hanging when trying to send a WAL file. That might explain the lack of errors in the log and the difficulty stopping the server. I added a timeout to the archive command; hopefully this will fix it.

John DeSoi, Ph.D.

2013-07-08 21:06:02 EDT [27170]: [1-1] user=main,db=main8,remote=127.0.0.1(62194) FATAL: the database system is shutting down
2013-07-08 21:07:29 EDT [27189]: [1-1] user=postgres,db=postgres,remote=127.0.0.1(62195) FATAL: the database system is shutting down
2013-07-08 21:07:51 EDT [27190]: [1-1] user=postgres,db=postgres,remote=127.0.0.1(62196) FATAL: the database system is shutting down
2013-07-08 21:09:42 EDT [27275]: [1-1] user=postgres,db=postgres,remote=[local] FATAL: the database system is shutting down
2013-07-08 21:11:03 EDT [27363]: [1-1] user=[unknown],db=[unknown],remote=127.0.0.1(62199) LOG: incomplete startup packet
2013-07-08 21:11:03 EDT [27364]: [1-1] user=main,db=main8,remote=127.0.0.1(62200) FATAL: the database system is shutting down
Killed by signal 15.
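
Editorial sketch: the message above does not say how the timeout was added to the archive command, so the following is only one possible approach, not John's actual fix. It assumes Python 3.5+ on the primary and wraps the same rsync invocation in a hard wall-clock limit, so that a hung transfer returns a non-zero status (which PostgreSQL treats as a failed archive attempt and retries) instead of blocking indefinitely. The function names, the 60-second and 30-second limits, and the exit code 124 are illustrative assumptions; `--timeout` is rsync's own I/O timeout option.

```python
#!/usr/bin/env python3
"""Hypothetical archive_command wrapper: rsync a WAL file with a time limit."""
import subprocess
import sys

TIMEOUT_EXIT_CODE = 124  # assumed convention, borrowed from coreutils `timeout`


def build_rsync_cmd(wal_path, wal_name, dest="bak-postgres:/pgbackup/",
                    io_timeout_s=30):
    """Build the rsync command from the thread, plus rsync's I/O timeout."""
    return [
        "rsync", "--whole-file", "--ignore-existing", "--delete-after", "-a",
        "--timeout={}".format(io_timeout_s),  # give up if no I/O for this long
        wal_path, dest + wal_name,
    ]


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or TIMEOUT_EXIT_CODE if it runs too long."""
    try:
        rc = subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point.
        return TIMEOUT_EXIT_CODE
    # A negative code means the child died from a signal; report plain failure.
    return rc if rc >= 0 else 1


# Invoked by PostgreSQL as: <this script> %p %f
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(run_with_timeout(build_rsync_cmd(sys.argv[1], sys.argv[2]), 60))
```

With the script installed at a hypothetical path such as /usr/local/bin/archive_wal.py, the setting would become archive_command = '/usr/local/bin/archive_wal.py %p %f'. A non-zero exit leaves the segment in place for a later retry, so a timeout delays archiving rather than losing WAL.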