Thread: replication stops working

replication stops working

From
John DeSoi
Date:
I have a 9.2 hot standby setup with replication via rsync. For the second time, it has stopped working with no apparent
error on the primary or standby. Last time this happened I fixed it by restarting the primary. Yesterday I started a new
base backup around noon and it replicated without any problems for about 12 hours. Then it just stopped, and I don't see
any errors in the Postgres log (primary or standby). I looked at other system logs and still don't see any problems.

I'm running Postgres 9.2.4 on CentOS 6.4. Thanks for any ideas or debug suggestions.


John DeSoi, Ph.D.


=====

wal_level = hot_standby
wal_keep_segments = 48
max_wal_senders = 2

archive_mode = on
archive_command = 'rsync --whole-file --ignore-existing --delete-after -a %p bak-postgres:/pgbackup/%f'
archive_timeout = 300



Re: replication stops working

From
"Daniel Serodio (lists)"
Date:
John DeSoi wrote:
> I have a 9.2 hot standby setup with replication via rsync. For the second time, it has stopped working with no
> apparent error on the primary or standby. Last time this happened I fixed it by restarting the primary. Yesterday I
> started a new base backup around noon and it replicated without any problems for about 12 hours. Then it just stopped
> and I don't see any errors in the Postgres log (primary or standby). I looked at other system logs and still don't see
> any problems.
>
> I'm running Postgres 9.2.4 on CentOS 6.4. Thanks for any ideas or debug suggestions.
>
> John DeSoi, Ph.D.
>
> =====
>
> wal_level = hot_standby
> wal_keep_segments = 48
> max_wal_senders = 2
>
> archive_mode = on
> archive_command = 'rsync --whole-file --ignore-existing --delete-after -a %p bak-postgres:/pgbackup/%f'
> archive_timeout = 300
>
If there are no errors in the log, how did you conclude that replication
has stopped working? Since you're using a hot standby, you've also setup
streaming replication in addition to the WAL archiving, correct?

Regards,
Daniel Serodio



Re: replication stops working

From
John DeSoi
Date:
On Jul 8, 2013, at 5:41 PM, Daniel Serodio (lists) <daniel.lists@mandic.com.br> wrote:

> If there are no errors in the log, how did you conclude that replication has stopped working? Since you're using a
> hot standby, you've also setup streaming replication in addition to the WAL archiving, correct?

I have an external process that calls pg_last_xact_replay_timestamp and sends an alert if the standby is more than 20
minutes out of sync.
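
For readers who want to reproduce this kind of check, here is a minimal sketch in Python. It is not the script used here. The names LAG_QUERY, MAX_LAG and standby_is_stale are made up for illustration, and the code assumes the query result arrives as a timezone-aware datetime (or None when the function returns NULL, which it does if no transaction has been replayed yet).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Run this on the standby. It yields NULL if nothing has been replayed yet.
LAG_QUERY = "SELECT pg_last_xact_replay_timestamp()"

# 20 minutes is the threshold mentioned above; the constant name is hypothetical.
MAX_LAG = timedelta(minutes=20)


def standby_is_stale(last_replay: Optional[datetime],
                     now: Optional[datetime] = None,
                     max_lag: timedelta = MAX_LAG) -> bool:
    """Return True when an alert should be sent.

    last_replay is the value returned by LAG_QUERY and must be
    timezone-aware so it can be compared with a UTC "now".
    """
    if last_replay is None:
        # No replayed transaction at all: flag it so someone takes a look.
        return True
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_replay > max_lag
```

Note that this timestamp is the commit time of the last replayed transaction, so it also stops advancing when the primary has no new commits. A check like this cannot by itself tell a stalled archiver from an idle primary.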

I'm not using streaming replication, just WAL archiving at 5 minute intervals.

I just tried to restart the primary to fix it and it would not shut down. There should not have been any active
connections. I finally had to power off the VM.

I think what might be happening is that rsync is hanging when trying to send a WAL file. That might explain no error in
the log and difficulty stopping the server. I added a timeout to the archive command; hopefully this will fix it.
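
The exact timeout change is not shown in this thread. One possible way to bound the archive step is a small wrapper script, sketched below in Python. The 60-second value, the function names and the script name archive_wal.py are assumptions for illustration only. The rsync flags and destination are copied from the posted archive_command.

```python
import subprocess
import sys

# Assumed value; the thread does not say which timeout was chosen.
ARCHIVE_TIMEOUT_S = 60


def run_with_timeout(cmd, timeout_s=ARCHIVE_TIMEOUT_S):
    """Run cmd and return its exit status, or 1 if it exceeds timeout_s."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        print("archive command timed out", file=sys.stderr)
        return 1


def main(argv):
    # Expected arguments: the WAL path (%p) and the WAL file name (%f).
    wal_path, wal_name = argv[1], argv[2]
    cmd = ["rsync", "--whole-file", "--ignore-existing", "--delete-after", "-a",
           wal_path, "bak-postgres:/pgbackup/" + wal_name]
    return run_with_timeout(cmd)

# When saved as a script, end the file with: sys.exit(main(sys.argv))
```

With a wrapper like this, archive_command would become something like 'python /path/to/archive_wal.py %p %f' (the path is hypothetical). A non-zero exit status tells PostgreSQL the segment was not archived, so it retries later instead of waiting forever on a hung rsync. rsync's own --timeout option is another way to get a similar effect for stalled I/O.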

John DeSoi, Ph.D.



2013-07-08 21:06:02 EDT [27170]: [1-1] user=main,db=main8,remote=127.0.0.1(62194) FATAL:  the database system is shutting down
2013-07-08 21:07:29 EDT [27189]: [1-1] user=postgres,db=postgres,remote=127.0.0.1(62195) FATAL:  the database system is shutting down
2013-07-08 21:07:51 EDT [27190]: [1-1] user=postgres,db=postgres,remote=127.0.0.1(62196) FATAL:  the database system is shutting down
2013-07-08 21:09:42 EDT [27275]: [1-1] user=postgres,db=postgres,remote=[local] FATAL:  the database system is shutting down
2013-07-08 21:11:03 EDT [27363]: [1-1] user=[unknown],db=[unknown],remote=127.0.0.1(62199) LOG:  incomplete startup packet
2013-07-08 21:11:03 EDT [27364]: [1-1] user=main,db=main8,remote=127.0.0.1(62200) FATAL:  the database system is shutting down
Killed by signal 15.