Hello list!
I'm running PostgreSQL 9.1.5 on Ubuntu 11.10 in a streaming replication setup. On my slave, recovery.conf states:
standby_mode = on
restore_command = '/usr/local/omnipitr/bin/omnipitr-restore -D /var/lib/postgresql/9.1/main/ --source gzip=/data/dbanalytics-wal/ --remove-unneeded --temp-dir /var/tmp/omnipitr -l /var/log/omnipitr/restore-^Y-^m-^d.log --streaming-replication --verbose --error-pgcontroldata hang "%f" "%p"'
archive_cleanup_command = '/usr/local/omnipitr/bin/omnipitr-cleanup --verbose --log /var/log/omnipitr/cleanup-^Y-^m-^d.log --archive gzip=/data/dbanalytics-wal/ "%r"'
primary_conninfo = 'host=master port=5432 user=dbrepl password=password'
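For comparison, I believe the equivalent setup with the stock pg_archivecleanup tool (from contrib) would look roughly like this. This is an untested sketch, and it assumes uncompressed archives, which is exactly why I'm using OmniPITR's gzip support instead:

```
standby_mode = on
restore_command = 'cp /data/dbanalytics-wal/%f "%p"'
archive_cleanup_command = 'pg_archivecleanup /data/dbanalytics-wal %r'
primary_conninfo = 'host=master port=5432 user=dbrepl password=password'
```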
I ran out of disk space on the slave because the archived WAL segments were never removed. The documentation for archive_cleanup_command states[1]:
"This optional parameter specifies a shell command that will be executed at every restartpoint."
The slave's last reference to a restart point was:
2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - LOG: restartpoint complete: wrote 58947 buffers (22.5%); 0 transaction log file(s) added, 0 removed, 103 recycled; write=539.537 s, sync=0.088 s, total=540.221 s; sync files=122, longest=0.087 s, average=0.000 s
2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - LOG: recovery restart point at E8F/F71DA1E0
2012-11-28 20:18:08.279 UTC - @ 20910 (00000) 2012-11-27 21:10:01 UTC - DETAIL: last completed transaction was at log time 2012-11-28 17:15:51.275427+00
2012-11-28 21:19:24.245 UTC - svanalytics@svanalytics_staging 3476 (00000) 2012-11-28 19:55:50 UTC - LOG: duration: 4984689.535 ms statement: SELECT ...
OmniPITR's restore log ends with:
2012-11-28 13:54:13.396142 +0000 : 26378 : omnipitr-restore : LOG : Timer [Copying segment 0000000200000E8E00000069 to /var/lib/postgresql/9.1/main/pg_xlog/RECOVERYXLOG] took: 0.238s
2012-11-28 13:54:13.485916 +0000 : 26378 : omnipitr-restore : LOG : Segment 0000000200000E8E00000069 restored
2012-11-28 13:54:13.787225 +0000 : 26384 : omnipitr-restore : LOG : Called with parameters: -D /var/lib/postgresql/9.1/main/ --source gzip=/data/dbanalytics-wal/ --remove-unneeded --temp-dir /var/tmp/omnipitr -l /var/log/omnipitr/restore-^Y-^m-^d.log --streaming-replication --verbose --error-pgcontroldata hang 0000000200000E8E0000006A pg_xlog/RECOVERYXLOG
2012-11-28 13:54:13.802772 +0000 : 26384 : omnipitr-restore : FATAL : Requested file does not exist, and it is streaming replication environment. Dying.
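To make sense of what it was asking for: the segment after 0000000200000E8E00000069 is 0000000200000E8E0000006A, which had not been archived yet, presumably because the slave had caught up and could switch to streaming. The naming arithmetic, as I understand it (next_wal is just a throwaway illustration, and it ignores the pre-9.3 quirk that segment FF is skipped):

```shell
# next_wal: print the WAL segment name that follows the given one.
# Hypothetical helper, only to illustrate the sequence omnipitr-restore walks.
next_wal() {
  tli=${1:0:8}     # timeline id
  log=${1:8:8}     # log file id
  seg=${1:16:8}    # segment within the log file
  printf '%s%s%08X\n' "$tli" "$log" $(( 16#$seg + 1 ))
}

next_wal 0000000200000E8E00000069   # -> 0000000200000E8E0000006A
```

So, as I understand the --streaming-replication flag, the FATAL above is omnipitr-restore deliberately bailing out so PostgreSQL falls back to streaming from the master.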
And OmniPITR's cleanup log ends with:
2012-11-28 20:18:11.237740 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F2.gz removed.
2012-11-28 20:18:11.256186 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F3.gz removed.
2012-11-28 20:18:11.258942 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F4.gz removed.
2012-11-28 20:18:11.261542 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F5.gz removed.
2012-11-28 20:18:11.263758 +0000 : 10384 : omnipitr-cleanup : LOG : Segment 0000000200000E8F000000F6.gz removed.
2012-11-28 20:18:11.265259 +0000 : 10384 : omnipitr-cleanup : LOG : 34 segments removed.
It's 2012-11-30 14:35 UTC on the machine.
Why were no new restart points reached? I had 4008 WAL archives on my slave, and I expected them to be removed as streaming replication progressed. Are restart points prevented while long queries are running?
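For scale, a back-of-envelope on those 4008 segments (at the default 16 MB segment size):

```shell
# 4008 archived WAL segments x 16 MB each (default wal segment size)
echo "$(( 4008 * 16 )) MB"   # 64128 MB, i.e. roughly 63 GB
```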
Thanks!
François Beausoleil
[1]: http://www.postgresql.org/docs/current/static/archive-recovery-settings.html#ARCHIVE-CLEANUP-COMMAND