Обсуждение: PG 9.3 Switch streaming to wal shipping
Hi All. I have a slave that was using streaming replication, and all was well, but its far away (remote data center), and its using too much bandwidth at peak times. I'm switching it to wal shipping at night. I don't really need it constantly up to date, nightly is good enough. In my recovery.conf, I disabled the primary_conninfo, and have: restore_command = 'cp /pub/archive/%f %p' archive_cleanup_command = 'pg_archivecleanup /pub/archive/ %r' I restart PG, and all seems well. It reached a consisten state and allows read-only queries. Cool. Except its spamming the log with lines like: Sep 29 15:28:13 webserv postgres[17735]: [125-1] ,,2014-09-29 15:28:13.329 CDT,: LOG: restored log file "00000002000000B900000015" from archive Sep 29 15:28:18 webserv postgres[17735]: [126-1] ,,2014-09-29 15:28:18.358 CDT,: LOG: restored log file "00000002000000B900000015" from archive Sep 29 15:28:23 webserv postgres[17735]: [127-1] ,,2014-09-29 15:28:23.387 CDT,: LOG: restored log file "00000002000000B900000015" from archive The answer to that, I thought was pg_standby. So I changed: restore_command = 'pg_standby -d /pub/archive %f %p 2>>/tmp/standby.log' I restart PG, but now it never reaches consistency, wont allow queries, and /tmp/standbylog show's its trying to start with an old file: Keep archive history: no cleanup required running restore: OK Trigger file: <not set> Waiting for WAL file: 00000002000000B900000004 WAL file path: /pub/archive/00000002000000B900000004 Restoring to: pg_xlog/RECOVERYXLOG Sleep interval: 5 seconds Max wait interval: 0 forever Command for restore: cp "/pub/archive/00000002000000B900000004" "pg_xlog/RECOVERYXLOG" Keep archive history: no cleanup required WAL file not present yet. 00000002000000B900000004 doesn't exist, I've already moved past it. (Also, note, the first log I posted above accessing 00000002000000B900000015 is much later after me messing around some more) I understand why cp is spamming the log, I guess, but I don't understand why I cant just replace it with pg_standby. Any hints on how to set this up? Thanks for your time. -Andy
Crap! Is this a problem?! I switched back to cp, all was going well, here are some logs: Sep 29 16:07:10 webserv postgres[17735]: [590-1] ,,2014-09-29 16:07:10.888 CDT,: LOG: restored log file "00000002000000B900000023" from archive Sep 29 16:07:13 webserv postgres[17734]: [3-1] ,,2014-09-29 16:07:13.971 CDT,: LOG: received fast shutdown request Sep 29 16:07:13 webserv postgres[17734]: [4-1] ,,2014-09-29 16:07:13.971 CDT,: LOG: aborting any active transactions Sep 29 16:07:13 webserv postgres[17739]: [2-1] ,,2014-09-29 16:07:13.995 CDT,: LOG: shutting down Sep 29 16:07:13 webserv postgres[17739]: [3-1] ,,2014-09-29 16:07:13.995 CDT,: LOG: database system is shut down So it was at 00000002000000B900000023. I switched recovery.conf to: restore_command = '/usr/local/pg93/bin/pg_standby -d /pub/archive %f %p 2>>/tmp/standby.log' and restart PG. PG log shows: Sep 29 16:08:56 webserv postgres[19054]: [2-1] ,,2014-09-29 16:08:56.002 CDT,: LOG: database system was shut down in recovery at 2014-09-29 16:07:13 CDT Sep 29 16:08:56 webserv postgres[19054]: [3-1] ,,2014-09-29 16:08:56.002 CDT,: LOG: entering standby mode Sep 29 16:08:56 webserv postgres[19054]: [4-1] ,,2014-09-29 16:08:56.017 CDT,: LOG: restored log file "00000002.history" from archive Sep 29 16:08:56 webserv postgres[19054]: [5-1] ,,2014-09-29 16:08:56.042 CDT,: LOG: restored log file "00000002000000B900000015" from archive I was at 23! Did it really replay 15? How bad is that? /tmp/standby.log makes no sense at all: Trigger file: <not set> Waiting for WAL file: 00000002.history WAL file path: /pub/archive/00000002.history Restoring to: pg_xlog/RECOVERYHISTORY Sleep interval: 5 seconds Max wait interval: 0 forever Command for restore: cp "/pub/archive/00000002.history" "pg_xlog/RECOVERYHISTORY" Keep archive history: no cleanup required running restore: OK Trigger file: <not set> Waiting for WAL file: 00000002000000B900000015 WAL file path: /pub/archive/00000002000000B900000015 Restoring to: pg_xlog/RECOVERYXLOG Sleep interval: 5 seconds Max wait interval: 0 forever Command for restore: cp "/pub/archive/00000002000000B900000015" "pg_xlog/RECOVERYXLOG" Keep archive history: no cleanup required running restore: OK Trigger file: <not set> Waiting for WAL file: 00000002000000B900000006 WAL file path: /pub/archive/00000002000000B900000006 Restoring to: pg_xlog/RECOVERYXLOG Sleep interval: 5 seconds Max wait interval: 0 forever Command for restore: cp "/pub/archive/00000002000000B900000006" "pg_xlog/RECOVERYXLOG" Keep archive history: no cleanup required WAL file not present yet. WAL file not present yet. WAL file not present yet. Why did it jump from 15 back to 6? Why did it even start at 15? Am I hosed at this point? I really don't want to make another base backup. -Andy
On 09/29/2014 02:17 PM, Andy Colson wrote: > Crap! Is this a problem?! > > I switched back to cp, all was going well, here are some logs: > > Sep 29 16:07:10 webserv postgres[17735]: [590-1] ,,2014-09-29 > 16:07:10.888 CDT,: LOG: restored log file "00000002000000B900000023" > from archive > Sep 29 16:07:13 webserv postgres[17734]: [3-1] ,,2014-09-29 16:07:13.971 > CDT,: LOG: received fast shutdown request > Sep 29 16:07:13 webserv postgres[17734]: [4-1] ,,2014-09-29 16:07:13.971 > CDT,: LOG: aborting any active transactions > Sep 29 16:07:13 webserv postgres[17739]: [2-1] ,,2014-09-29 16:07:13.995 > CDT,: LOG: shutting down > Sep 29 16:07:13 webserv postgres[17739]: [3-1] ,,2014-09-29 16:07:13.995 > CDT,: LOG: database system is shut down > > So it was at 00000002000000B900000023. > > I switched recovery.conf to: > > restore_command = '/usr/local/pg93/bin/pg_standby -d /pub/archive %f %p > 2>>/tmp/standby.log' > > and restart PG. PG log shows: > Sep 29 16:08:56 webserv postgres[19054]: [2-1] ,,2014-09-29 16:08:56.002 > CDT,: LOG: database system was shut down in recovery at 2014-09-29 > 16:07:13 CDT > Sep 29 16:08:56 webserv postgres[19054]: [3-1] ,,2014-09-29 16:08:56.002 > CDT,: LOG: entering standby mode > Sep 29 16:08:56 webserv postgres[19054]: [4-1] ,,2014-09-29 16:08:56.017 > CDT,: LOG: restored log file "00000002.history" from archive > Sep 29 16:08:56 webserv postgres[19054]: [5-1] ,,2014-09-29 16:08:56.042 > CDT,: LOG: restored log file "00000002000000B900000015" from archive > > I was at 23! Did it really replay 15? How bad is that? > /tmp/standby.log makes no sense at all: > > Trigger file: <not set> > Waiting for WAL file: 00000002.history > WAL file path: /pub/archive/00000002.history > Restoring to: pg_xlog/RECOVERYHISTORY > Sleep interval: 5 seconds > Max wait interval: 0 forever > Command for restore: cp "/pub/archive/00000002.history" > "pg_xlog/RECOVERYHISTORY" > Keep archive history: no cleanup required > running restore: OK > Trigger file: <not set> > Waiting for WAL file: 00000002000000B900000015 > WAL file path: /pub/archive/00000002000000B900000015 > Restoring to: pg_xlog/RECOVERYXLOG > Sleep interval: 5 seconds > Max wait interval: 0 forever > Command for restore: cp "/pub/archive/00000002000000B900000015" > "pg_xlog/RECOVERYXLOG" > Keep archive history: no cleanup required > running restore: OK > Trigger file: <not set> > Waiting for WAL file: 00000002000000B900000006 > WAL file path: /pub/archive/00000002000000B900000006 > Restoring to: pg_xlog/RECOVERYXLOG > Sleep interval: 5 seconds > Max wait interval: 0 forever > Command for restore: cp "/pub/archive/00000002000000B900000006" > "pg_xlog/RECOVERYXLOG" > Keep archive history: no cleanup required > WAL file not present yet. > WAL file not present yet. > WAL file not present yet. > > > Why did it jump from 15 back to 6? Why did it even start at 15? Am I > hosed at this point? I really don't want to make another base backup. Not sure. Some observations: In the above you have a .history file which seems to indicate you have wandered into timelines: http://www.postgresql.org/docs/9.3/static/continuous-archiving.html 24.3.5. Timelines You might want to look at the pg_xlog directory on the master and the archive directory to see if the WAL file numbers match what you are seeing. > > -Andy > > > > -- Adrian Klaver adrian.klaver@aklaver.com
On 09/29/2014 05:16 PM, Adrian Klaver wrote: > On 09/29/2014 02:17 PM, Andy Colson wrote: >> Crap! Is this a problem?! >> >> I switched back to cp, all was going well, here are some logs: >> >> Sep 29 16:07:10 webserv postgres[17735]: [590-1] ,,2014-09-29 >> 16:07:10.888 CDT,: LOG: restored log file "00000002000000B900000023" >> from archive >> Sep 29 16:07:13 webserv postgres[17734]: [3-1] ,,2014-09-29 16:07:13.971 >> CDT,: LOG: received fast shutdown request >> Sep 29 16:07:13 webserv postgres[17734]: [4-1] ,,2014-09-29 16:07:13.971 >> CDT,: LOG: aborting any active transactions >> Sep 29 16:07:13 webserv postgres[17739]: [2-1] ,,2014-09-29 16:07:13.995 >> CDT,: LOG: shutting down >> Sep 29 16:07:13 webserv postgres[17739]: [3-1] ,,2014-09-29 16:07:13.995 >> CDT,: LOG: database system is shut down >> >> So it was at 00000002000000B900000023. >> >> I switched recovery.conf to: >> >> restore_command = '/usr/local/pg93/bin/pg_standby -d /pub/archive %f %p >> 2>>/tmp/standby.log' >> >> and restart PG. PG log shows: >> Sep 29 16:08:56 webserv postgres[19054]: [2-1] ,,2014-09-29 16:08:56.002 >> CDT,: LOG: database system was shut down in recovery at 2014-09-29 >> 16:07:13 CDT >> Sep 29 16:08:56 webserv postgres[19054]: [3-1] ,,2014-09-29 16:08:56.002 >> CDT,: LOG: entering standby mode >> Sep 29 16:08:56 webserv postgres[19054]: [4-1] ,,2014-09-29 16:08:56.017 >> CDT,: LOG: restored log file "00000002.history" from archive >> Sep 29 16:08:56 webserv postgres[19054]: [5-1] ,,2014-09-29 16:08:56.042 >> CDT,: LOG: restored log file "00000002000000B900000015" from archive >> >> I was at 23! Did it really replay 15? How bad is that? >> /tmp/standby.log makes no sense at all: >> >> Trigger file: <not set> >> Waiting for WAL file: 00000002.history >> WAL file path: /pub/archive/00000002.history >> Restoring to: pg_xlog/RECOVERYHISTORY >> Sleep interval: 5 seconds >> Max wait interval: 0 forever >> Command for restore: cp "/pub/archive/00000002.history" >> "pg_xlog/RECOVERYHISTORY" >> Keep archive history: no cleanup required >> running restore: OK >> Trigger file: <not set> >> Waiting for WAL file: 00000002000000B900000015 >> WAL file path: /pub/archive/00000002000000B900000015 >> Restoring to: pg_xlog/RECOVERYXLOG >> Sleep interval: 5 seconds >> Max wait interval: 0 forever >> Command for restore: cp "/pub/archive/00000002000000B900000015" >> "pg_xlog/RECOVERYXLOG" >> Keep archive history: no cleanup required >> running restore: OK >> Trigger file: <not set> >> Waiting for WAL file: 00000002000000B900000006 >> WAL file path: /pub/archive/00000002000000B900000006 >> Restoring to: pg_xlog/RECOVERYXLOG >> Sleep interval: 5 seconds >> Max wait interval: 0 forever >> Command for restore: cp "/pub/archive/00000002000000B900000006" >> "pg_xlog/RECOVERYXLOG" >> Keep archive history: no cleanup required >> WAL file not present yet. >> WAL file not present yet. >> WAL file not present yet. >> >> >> Why did it jump from 15 back to 6? Why did it even start at 15? Am I >> hosed at this point? I really don't want to make another base backup. > > Not sure. Some observations: > > In the above you have a .history file which seems to indicate you have wandered into timelines: > > http://www.postgresql.org/docs/9.3/static/continuous-archiving.html > > 24.3.5. Timelines > > You might want to look at the pg_xlog directory on the master and the archive directory to see if the WAL file numbersmatch what you are seeing. > >> >> -Andy >> >> >> >> > > Yes, the timeline is correct. A while back I had to promote a slave as a new master, so this should be my 2nd timeline. (I almost feel like a time lord) And yes, the master pg_xlog is full of 00000002000000B90* files. -Andy