Обсуждение: PG 9.3 Switch streaming to wal shipping

Поиск
Список
Период
Сортировка

PG 9.3 Switch streaming to wal shipping

От
Andy Colson
Дата:
Hi All.

I have a slave that was using streaming replication, and all was well,
but its far away (remote data center), and its using too much bandwidth
at peak times.

I'm switching it to wal shipping at night.  I don't really need it
constantly up to date, nightly is good enough.

In my recovery.conf, I disabled the primary_conninfo, and have:

restore_command = 'cp /pub/archive/%f %p'
archive_cleanup_command = 'pg_archivecleanup /pub/archive/ %r'

I restart PG, and all seems well.  It reached a consisten state and
allows read-only queries.  Cool.  Except its spamming the log with lines
like:
Sep 29 15:28:13 webserv postgres[17735]: [125-1] ,,2014-09-29
15:28:13.329 CDT,: LOG:  restored log file "00000002000000B900000015"
from archive
Sep 29 15:28:18 webserv postgres[17735]: [126-1] ,,2014-09-29
15:28:18.358 CDT,: LOG:  restored log file "00000002000000B900000015"
from archive
Sep 29 15:28:23 webserv postgres[17735]: [127-1] ,,2014-09-29
15:28:23.387 CDT,: LOG:  restored log file "00000002000000B900000015"
from archive


The answer to that, I thought was pg_standby.  So I changed:

restore_command = 'pg_standby -d /pub/archive %f %p  2>>/tmp/standby.log'

I restart PG, but now it never reaches consistency, wont allow queries,
and /tmp/standbylog show's its trying to start with an old file:


Keep archive history: no cleanup required
running restore:      OK
Trigger file:         <not set>
Waiting for WAL file: 00000002000000B900000004
WAL file path:        /pub/archive/00000002000000B900000004
Restoring to:         pg_xlog/RECOVERYXLOG
Sleep interval:       5 seconds
Max wait interval:    0 forever
Command for restore:  cp "/pub/archive/00000002000000B900000004"
"pg_xlog/RECOVERYXLOG"
Keep archive history: no cleanup required
WAL file not present yet.

00000002000000B900000004 doesn't exist, I've already moved past it.
(Also, note, the first log I posted above accessing
00000002000000B900000015 is much later after me messing around some more)


I understand why cp is spamming the log, I guess, but I don't understand
why I cant just replace it with pg_standby.

Any hints on how to set this up?

Thanks for your time.

-Andy


Re: PG 9.3 Switch streaming to wal shipping

От
Andy Colson
Дата:
Crap!  Is this a problem?!

I switched back to cp, all was going well, here are some logs:

Sep 29 16:07:10 webserv postgres[17735]: [590-1] ,,2014-09-29
16:07:10.888 CDT,: LOG:  restored log file "00000002000000B900000023"
from archive
Sep 29 16:07:13 webserv postgres[17734]: [3-1] ,,2014-09-29 16:07:13.971
CDT,: LOG:  received fast shutdown request
Sep 29 16:07:13 webserv postgres[17734]: [4-1] ,,2014-09-29 16:07:13.971
CDT,: LOG:  aborting any active transactions
Sep 29 16:07:13 webserv postgres[17739]: [2-1] ,,2014-09-29 16:07:13.995
CDT,: LOG:  shutting down
Sep 29 16:07:13 webserv postgres[17739]: [3-1] ,,2014-09-29 16:07:13.995
CDT,: LOG:  database system is shut down

So it was at 00000002000000B900000023.

I switched recovery.conf to:

restore_command = '/usr/local/pg93/bin/pg_standby -d /pub/archive %f %p
  2>>/tmp/standby.log'

and restart PG.  PG log shows:
Sep 29 16:08:56 webserv postgres[19054]: [2-1] ,,2014-09-29 16:08:56.002
CDT,: LOG:  database system was shut down in recovery at 2014-09-29
16:07:13 CDT
Sep 29 16:08:56 webserv postgres[19054]: [3-1] ,,2014-09-29 16:08:56.002
CDT,: LOG:  entering standby mode
Sep 29 16:08:56 webserv postgres[19054]: [4-1] ,,2014-09-29 16:08:56.017
CDT,: LOG:  restored log file "00000002.history" from archive
Sep 29 16:08:56 webserv postgres[19054]: [5-1] ,,2014-09-29 16:08:56.042
CDT,: LOG:  restored log file "00000002000000B900000015" from archive

I was at 23!  Did it really replay 15?  How bad is that?
/tmp/standby.log makes no sense at all:

Trigger file:         <not set>
Waiting for WAL file: 00000002.history
WAL file path:        /pub/archive/00000002.history
Restoring to:         pg_xlog/RECOVERYHISTORY
Sleep interval:       5 seconds
Max wait interval:    0 forever
Command for restore:  cp "/pub/archive/00000002.history"
"pg_xlog/RECOVERYHISTORY"
Keep archive history: no cleanup required
running restore:      OK
Trigger file:         <not set>
Waiting for WAL file: 00000002000000B900000015
WAL file path:        /pub/archive/00000002000000B900000015
Restoring to:         pg_xlog/RECOVERYXLOG
Sleep interval:       5 seconds
Max wait interval:    0 forever
Command for restore:  cp "/pub/archive/00000002000000B900000015"
"pg_xlog/RECOVERYXLOG"
Keep archive history: no cleanup required
running restore:      OK
Trigger file:         <not set>
Waiting for WAL file: 00000002000000B900000006
WAL file path:        /pub/archive/00000002000000B900000006
Restoring to:         pg_xlog/RECOVERYXLOG
Sleep interval:       5 seconds
Max wait interval:    0 forever
Command for restore:  cp "/pub/archive/00000002000000B900000006"
"pg_xlog/RECOVERYXLOG"
Keep archive history: no cleanup required
WAL file not present yet.
WAL file not present yet.
WAL file not present yet.


Why did it jump from 15 back to 6?  Why did it even start at 15?  Am I
hosed at this point?  I really don't want to make another base backup.

-Andy




Re: PG 9.3 Switch streaming to wal shipping

От
Adrian Klaver
Дата:
On 09/29/2014 02:17 PM, Andy Colson wrote:
> Crap!  Is this a problem?!
>
> I switched back to cp, all was going well, here are some logs:
>
> Sep 29 16:07:10 webserv postgres[17735]: [590-1] ,,2014-09-29
> 16:07:10.888 CDT,: LOG:  restored log file "00000002000000B900000023"
> from archive
> Sep 29 16:07:13 webserv postgres[17734]: [3-1] ,,2014-09-29 16:07:13.971
> CDT,: LOG:  received fast shutdown request
> Sep 29 16:07:13 webserv postgres[17734]: [4-1] ,,2014-09-29 16:07:13.971
> CDT,: LOG:  aborting any active transactions
> Sep 29 16:07:13 webserv postgres[17739]: [2-1] ,,2014-09-29 16:07:13.995
> CDT,: LOG:  shutting down
> Sep 29 16:07:13 webserv postgres[17739]: [3-1] ,,2014-09-29 16:07:13.995
> CDT,: LOG:  database system is shut down
>
> So it was at 00000002000000B900000023.
>
> I switched recovery.conf to:
>
> restore_command = '/usr/local/pg93/bin/pg_standby -d /pub/archive %f %p
>   2>>/tmp/standby.log'
>
> and restart PG.  PG log shows:
> Sep 29 16:08:56 webserv postgres[19054]: [2-1] ,,2014-09-29 16:08:56.002
> CDT,: LOG:  database system was shut down in recovery at 2014-09-29
> 16:07:13 CDT
> Sep 29 16:08:56 webserv postgres[19054]: [3-1] ,,2014-09-29 16:08:56.002
> CDT,: LOG:  entering standby mode
> Sep 29 16:08:56 webserv postgres[19054]: [4-1] ,,2014-09-29 16:08:56.017
> CDT,: LOG:  restored log file "00000002.history" from archive
> Sep 29 16:08:56 webserv postgres[19054]: [5-1] ,,2014-09-29 16:08:56.042
> CDT,: LOG:  restored log file "00000002000000B900000015" from archive
>
> I was at 23!  Did it really replay 15?  How bad is that?
> /tmp/standby.log makes no sense at all:
>
> Trigger file:         <not set>
> Waiting for WAL file: 00000002.history
> WAL file path:        /pub/archive/00000002.history
> Restoring to:         pg_xlog/RECOVERYHISTORY
> Sleep interval:       5 seconds
> Max wait interval:    0 forever
> Command for restore:  cp "/pub/archive/00000002.history"
> "pg_xlog/RECOVERYHISTORY"
> Keep archive history: no cleanup required
> running restore:      OK
> Trigger file:         <not set>
> Waiting for WAL file: 00000002000000B900000015
> WAL file path:        /pub/archive/00000002000000B900000015
> Restoring to:         pg_xlog/RECOVERYXLOG
> Sleep interval:       5 seconds
> Max wait interval:    0 forever
> Command for restore:  cp "/pub/archive/00000002000000B900000015"
> "pg_xlog/RECOVERYXLOG"
> Keep archive history: no cleanup required
> running restore:      OK
> Trigger file:         <not set>
> Waiting for WAL file: 00000002000000B900000006
> WAL file path:        /pub/archive/00000002000000B900000006
> Restoring to:         pg_xlog/RECOVERYXLOG
> Sleep interval:       5 seconds
> Max wait interval:    0 forever
> Command for restore:  cp "/pub/archive/00000002000000B900000006"
> "pg_xlog/RECOVERYXLOG"
> Keep archive history: no cleanup required
> WAL file not present yet.
> WAL file not present yet.
> WAL file not present yet.
>
>
> Why did it jump from 15 back to 6?  Why did it even start at 15?  Am I
> hosed at this point?  I really don't want to make another base backup.

Not sure. Some observations:

In the above you have a .history file which seems to indicate you have
wandered into timelines:

http://www.postgresql.org/docs/9.3/static/continuous-archiving.html

24.3.5. Timelines

You might want to look at the pg_xlog directory on the master and the
archive directory to see if the WAL file numbers match what you are seeing.

>
> -Andy
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: PG 9.3 Switch streaming to wal shipping

От
Andy Colson
Дата:
On 09/29/2014 05:16 PM, Adrian Klaver wrote:
> On 09/29/2014 02:17 PM, Andy Colson wrote:
>> Crap!  Is this a problem?!
>>
>> I switched back to cp, all was going well, here are some logs:
>>
>> Sep 29 16:07:10 webserv postgres[17735]: [590-1] ,,2014-09-29
>> 16:07:10.888 CDT,: LOG:  restored log file "00000002000000B900000023"
>> from archive
>> Sep 29 16:07:13 webserv postgres[17734]: [3-1] ,,2014-09-29 16:07:13.971
>> CDT,: LOG:  received fast shutdown request
>> Sep 29 16:07:13 webserv postgres[17734]: [4-1] ,,2014-09-29 16:07:13.971
>> CDT,: LOG:  aborting any active transactions
>> Sep 29 16:07:13 webserv postgres[17739]: [2-1] ,,2014-09-29 16:07:13.995
>> CDT,: LOG:  shutting down
>> Sep 29 16:07:13 webserv postgres[17739]: [3-1] ,,2014-09-29 16:07:13.995
>> CDT,: LOG:  database system is shut down
>>
>> So it was at 00000002000000B900000023.
>>
>> I switched recovery.conf to:
>>
>> restore_command = '/usr/local/pg93/bin/pg_standby -d /pub/archive %f %p
>>   2>>/tmp/standby.log'
>>
>> and restart PG.  PG log shows:
>> Sep 29 16:08:56 webserv postgres[19054]: [2-1] ,,2014-09-29 16:08:56.002
>> CDT,: LOG:  database system was shut down in recovery at 2014-09-29
>> 16:07:13 CDT
>> Sep 29 16:08:56 webserv postgres[19054]: [3-1] ,,2014-09-29 16:08:56.002
>> CDT,: LOG:  entering standby mode
>> Sep 29 16:08:56 webserv postgres[19054]: [4-1] ,,2014-09-29 16:08:56.017
>> CDT,: LOG:  restored log file "00000002.history" from archive
>> Sep 29 16:08:56 webserv postgres[19054]: [5-1] ,,2014-09-29 16:08:56.042
>> CDT,: LOG:  restored log file "00000002000000B900000015" from archive
>>
>> I was at 23!  Did it really replay 15?  How bad is that?
>> /tmp/standby.log makes no sense at all:
>>
>> Trigger file:         <not set>
>> Waiting for WAL file: 00000002.history
>> WAL file path:        /pub/archive/00000002.history
>> Restoring to:         pg_xlog/RECOVERYHISTORY
>> Sleep interval:       5 seconds
>> Max wait interval:    0 forever
>> Command for restore:  cp "/pub/archive/00000002.history"
>> "pg_xlog/RECOVERYHISTORY"
>> Keep archive history: no cleanup required
>> running restore:      OK
>> Trigger file:         <not set>
>> Waiting for WAL file: 00000002000000B900000015
>> WAL file path:        /pub/archive/00000002000000B900000015
>> Restoring to:         pg_xlog/RECOVERYXLOG
>> Sleep interval:       5 seconds
>> Max wait interval:    0 forever
>> Command for restore:  cp "/pub/archive/00000002000000B900000015"
>> "pg_xlog/RECOVERYXLOG"
>> Keep archive history: no cleanup required
>> running restore:      OK
>> Trigger file:         <not set>
>> Waiting for WAL file: 00000002000000B900000006
>> WAL file path:        /pub/archive/00000002000000B900000006
>> Restoring to:         pg_xlog/RECOVERYXLOG
>> Sleep interval:       5 seconds
>> Max wait interval:    0 forever
>> Command for restore:  cp "/pub/archive/00000002000000B900000006"
>> "pg_xlog/RECOVERYXLOG"
>> Keep archive history: no cleanup required
>> WAL file not present yet.
>> WAL file not present yet.
>> WAL file not present yet.
>>
>>
>> Why did it jump from 15 back to 6?  Why did it even start at 15?  Am I
>> hosed at this point?  I really don't want to make another base backup.
>
> Not sure. Some observations:
>
> In the above you have a .history file which seems to indicate you have wandered into timelines:
>
> http://www.postgresql.org/docs/9.3/static/continuous-archiving.html
>
> 24.3.5. Timelines
>
> You might want to look at the pg_xlog directory on the master and the archive directory to see if the WAL file
numbersmatch what you are seeing. 
>
>>
>> -Andy
>>
>>
>>
>>
>
>

Yes, the timeline is correct.  A while back I had to promote a slave as a new master, so this should be my 2nd
timeline.  (I almost feel like a time lord) 

And yes, the master pg_xlog is full of 00000002000000B90* files.

-Andy