Обсуждение: Setting up a warm standby server - some questions

Поиск
Список
Период
Сортировка

Setting up a warm standby server - some questions

От
Thomas Kellerer
Дата:
Hi,

I'm trying to setup a warm standby server. But there are some things that I don't yet understand.
I'm testing this on my windows box to understand the concepts and while waiting for the real hardware ;)

I got the basic setup working, and a simulated failover was working, but some things are unclear to me:


What is the recommended way to switch back to the primary, once that is up and running again?

Especially assuming that data has been changed on the secondar?
Can I do a base backup of the data directory (after running pg_start_backup()) from the secondary to the primary?


What is the recommended way to get the secondary back into the "standby" mode?
Currently I simply shut it down, rename recovery.done to recovery.conf and start it up again.
Is this allright?

In the logfile that is written by pg_standby I can see some "errors" and I'm not sure if that's OK.

My recovery.conf looks like this:

restore_command = 'c:\Projects\pgtest\standby\server\bin\pg_standby.exe -l -d -s 2 -k 50 -t
c:\Projects\pgtest\standby\pgsql.triggerc:\Projects\pgtest\archive %f %p %r >>c:\Projects\pg\standby\standby.log 2>&1' 

Which is essentially copied from the manual and seems to be working.
However in the standby.log I can see messages like this:

Command for restore    : copy "c:\Projects\pg\archive\00000003.history" "pg_xlog\RECOVERYHISTORY"
Keep archive history    : 000000000000000000000000 and later
running restore        :The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
The system cannot find the file specified.
not restored
history file not found

The "cannot find the file" or "history file not found" do not look right to me.

Although I also see "success" messages in there, like:
running restore        :        1 file(s) copied.
  OK

So what should I do about the error messages?


I have also problems shutting down the secondary server while it is in "standby" (i.e. recovery) mode
If I use "pg_ctl stop" it never shuts down.
If I use "pg_ctl stop -m immediate" most of the postgres processes terminate, but the one that spawned the pg_standby
processwill linger around. 
What am I missing here?


Thanks a lot in advance
Thomas

Re: Setting up a warm standby server - some questions

От
Thomas Kellerer
Дата:
Hello,


I could solve one of the questions myself :)
> I have also problems shutting down the secondary server while it is in
> "standby" (i.e. recovery) mode

This works fine when using "-m fast" instead of "-m immediate".


I would still like an answer on these questions:

> What is the recommended way to switch back to the primary, once that is
> up and running again?
> What is the recommended way to get the secondary back into the "standby"
> mode?


As we are trying to minimize the possible data loss, I'm looking at the archive_timeout setting (I'm currently
evaluatingour possibilities with 8.4) 

The manual at
http://www.postgresql.org/docs/current/static/continuous-archiving.html#RECOVERY-CONFIG-SETTINGS
states:

     "It is therefore unwise to set a very short archive_timeout  — it will bloat your archive storage.
     archive_timeout settings of a minute or so are usually reasonable"

But on the other hand
http://www.postgresql.org/docs/current/static/warm-standby.html
states:

   "The length of the window of data loss can be limited by use of the archive_timeout parameter, which can
   be set as low as a few seconds if required"

but kind of restrains itself right after that by stating:

   "However such low settings will substantially increase the bandwidth requirements for file shipping"


My question is: is that the bandwidth between primary and standby?

If the archive is stored on a differend harddisk (or storgae system) as the data directory, I'd reckon it wouldn't have
muchimpact on the primary server. Or am I missing something? 


Regards
Thomas






Re: Setting up a warm standby server - some questions

От
Brad Nicholson
Дата:
On Fri, 2010-07-16 at 15:06 +0200, Thomas Kellerer wrote:
> Hello,
>
>
> I could solve one of the questions myself :)
> > I have also problems shutting down the secondary server while it is in
> > "standby" (i.e. recovery) mode
>
> This works fine when using "-m fast" instead of "-m immediate".

fast is the preferred way.  immediate is the sledghammer.

>
> I would still like an answer on these questions:
>
> > What is the recommended way to switch back to the primary, once that is
> > up and running again?

You need to do a new base backup from secondary to primary and ship the
wal files back from secondary to primary.  Once you are ready to switch,
switch back to it the same way that you switched to the the secondary.

For the new base backup, you'll probably want to use something like
rsync that does a differential backup so you don't have to move all the
blocks that are still the same.  But that depends on how big your DB is,
how much churn there is, and how long the primary has been out.

> > What is the recommended way to get the secondary back into the "standby"
> > mode?

Same thing as above.  After switching back - new base backup from
primary to secondary (which should involve a very small differential of
base files), and continue streaming back to the secondary.

Should all be very quick if it is automated.

>
> As we are trying to minimize the possible data loss, I'm looking at the archive_timeout setting (I'm currently
evaluatingour possibilities with 8.4) 
>
> The manual at
> http://www.postgresql.org/docs/current/static/continuous-archiving.html#RECOVERY-CONFIG-SETTINGS
> states:
>
>      "It is therefore unwise to set a very short archive_timeout  — it will bloat your archive storage.
>      archive_timeout settings of a minute or so are usually reasonable"
>
> But on the other hand
> http://www.postgresql.org/docs/current/static/warm-standby.html
> states:
>
>    "The length of the window of data loss can be limited by use of the archive_timeout parameter, which can
>    be set as low as a few seconds if required"
>
> but kind of restrains itself right after that by stating:
>
>    "However such low settings will substantially increase the bandwidth requirements for file shipping"

>
> My question is: is that the bandwidth between primary and standby?


Yes.  It means that you are going to be shipping a lot of files from the
primary to secondary.  If you set the timeout to 1 second, you will be
shipping 960MB a minute (1 16MB wal segment per second).

There is a utility to reduce the size to the wal files - pg_lesslog

http://pglesslog.projects.postgresql.org/

I have no experience with it though.  I'm always a bit hesitant when it
comes to using 3rd party tools to manipulate something as critical as
wal files.  The folks that wrote it (NTT) are solid though.

> If the archive is stored on a differend harddisk (or storgae system) as the data directory, I'd reckon it wouldn't
havemuch impact on the primary server. Or am I missing something? 

It shouldn't have much impact in this case.

--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.