Обсуждение: Tutorials on high availability Postgresql setup?

Поиск
Список
Период
Сортировка

Tutorials on high availability Postgresql setup?

От
Andy
Дата:
Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation
(http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html)on this topics is pretty scarce. 

The scenario I'm most interested in is this:

2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot standby.

1) If the hot standby goes down, how do I redirect reads to the master?
2) If the master fails
    -how do I automatically promote the standby to master and send all reads/writes to the new master?
    -what happens when the old master comes back up? Do I need to so anything to make it catches up to the new master?

Thanks.




Re: Tutorials on high availability Postgresql setup?

От
Scott Marlowe
Дата:
On Thu, Oct 7, 2010 at 12:27 AM, Andy <angelflow@yahoo.com> wrote:
> Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation
(http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html)on this topics is pretty scarce. 
>
> The scenario I'm most interested in is this:
>
> 2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot
standby.

To have true redundancy, you need 3 servers.  Just saying.  Otherwise
when one goes down, no more redundancy.

> 1) If the hot standby goes down, how do I redirect reads to the master?

Have a config file for your app that tells it where to go for reads
and writes.  Change the config file to point reads at a different db
if a read slave fails.  What constitutes a failed read slave is kind
of a business decision, so you'll likely have to write your own code
to decide what being down means.

> 2) If the master fails
>    -how do I automatically promote the standby to master and send all reads/writes to the new master?

First you need to decide if you actually want automated failovers.
I've seen automated failovers cause as many problems as they were
supposed to fix, but it can be done.  Keep in mind that on a two db
system, failing over means you lose redundancy.  If your cluster fails
over on a lot of false positives, that's a lot of time with no
redundancy.  If your script isn't written with having only one node in
mind, it might try to failover a second time with no read slave to
promote to master.

Also, you're going to have to come up with what constitutes a failed
master.  30 seconds non-responsive?  5 minutes?  An hour?    If the
problem is that the write master is simply overloaded, then failing
over isn't gonna solve anything, as the now newly promoted master is
going to collapse as well under even heavier load.  It might have been
better to adjust the load factors used to determine where read queries
go to take load off of the master, or to change a setting in your app
that reduces load on the master.  With an overloaded write master,
then failover, then overloaded even worse new write master you've got
a site down, no redundancy, and you need to rebuild your old master as
a read slave to handle the load.

To start with I do not recommend doing automatic failovers.  Have a
system in place where your DBA / SA can promote a slave to master in
one or two easy steps, and if / when the master truly fails, then run
that script.  A human can make that decision with far more care than a
piece of code.

>    -what happens when the old master comes back up? Do I need to so anything to make it catches up to the new master?

You can't let the old master come back up as thinking it's the master
as well.  You have to re-establish replication to it as a slave.
Again, this is usually not automated, at least not at first.  The old
master needs to be "shot in the head" so to speak before it comes back
up, or your app may start writing to it instead of or as well as the
new master, and now you've got split-brain problems.

In short automated failover is complicated to get right, and if you
get it wrong the cost of the consequences can far worse than the 5 or
10 minutes of downtime required for a manual switch-over.  First write
scripts that automate most of the task for your application and db
farm.  Test those scripts as much as you can on a test farm.  Then run
them when needed by hand when things go wrong.  If or when you're
certain you've got all the bugs worked out and all the possible
failure scenarios worked out, you can start testing automated
failover.

Re: Tutorials on high availability Postgresql setup?

От
Tatsuo Ishii
Дата:
> Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation
(http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html)on this topics is pretty scarce. 
>
> The scenario I'm most interested in is this:
>
> 2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot
standby.
>
> 1) If the hot standby goes down, how do I redirect reads to the master?

pgpool-II 3.0 will take care of this.

> 2) If the master fails
>     -how do I automatically promote the standby to master and send all reads/writes to the new master?

This is covered by pgpool-II 3.0 as well.

>     -what happens when the old master comes back up? Do I need to so anything to make it catches up to the new
master?

I recommend to use it a standby. Such a configuration is possible by
using pgpool-II 3.0.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: Tutorials on high availability Postgresql setup?

От
Andy
Дата:
--- On Thu, 10/7/10, Tatsuo Ishii <ishii@postgresql.org> wrote:

> > The scenario I'm most interested in is this:
> >
> > 2 servers - a master and a hot standby. All writes are
> sent to master, reads are split between master and hot
> standby.
> >
> > 1) If the hot standby goes down, how do I redirect
> reads to the master?
>
> pgpool-II 3.0 will take care of this.
>
> > 2) If the master fails
> >     -how do I automatically
> promote the standby to master and send all reads/writes to
> the new master?
>
> This is covered by pgpool-II 3.0 as well.
>
> >     -what happens when the old
> master comes back up? Do I need to so anything to make it
> catches up to the new master?
>
> I recommend to use it a standby. Such a configuration is
> possible by
> using pgpool-II 3.0.
> --

Oh so I'd still need a proxy such as pgpool-II for HA setup?

I was thinking that with the new built-in replication in 9.0 there would be no need to use pgpool-II.

If pgpool is still necessary why not also use it for replication? What would be the advantages of using the 9.0's
built-inreplication as opposed to pgpool's replication? 





Re: Tutorials on high availability Postgresql setup?

От
John R Pierce
Дата:
On 10/07/10 12:53 AM, Andy wrote:
> If pgpool is still necessary why not also use it for replication? What would be the advantages of using the 9.0's
built-inreplication as opposed to pgpool's replication? 

pgpool's replication works quite differently than the built in.  pgpool
replicates queries at the front end, while the built in replication
replicates block writes at the back end.



Re: Tutorials on high availability Postgresql setup?

От
Andy
Дата:
Ah thanks for the explanation. I was hoping for an automated setup without the need to get paged 24/7.

So HA is still as hard as I thought it would be. I was hoping that with 9.0 things would be easier.

--- On Thu, 10/7/10, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> From: Scott Marlowe <scott.marlowe@gmail.com>
> Subject: Re: [GENERAL] Tutorials on high availability Postgresql setup?
> To: "Andy" <angelflow@yahoo.com>
> Cc: pgsql-general@postgresql.org
> Date: Thursday, October 7, 2010, 3:24 AM
> On Thu, Oct 7, 2010 at 12:27 AM, Andy
> <angelflow@yahoo.com>
> wrote:
> > Is there any tutorials or detailed instructions on how
> to set up HA postgresql & failover? The documentation
> (http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html)
> on this topics is pretty scarce.
> >
> > The scenario I'm most interested in is this:
> >
> > 2 servers - a master and a hot standby. All writes are
> sent to master, reads are split between master and hot
> standby.
>
> To have true redundancy, you need 3 servers.  Just
> saying.  Otherwise
> when one goes down, no more redundancy.
>
> > 1) If the hot standby goes down, how do I redirect
> reads to the master?
>
> Have a config file for your app that tells it where to go
> for reads
> and writes.  Change the config file to point reads at
> a different db
> if a read slave fails.  What constitutes a failed read
> slave is kind
> of a business decision, so you'll likely have to write your
> own code
> to decide what being down means.
>
> > 2) If the master fails
> >    -how do I automatically promote the standby to
> master and send all reads/writes to the new master?
>
> First you need to decide if you actually want automated
> failovers.
> I've seen automated failovers cause as many problems as
> they were
> supposed to fix, but it can be done.  Keep in mind
> that on a two db
> system, failing over means you lose redundancy.  If
> your cluster fails
> over on a lot of false positives, that's a lot of time with
> no
> redundancy.  If your script isn't written with having
> only one node in
> mind, it might try to failover a second time with no read
> slave to
> promote to master.
>
> Also, you're going to have to come up with what constitutes
> a failed
> master.  30 seconds non-responsive?  5
> minutes?  An hour?    If the
> problem is that the write master is simply overloaded, then
> failing
> over isn't gonna solve anything, as the now newly promoted
> master is
> going to collapse as well under even heavier load.  It
> might have been
> better to adjust the load factors used to determine where
> read queries
> go to take load off of the master, or to change a setting
> in your app
> that reduces load on the master.  With an overloaded
> write master,
> then failover, then overloaded even worse new write master
> you've got
> a site down, no redundancy, and you need to rebuild your
> old master as
> a read slave to handle the load.
>
> To start with I do not recommend doing automatic
> failovers.  Have a
> system in place where your DBA / SA can promote a slave to
> master in
> one or two easy steps, and if / when the master truly
> fails, then run
> that script.  A human can make that decision with far
> more care than a
> piece of code.
>
> >    -what happens when the old master comes back up?
> Do I need to so anything to make it catches up to the new
> master?
>
> You can't let the old master come back up as thinking it's
> the master
> as well.  You have to re-establish replication to it
> as a slave.
> Again, this is usually not automated, at least not at
> first.  The old
> master needs to be "shot in the head" so to speak before it
> comes back
> up, or your app may start writing to it instead of or as
> well as the
> new master, and now you've got split-brain problems.
>
> In short automated failover is complicated to get right,
> and if you
> get it wrong the cost of the consequences can far worse
> than the 5 or
> 10 minutes of downtime required for a manual
> switch-over.  First write
> scripts that automate most of the task for your application
> and db
> farm.  Test those scripts as much as you can on a test
> farm.  Then run
> them when needed by hand when things go wrong.  If or
> when you're
> certain you've got all the bugs worked out and all the
> possible
> failure scenarios worked out, you can start testing
> automated
> failover.
>




Re: Tutorials on high availability Postgresql setup?

От
Tatsuo Ishii
Дата:
>> > The scenario I'm most interested in is this:
>> >
>> > 2 servers - a master and a hot standby. All writes are
>> sent to master, reads are split between master and hot
>> standby.
>> >
>> > 1) If the hot standby goes down, how do I redirect
>> reads to the master?
>>
>> pgpool-II 3.0 will take care of this.
>>
>> > 2) If the master fails
>> >,A (B ,A   (B-how do I automatically
>> promote the standby to master and send all reads/writes to
>> the new master?
>>
>> This is covered by pgpool-II 3.0 as well.
>>
>> >,A (B ,A   (B-what happens when the old
>> master comes back up? Do I need to so anything to make it
>> catches up to the new master?
>>
>> I recommend to use it a standby. Such a configuration is
>> possible by
>> using pgpool-II 3.0.
>> --
>
> Oh so I'd still need a proxy such as pgpool-II for HA setup?
> I was thinking that with the new built-in replication in 9.0 there would be no need to use pgpool-II.

PostgreSQL 9.0's replication still lacks automated failover/load
balance/query dispatching(send read/write query to primary, send read
query to standby). So if you need these, you would want to use
pgpool-II or any other proxy solutions.

> If pgpool is still necessary why not also use it for replication? What would be the advantages of using the 9.0's
built-inreplication as opposed to pgpool's replication? 

Each replication solution has its own merit/demerit. For example, if
you need synchronous replication, pgpool-II is for you. If you are ok
with async, PostgreSQL's replication is quite nice.

So it depends on you.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp

Re: Tutorials on high availability Postgresql setup?

От
Digimer
Дата:
On 10-10-07 02:27 AM, Andy wrote:
> Is there any tutorials or detailed instructions on how to set up HA postgresql & failover? The documentation
(http://www.postgresql.org/docs/9.0/interactive/warm-standby-failover.html)on this topics is pretty scarce. 
>
> The scenario I'm most interested in is this:
>
> 2 servers - a master and a hot standby. All writes are sent to master, reads are split between master and hot
standby.
>
> 1) If the hot standby goes down, how do I redirect reads to the master?
> 2) If the master fails
>     -how do I automatically promote the standby to master and send all reads/writes to the new master?
>     -what happens when the old master comes back up? Do I need to so anything to make it catches up to the new
master?
>
> Thanks.

One option would be to create a simple 2-node cluster and run your PgSQL
server in a migrateable VM backed by a SAN or, if your budget is more
modest, a simple DRBD device.

Personally, I like to use RHCS (Red Hat Cluster Services) with a DRBD
array becking clustered LVM with Xen VMs (domU) on dedicated LVs. This
doesn't dictate the OS version hosting postgres though, so if you've got
a particular set of requirement, you can meet them.

Then in the case of planned outage, you can hot-migrate your VM to the
other node. In the case of catastrophic failure, the surviving node can
boot the PgSQL server and it would come back as if it had simply lost
power. In either case, you don't need to worry about IPs, special
configs or what have you.

--
Digimer
E-Mail:         linux@alteeve.com
AN!Whitepapers: http://alteeve.com
Node Assassin:  http://nodeassassin.org

Re: Tutorials on high availability Postgresql setup?

От
Leonardo Francalanci
Дата:
> One option would be to create a simple  2-node cluster and run your PgSQL
> server in a migrateable VM backed by a SAN  or, if your budget is more
> modest, a simple DRBD device.
>
> Personally, I  like to use RHCS (Red Hat Cluster Services) with a DRBD
> array becking  clustered LVM with Xen VMs (domU) on dedicated LVs. This
> doesn't dictate the  OS version hosting postgres though, so if you've got
> a particular set of  requirement, you can meet them.
>
> Then in the case of planned outage, you  can hot-migrate your VM to the
> other node. In the case of catastrophic  failure, the surviving node can
> boot the PgSQL server and it would come back  as if it had simply lost
> power.


I'm interested in the subject... can you explain a little more the setup?


Thank you




Re: Tutorials on high availability Postgresql setup?

От
Steve Crawford
Дата:
On 10/07/2010 12:59 AM, Andy wrote:
> Ah thanks for the explanation. I was hoping for an automated setup without the need to get paged 24/7.
>
> So HA is still as hard as I thought it would be. I was hoping that with 9.0 things would be easier.
>
>
My 0.02.

Whether you need 3 servers (or 2 or 5 or even just 1) is a business
decision informed by technological constraints. HA/redundancy is
basically just like insurance - how much you should spend depends on the
frequency of failure (MTBF), how long it takes to repair a failure
(MTTR), and the cost to the business of downtime and/or data-loss. For
many businesses, a single server is fine but for others 3 isn't close to
enough. Based on many years of experience running PostgreSQL, I would
say that using a single-server option would have given us well over
 >99.9% availability. (I'm referring to failure-related downtime which
we just never see. Scheduled downtime for updates - especially with the
older releases - are a different story and a second server is very handy
for that.)

99.9% gives you over 8-hours/year downtime - generally pretty easy.
Trimming that downtime to a guaranteed 5-minute maximum (5-nines)
becomes exponentially more costly. (And if your primary worry is lost
sleep due to the 24x7 pager, consider that PostgreSQL properly set up on
quality hardware is pretty friggin reliable. I've lost sleep many times
- it just hasn't been PostgreSQL that caused it.)

Replication is one piece of HA and 9.0 does make that much easier (and
lets you to run read queries on the backup machine so it can earn its
keep). But how to fail from one machine to another is still up to you.

Cheers,
Steve