Discussion: Postgres Clustering Options
I'm trying to meet a very high uptime requirement in a high performance environment. To do this we will need to have some form of cluster for our databases.

What I plan on doing is:

Postgres installed on a cluster configured in active/passive (both pointing to the same SAN). If PG or the OS fails we trigger a failover to the passive node.

Log shipping between that cluster and a single PG instance off site.

Is this a common/recommended method of handling clustering with Postgres? Google searches basically point to using a replication based solution, which I don't think would meet my performance demands.

Does anyone have experience with this or a similar setup that they could share with me?

Thanks

Dave
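For reference, the log-shipping leg of a plan like this on 8.x-era PostgreSQL is just WAL archiving on the primary plus a restore command on the standby. The paths and the rsync transport below are placeholders, not a recommendation:

```
# postgresql.conf on the primary (paths/hosts are placeholders)
archive_mode = on
archive_command = 'rsync %p standby:/wal_archive/%f'

# recovery.conf on the off-site standby (warm standby via pg_standby)
restore_command = 'pg_standby /wal_archive %f %p %r'
```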
What are you trying to protect against? Software failure? Hardware failure? Both?

Depending on your budget, you could theoretically point any number of failover nodes at a SAN, so long as you make sure only one of them is running Postgres at a time. Of course, you still have the single point of failure in the SAN. If you aren't made of money and are running Linux, we've found DRBD is a great way to cluster two machines, and it avoids a few single points of failure. But you limit yourself to two or three cluster nodes.

What are you trying to achieve with your offsite node? Is it supposed to pick up the load if the cluster dies?

David Kerr wrote:
> I'm trying to meet a very high uptime requirement in a high performance environment.
> To do this we will need to have some form of cluster for our databases.
> What I plan on doing is:
> Postgres installed on a cluster configured in active/passive (both pointing to the same SAN).
> If PG or the OS fails we trigger a failover to the passive node.
> Log shipping between that cluster and a single PG instance off site.
> Is this a common/recommended method of handling clustering with Postgres? Google searches
> basically point to using a replication based solution, which I don't think would meet my
> performance demands.
> Does anyone have experience with this or a similar setup that they could share with me?
> Thanks
> Dave
David Kerr wrote:
> I'm trying to meet a very high uptime requirement in a high performance environment.
> To do this we will need to have some form of cluster for our databases.
> What I plan on doing is:
> Postgres installed on a cluster configured in active/passive (both pointing to the same SAN).
> If PG or the OS fails we trigger a failover to the passive node.
> Log shipping between that cluster and a single PG instance off site.
> Is this a common/recommended method of handling clustering with Postgres? Google searches
> basically point to using a replication based solution, which I don't think would meet my
> performance demands.
> Does anyone have experience with this or a similar setup that they could share with me?

That's your classic database cluster. The reason you don't see much of that in online writeups is that the high availability SAN hardware is expensive.

Presumably you'd manage this with classic cluster management software (Veritas Cluster, Sun Cluster, Red Hat Cluster, Heartbeat, or whatever is appropriate to your environment). Commercial cluster vendors generally recommend doing the cluster 'heartbeat' over at least two separate network links so that a network failure doesn't trigger a false failover. Implementing 'fencing' in your storage switch is also a very good idea; most fencing systems can send commands to common Fibre Channel switches to disable the access port or soft zone of the current standby server so it can't accidentally mount the storage.

Your applications should be tolerant of database server disconnects, and know how to reconnect and restart the transaction that was in progress.
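The fence-before-promote ordering John describes can be sketched in a few lines. This is purely illustrative: the `portdisable` switch command and the callbacks are hypothetical stand-ins, and in practice a cluster manager's own fence agent does this, not a hand-rolled script.

```python
# Illustrative sketch of storage fencing in a failover sequence.
# run_cmd, switch_port, and start_db are stand-ins; real cluster
# stacks (RHCS, Veritas, etc.) ship dedicated fence agents.

def fence_old_master(run_cmd, switch_port):
    """Disable the old master's FC switch port so it can no longer
    write to the shared volume; refuse to continue if that fails."""
    rc = run_cmd(f"portdisable {switch_port}")  # vendor-specific CLI
    if rc != 0:
        raise RuntimeError("fencing failed; do NOT promote the standby")

def failover(run_cmd, switch_port, start_db):
    # Fence first: only after the old master is cut off from the SAN
    # is it safe to start Postgres on the standby (it then comes up
    # through normal crash recovery).
    fence_old_master(run_cmd, switch_port)
    start_db()

if __name__ == "__main__":
    log = []
    failover(lambda cmd: log.append(cmd) or 0, "port7",
             lambda: log.append("pg_ctl start"))
    print(log)  # fence command is issued before the database start
```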
Ben Chobot wrote:
> Of course, you still have the single point of failure in the SAN.

A proper SAN has two switches, each host connected to it has two HBA interfaces, there are two redundant storage controllers with mirrored cache, dual paths from each controller to all the storage, and redundant power supplies throughout. There is no single point of failure there.

Of course, none of this is cheap.
John R Pierce wrote:
> Presumably you'd manage this with classic cluster management software
> (Veritas Cluster, Sun Cluster, Red Hat Cluster, Heartbeat, or whatever
> is appropriate to your environment).

I've seen or heard of successful implementations like this done with Veritas, Sun, Red Hat, and Linux-HA. This topic is actually addressed briefly at both http://www.postgresql.org/about/press/faq and http://www.enterprisedb.com/products/allfaq.do ; Sun even mentions PostgreSQL support as a specific strength of their product at http://blogs.sun.com/SC/entry/migrating_from_veritas_cluster_server

I think you don't hear about this much here because the sort of companies who have the cash for this style of deployment are hiring commercial vendors and private consultants under NDA rather than relying on the PostgreSQL community. That's why I can't say more about the one project like this I did, which featured a shared SAN and commercial cluster software.

-- Greg Smith greg@2ndQuadrant.com Baltimore, MD
David Kerr wrote:
> Postgres installed on a cluster configured in active/passive (both pointing to the same SAN).
> If PG or the OS fails we trigger a failover to the passive node.
> Log shipping between that cluster and a single PG instance off site.
> Is this a common/recommended method of handling clustering with Postgres? Google searches
> basically point to using a replication based solution, which I don't think would meet my
> performance demands.

The part I'm having trouble with here is how you expect to keep a remote node up to date with log shipping, but then reject log-shipping based replication as not high enough performance for you. The classic problem with log shipping in PostgreSQL is that you've got a single recovery process trying to replay the work of what many workers did on the master, and that can turn into a potential lag problem as volume spikes upwards. If you don't expect a standby is going to be able to keep up with your volume due to that issue, the remote one is going to be even worse, though.

-- Greg Smith greg@2ndQuadrant.com Baltimore, MD
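Greg's replay-lag argument is easy to see with back-of-the-envelope numbers. The rates in this sketch are made up purely for illustration; the point is only that a sustained gap between WAL generation (many backends) and replay (one process) accumulates linearly:

```python
# Toy model of standby replay lag: the master's many backends generate
# WAL in parallel, but the standby replays it with a single process.
# All rates below are invented illustrative numbers.

def lag_after(seconds, gen_mb_per_s, replay_mb_per_s):
    """WAL backlog (MB) on the standby after a sustained load spike."""
    return max(0.0, (gen_mb_per_s - replay_mb_per_s) * seconds)

# A 10-minute spike where the master writes 30 MB/s of WAL while a
# single replay process manages only 20 MB/s leaves a 6 GB backlog:
print(lag_after(600, 30, 20))  # -> 6000.0
```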
On Wed, Nov 11, 2009 at 09:40:21AM -0800, John R Pierce wrote:
- David Kerr wrote:
- >Does anyone have experience with this or a similar setup that they could
- >share with me?
-
- That's your classic database cluster. The reason you don't see
- much of that in online writeups is that the high availability SAN
- hardware is expensive.
-
- Presumably you'd manage this with classic cluster management software
- (Veritas Cluster, Sun Cluster, Red Hat Cluster, Heartbeat, or whatever is
- appropriate to your environment). Commercial cluster vendors
- generally recommend doing the cluster 'heartbeat' over at least two
- separate network links so that a network failure doesn't trigger a false
- failover. Implementing 'fencing' in your storage switch is also a
- very good idea; most fencing systems can send commands to common
- Fibre Channel switches to disable the access port or soft zone of the
- current standby server so it can't accidentally mount the storage.
-
- Your applications should be tolerant of database server disconnects, and
- know how to reconnect and restart the transaction that was in progress.

I'll look into fencing; this is the first I've heard of that. But everything else you mentioned is exactly how I planned on doing it, so that's good news =)

Thanks!

Dave
On Wed, Nov 11, 2009 at 01:11:52PM -0500, Greg Smith wrote:
- David Kerr wrote:
- >Postgres installed on a cluster configured in active/passive (both
- >pointing to the same SAN).
- >If PG or the OS fails we trigger a failover to the passive node.
- >Log shipping between that cluster and a single PG instance off site.
- >Is this a common/recommended method of handling clustering with Postgres?
- >Google searches basically point to using a replication based solution,
- >which I don't think would meet my performance demands.
-
- The part I'm having trouble with here is how you expect to keep a
- remote node up to date with log shipping, but then reject log-shipping
- based replication as not high enough performance for you. The classic
- problem with log shipping in PostgreSQL is that you've got a single
- recovery process trying to replay the work of what many workers did on
- the master, and that can turn into a potential lag problem as volume
- spikes upwards. If you don't expect a standby is going to be able to
- keep up with your volume due to that issue, the remote one is going to
- be even worse, though.

We'd fail over to the standby db (recipient of the log shipping) in the case that our hosting center was nuked. Those are considered "extreme" circumstances, and we have a higher RTO/RPO in those cases.

The apps actually aren't as robust as the DB in this case, so I'll have time to replay all of the logs that made it before "the big one" while those are being configured to come up. And if it does take longer, that's not a huge issue; I'll have a few hours to get 100% caught up.

Dave
On Wed, Nov 11, 2009 at 09:35:35AM -0800, Ben Chobot wrote:
- What are you trying to protect against? Software failure? Hardware
- failure? Both?
-
- Depending on your budget, you could theoretically point any number of
- failover nodes at a SAN, so long as you make sure only one of them is
- running Postgres at a time. Of course, you still have the single point
- of failure in the SAN. If you aren't made of money and are running
- Linux, we've found DRBD is a great way to cluster two machines, and it
- avoids a few single points of failure. But you limit yourself to two or
- three cluster nodes.

Protecting against both hardware and software failure. SAN failure would be handled by the offsite node, but we've got a pretty robust SAN (I don't have all of the details), so it may not even have a single point of failure.

We tried out DRBD and the performance impact was pretty significant. Our app is very sensitive to any performance hitch, so I just can't see any form of replication working for us.

Dave
David Kerr wrote:
> The apps actually aren't as robust as the DB in this case, so I'll have time to
> replay all of the logs that made it before "the big one" while those are being
> configured to come up. And if it does take longer, that's not a huge issue;
> I'll have a few hours to get 100% caught up.

It sounds like you've got the basics nailed down here and are on a well trod path, just not one documented publicly very well. Since you said that even DRBD was too much overhead for you, I think a dive into evaluating the commercial clustering approaches (or the free Linux-HA that Red Hat's is based on, which I haven't been real impressed by) would be appropriate. The hard part is generally getting a heartbeat between the two servers sharing the SAN that is both sensitive enough to catch failures while not being so paranoid that it fails over needlessly (say, when load spikes on the primary and it slows down). Make sure you test that part out very carefully with any vendor you evaluate.

As far as the PostgreSQL specifics go, you need a solid way to ensure you've disconnected the now defunct master from the SAN (the classic "shoot the other node in the head" problem). All you *should* have to do is start the database again on the backup after doing that. That will come up as a standard crash, run through WAL replay crash recovery, and the result should be no different than had you restarted after a crash on the original node. The thing you cannot let happen is allowing the original master to continue writing to the shared SAN volume once that transition has happened.

-- Greg Smith 2ndQuadrant Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.com
On Wed, Nov 11, 2009 at 01:35:58PM -0500, Greg Smith wrote:
- David Kerr wrote:
- >The apps actually aren't as robust as the DB in this case, so I'll have
- >time to replay all of the logs that made it before "the big one" while
- >those are being configured to come up. And if it does take longer,
- >that's not a huge issue; I'll have a few hours to get 100% caught up.
-
- It sounds like you've got the basics nailed down here and are on a well
- trod path, just not one documented publicly very well. Since you
- said that even DRBD was too much overhead for you, I think a dive into
- evaluating the commercial clustering approaches (or the free Linux-HA
- that Red Hat's is based on, which I haven't been real impressed by) would
- be appropriate. The hard part is generally getting a heartbeat between
- the two servers sharing the SAN that is both sensitive enough to catch
- failures while not being so paranoid that it fails over needlessly (say,
- when load spikes on the primary and it slows down). Make sure you test
- that part out very carefully with any vendor you evaluate.
-
- As far as the PostgreSQL specifics go, you need a solid way to ensure
- you've disconnected the now defunct master from the SAN (the classic
- "shoot the other node in the head" problem). All you *should* have to
- do is start the database again on the backup after doing that. That
- will come up as a standard crash, run through WAL replay crash recovery,
- and the result should be no different than had you restarted after a
- crash on the original node. The thing you cannot let happen is allowing
- the original master to continue writing to the shared SAN volume once
- that transition has happened.

Thanks Greg, that sounds good and puts my (and my management's) concerns at ease!

Dave
Greg Smith wrote:
> It sounds like you've got the basics nailed down here and are on a
> well trod path, just not one documented publicly very well. Since
> you said that even DRBD was too much overhead for you, I think a dive
> into evaluating the commercial clustering approaches (or the free
> Linux-HA that Red Hat's is based on, which I haven't been real impressed
> by) would be appropriate. The hard part is generally getting a
> heartbeat between the two servers sharing the SAN that is both
> sensitive enough to catch failures while not being so paranoid that it
> fails over needlessly (say, when load spikes on the primary and it
> slows down). Make sure you test that part out very carefully with any
> vendor you evaluate.

Hence the 'multiple dedicated heartbeat networks' previously suggested. A typical cluster server has quad ethernet: 2 ports (802.3ad link aggregation w/ failover) for the LAN and 2 dedicated for the heartbeat, then a dual HBA for the SAN. The heartbeats can run over crossover cables; even 10baseT is plenty as the traffic volume is quite low. It just needs low latency and no possibility of congestion.

I set up the RHCS aka CentOS cluster in a test lab environment... it seemed to work well enough. I was using FC storage via a QLogic SANbox 5600 switch, which was supported by RHCS as a fencing device... Note that ALL of the storage used by the cluster servers on the SAN should be under cluster management, as the 'standby' server won't see any of it when it's fenced (I implemented fencing via FC port disable). This can be an issue when you want to do rolling upgrades (update the standby server, force a failover, update the previous master). I built each cluster node with its own direct attached mirrored storage for the OS and software.

> As far as the PostgreSQL specifics go, you need a solid way to ensure
> you've disconnected the now defunct master from the SAN (the classic
> "shoot the other node in the head" problem).
All you *should* have to > do is start the database again on the backup after doing that. That > will come up as a standard crash, run through WAL replay crash > recovery, and the result should be no different than had you restarted > after a crash on the original node. The thing you cannot let happen > is allowing the original master to continue writing to the shared SAN > volume once that transition has happened. > which is what 'storage fencing' prevents.
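The dual-heartbeat recommendation boils down to a simple rule: declare the peer dead only when every heartbeat link has missed several consecutive beats. This sketch stubs out the link probing entirely (real cluster stacks handle it); the threshold is an arbitrary illustrative value:

```python
# Sketch of why two dedicated heartbeat links prevent false failovers:
# a single flaky cable must not be enough to declare the peer dead.
# Link probing is stubbed out; the threshold is illustrative only.

MISSED_BEATS_THRESHOLD = 3

def peer_is_dead(missed_per_link):
    """missed_per_link: consecutive missed beats on each heartbeat link.
    Only if *all* links have exceeded the threshold is the peer
    considered down and a failover justified."""
    return all(m >= MISSED_BEATS_THRESHOLD for m in missed_per_link)

# One dead crossover cable alone does not trigger a failover:
print(peer_is_dead([5, 0]))  # -> False
# Both links silent for several beats does:
print(peer_is_dead([5, 4]))  # -> True
```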
On Wed, Nov 11, 2009 at 7:28 PM, David Kerr <dmk@mr-paradox.net> wrote:
What I plan on doing is:
Postgres installed on a cluster configured in active/passive (both pointing to the same SAN).
(If PG or the OS fails we trigger a failover to the passive node)
Is this a common/recommended method of handling clustering with Postgres? Google searches
basically point to using a replication based solution, which I don't think would meet my
performance demands.
Does anyone have experience with this or a similar setup that they could share with me?
We have done a setup like this with Red Hat Cluster Suite.
We are quite happy with the setup in general, and it has been working well even in 'unexpected circumstances' (power outages etc). The only thing I'd change in this setup if I could is the cluster software: RHCS is not mature enough and it seems every release contains new critical bugs, and sometimes even mission-critical components such as quorum disk do not work after an upgrade.
Regards
Mikko
On Thu, Nov 12, 2009 at 07:50:06AM +0200, Mikko Partio wrote:
- On Wed, Nov 11, 2009 at 7:28 PM, David Kerr <dmk@mr-paradox.net> wrote:
- > basically point to using a replication based solution, which I don't think
- > would meet my
- > performance demands.
- >
- > Does anyone have experience with this or a similar setup that they could
- > share with me?
- >
-
- We have done a setup like this with Red Hat Cluster Suite.
-
- We are quite happy with the setup in general, and it has been working well
- even in 'unexpected circumstances' (power outages etc). The only thing I'd
- change in this setup if I could is the cluster software: RHCS is not mature
- enough and it seems every release contains new critical bugs, and sometimes
- even mission-critical components such as quorum disk do not work after an
- upgrade.
-
- Regards
-
- Mikko

Hi Mikko,

In your environment, are the applications able to recover automatically after a DB failover?

For example, we're using Java/JDBC connections + Geronimo; we're researching whether or not JDBC/Geronimo would be able to retry in the case of losing a connection to the DB vs failing and crashing the app.

Dave
On Wednesday 11 November 2009, David Kerr said something like: > I'm trying to meet a very high uptime requirement in a high > performance environment. If you don't mind Xen, have you considered: http://dsg.cs.ubc.ca/remus/ System mirroring/hot standby, with instant failover, complete with any open TCP/IP connections, etc. With paravirtualization, the performance hit shouldn't be very big. j -- Joshua Kugler Part-Time System Admin/Programmer http://www.eeinternet.com PGP Key: http://pgp.mit.edu/ ID 0x14EA086E
On Fri, Nov 13, 2009 at 1:47 AM, David Kerr <dmk@mr-paradox.net> wrote:
In your environment, are the applications able to recover automatically after
a DB failover?
For example, we're using Java/JDBC connections + Geronimo; we're researching whether
or not JDBC/Geronimo would be able to retry in the case of losing a connection to
the DB vs failing and crashing the app.
Since the system is active/passive, a failover *will* disconnect all sessions. We have coded our applications so that if they lose connection to the database, they will automatically try to reconnect.
Regards
Mikko
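The reconnect pattern Mikko describes is language-neutral; the same idea applies to a JDBC/Geronimo data source. This sketch uses stand-in names (the `connect` callable and `ConnectionError` are placeholders for whatever the real driver provides), and the key point is that the *whole transaction* is retried from the top, not just the failed statement:

```python
# Sketch of app-side reconnect-and-retry after an active/passive
# failover drops all sessions. The connect function and error type
# are stand-ins for a real driver's API (e.g. JDBC's SQLException).

import time

def run_with_reconnect(connect, work, retries=5, delay=0.0):
    """Re-run the full transaction after a lost connection."""
    for attempt in range(retries):
        try:
            conn = connect()
            return work(conn)       # the entire transaction, from the top
        except ConnectionError:     # driver-specific in real code
            time.sleep(delay)       # back off, then try to reconnect
    raise RuntimeError("database still unreachable after failover")

# Simulate a failover: the first connect fails, the second succeeds.
attempts = []
def connect():
    attempts.append(1)
    if len(attempts) == 1:
        raise ConnectionError("server closed the connection")
    return "conn"

print(run_with_reconnect(connect, lambda c: "committed"))  # -> committed
```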
On Thu, Nov 12, 2009 at 07:32:24PM -0900, Joshua J. Kugler wrote:
- On Wednesday 11 November 2009, David Kerr said something like:
- > I'm trying to meet a very high uptime requirement in a high
- > performance environment.
-
- If you don't mind Xen, have you considered:
-
- http://dsg.cs.ubc.ca/remus/ System mirroring/hot standby, with instant
- failover, complete with any open TCP/IP connections, etc.
-
- With paravirtualization, the performance hit shouldn't be very big.
-
- j

Hehe, that opens up a conversation that got closed on me long ago. =)

The I/O hit would be tough for me, though.

Thanks!

Dave