Обсуждение: master/master replication with load balancer in front
(I've used other DBs in the past, but I'm fairly new to PG.) Currently I have a single PG 9.3 instance in the cloud. A Python script run as a cron job is connecting to it over the network and is doing the batch updates every hour, usually in append mode. Users have various custom scripts which are used for analytics queries and connect to the single 9.3 instance over the network and run their queries a few times a day. Dataset is a few dozen GB. The single instance scheme is not very reliable. I need to build a new DB backend. I'll set up Postgres 9.4. Ideally, I'd like to setup 2 instances, each instance placed in a different availability zone. Master/master replication. I'll put a load balancer (ELB) in front of both instances. The batch updates and the queries will be sent by the ELB to any instance in the cluster; replication will take care of copying the data to all instances. I want the whole cluster + the load balancer to act as a single instance to everyone connecting to it. "Eventually consistent" replication is fine. I don't want to share storage between PG instances if I can avoid it. I would like to use the 9.4.4 packages made for Ubuntu if at all possible (avoiding any patching). I see there are many different ways to build a PG cluster. What would be the best choice in my case? If I were to drop the master/master requirement and just do master/slave, sending updates to one node, and doing all analytics on the other node, what would be the best replication technique in this case? (We are also considering a migration from the batch update model to a more continuous stream.) -- Florin Andrei http://florin.myip.org/
> I see there are many different ways to build a PG cluster. What would be > the best choice in my case? Hi, a few keywords in your mail hint at the fact you're using AWS? If that's the case, you might want to look into their managed PostgreSQL hosting: it's called Amazon RDS for PostgreSQL and supports failover ("Multi AZ") and master-slave replication ("Read Replicas"). There's no master-master support, though. If you need that, you might want to look into BDR, but then you need to patch... As load balancer, PgPool-II might be what you're looking for. Bye, Chris.
El 21/08/15 a las 20:45, Florin Andrei escribió: > > The single instance scheme is not very reliable. I need to build a new > DB backend. I'll set up Postgres 9.4. Ideally, I'd like to setup 2 > instances, each instance placed in a different availability zone. > Master/master replication. I'll put a load balancer (ELB) in front of > both instances. It's not clear if the main goal is reliability (or availability), or to balance writes. If you are looking for HA, single master with multiple standbys is your best bet (you can put standbys on different zones). You can also look at BDR and have masters geographically distributed, but I'd strongly suggest you look at the link http://bdr-project.org/docs/stable/weak-coupled-multimaster.html, and keep in mind that with multi-master systems you will be more prone to data modification conflicts. You also have to be aware that BDR tries to keeps the schemas consistent across all nodes. This is done by replicating DDL statements (not all, you'll need to check for the not replicated statements and the prohibited ones here http://bdr-project.org/docs/stable/ddl-replication-statements.html) which can bring up locking issues. If you want to balance writes, I don't think a multi-master clustering will fit your needs. If this is the case your needs would be better satisfied with sharding solutions. > The batch updates and the queries will be sent by the ELB to any > instance in the cluster; replication will take care of copying the data > to all instances. I want the whole cluster + the load balancer to act as > a single instance to everyone connecting to it. > > "Eventually consistent" replication is fine. I don't want to share > storage between PG instances if I can avoid it. I would like to use the > 9.4.4 packages made for Ubuntu if at all possible (avoiding any patching). BDR requires patching PostgreSQL for 9.4 and 9.5. > I see there are many different ways to build a PG cluster. What would be > the best choice in my case? > > If I were to drop the master/master requirement and just do > master/slave, sending updates to one node, and doing all analytics on > the other node, what would be the best replication technique in this case? Stream replication seems the one which might fit better. Trigger based replication would choke on large bulk loads (unless you split them up into smaller pieces) Regards, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 2015-08-23 06:56, Martín Marqués wrote: > El 21/08/15 a las 20:45, Florin Andrei escribió: >> >> The single instance scheme is not very reliable. I need to build a new >> DB backend. I'll set up Postgres 9.4. Ideally, I'd like to setup 2 >> instances, each instance placed in a different availability zone. >> Master/master replication. I'll put a load balancer (ELB) in front of >> both instances. > > It's not clear if the main goal is reliability (or availability), or to > balance writes. > > If you are looking for HA, single master with multiple standbys is your > best bet (you can put standbys on different zones). This is for reliability / availability. The thing is, if I have a single master and an AZ fails, I still have to make manual changes to switch to the healthy AZ - and, until then, updates would fail. Master/master, in theory, should absorb single-AZ failures without needing any manual intervention. > You can also look at BDR and have masters geographically distributed, > but I'd strongly suggest you look at the link > http://bdr-project.org/docs/stable/weak-coupled-multimaster.html, and > keep in mind that with multi-master systems you will be more prone to > data modification conflicts. I get the sense that BDR is not 100% ready for prime time. Is that accurate? How about Bucardo? https://bucardo.org/wiki/Bucardo >> If I were to drop the master/master requirement and just do >> master/slave, sending updates to one node, and doing all analytics on >> the other node, what would be the best replication technique in this >> case? > > Stream replication seems the one which might fit better. Trigger based > replication would choke on large bulk loads (unless you split them up > into smaller pieces) It's all bulk uploads for now. It's a Python script that wakes up once in a while and dumps more data into the DB. Size varies but it can be big. -- Florin Andrei http://florin.myip.org/
On 2015-08-22 03:05, Chris Mair wrote: > > a few keywords in your mail hint at the fact you're using AWS? > > If that's the case, you might want to look into their managed > PostgreSQL hosting: it's called Amazon RDS for PostgreSQL and > supports failover ("Multi AZ") and master-slave replication > ("Read Replicas"). Yes, it's AWS. A few issues with that: lack of master/master support which you've mentioned, and we have longer term plans to use other cloud providers as well - so I'm trying to avoid provider lock-in (whenever it makes sense to do so). > As load balancer, PgPool-II might be what you're looking for. Would there be any issues if I just used an ELB pointing at two masters? Let's say I use sticky sessions to avoid clients switching masters too much. -- Florin Andrei http://florin.myip.org/
El 24/08/15 a las 15:41, Florin Andrei escribió: > On 2015-08-23 06:56, Martín Marqués wrote: >> El 21/08/15 a las 20:45, Florin Andrei escribió: >>> >>> The single instance scheme is not very reliable. I need to build a new >>> DB backend. I'll set up Postgres 9.4. Ideally, I'd like to setup 2 >>> instances, each instance placed in a different availability zone. >>> Master/master replication. I'll put a load balancer (ELB) in front of >>> both instances. >> >> It's not clear if the main goal is reliability (or availability), or to >> balance writes. >> >> If you are looking for HA, single master with multiple standbys is your >> best bet (you can put standbys on different zones). > > This is for reliability / availability. > > The thing is, if I have a single master and an AZ fails, I still have to > make manual changes to switch to the healthy AZ - and, until then, > updates would fail. Master/master, in theory, should absorb single-AZ > failures without needing any manual intervention. When updates fail the application should have the code to retry executing those statements or return the appropriate errors to the user so further actions can be carried on. And in any case, you will need to adjust you application or pooling system to send statements to a node that's up. Multi-master solutions won't magically do that. Maybe you should take a look at repmgr and particularly the automatic failover feature: https://github.com/2ndQuadrant/repmgr/blob/master/FAILOVER.rst >> You can also look at BDR and have masters geographically distributed, >> but I'd strongly suggest you look at the link >> http://bdr-project.org/docs/stable/weak-coupled-multimaster.html, and >> keep in mind that with multi-master systems you will be more prone to >> data modification conflicts. > > I get the sense that BDR is not 100% ready for prime time. Is that > accurate? BDR is ready for prime time, but if it will work for you depends on your needs. BDR aims at *eventually* consistent multi-masters system with geographically distributed nodes, so DML statements on one node will not be slowed down because the data has to be replicated to other distant nodes. There's still lots of ground where to improve, but isn't that also true for any other system? The important thing is if it provides what you need. I suggest you run some tests and see if it fits your requirements. Regards, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services