Re: Replication
From | Jeff Davis |
---|---|
Subject | Re: Replication |
Date | |
Msg-id | 1156444306.15743.221.camel@dogma.v10.wvs |
In reply to | Re: Replication (Markus Schiltknecht <markus@bluegap.ch>) |
Responses | Re: Replication |
List | pgsql-hackers |
On Thu, 2006-08-24 at 11:18 +0200, Markus Schiltknecht wrote:
> Hi,
>
> Jeff Davis wrote:
> > I disagree about high-availability. In fact, I would say that sync
> > replication is trading availability and performance for synchronization
> > (which is a valid tradeoff, but costly).
>
> In a way, replication is for database systems what RAID1 is for hard
> drives. Having multiple cluster nodes in sync minimizes the risk of a
> complete outage due to hardware failure. Thus maximizing availability.
> Of course, as you say, traded for performance.
>
> > If you have an async system, all nodes must go down for the system to go
> > down.
>
> Yes. But it takes only one node to go down to potentially lose committed
> transactions. In contrast to synchronous replication systems, where a
> committed transaction is guaranteed to be 'committed on the cluster'. So
> if at least one node of the cluster is up and running, you can be
> assured to have consistent data.

Right, that's the cost of asynchronous replication.

> Please note that the Postgres-R approach does relax some of these
> constraints a little to gain performance. The most obvious result of
> these relaxations is that the nodes may be 'behind' with replaying
> transactions and show a past view of the data.
>
> > If you have a sync system, if any node goes down the system goes down.
>
> That's plain wrong.

OK, maybe not one node, but I don't think I'm totally off base. See my explanation below.

> > If you plan on doing failover, consider this: what if it's not obvious
> > which system is still up? What if the network route between the two
> > systems goes down (or just becomes too slow to replicate over), but
> > clients can still connect to both servers? Then you have two systems
> > that both think that the other system went down, and both start
> > accepting transactions. Now you no longer have replication at all.
>
> This problem is often called 'network partitioning', which also refers
> to a more general case: a group of M nodes being split into two groups
> of N and (M-N) nodes (due to network failure or whatever).
>
> In Postgres-R a Group Communication System is used to cover all these
> aspects (error detection, congruent agreement on a major group, etc.).

That doesn't work very well when the servers are split into two groups at two physical locations. I can see two possibilities:

(1) You require a quorum to be effective, in which case your cluster of databases is only as reliable as the location that holds more servers.

(2) You have another central authority that determines which databases are up and which are down. Then your cluster is only as reliable as that central authority.

Sure, if you have a million groups of servers spread all over the internet, it works with a very high degree of reliability, because you can likely always form a quorum. However, you then have horrible performance, because the updates need to be spread to so many locations. And for truly synchronous replication you probably have to serialize the updates, which is very costly over that many nodes all over a network.

Even with a large number of nodes at different locations, you end up with strange decisions to make if the network connections are intermittent or very slow. A temporary slowdown of many nodes could cause them to be marked as degraded until some kind of human intervention brings them back. Until then, you might not be able to determine which nodes make up an authoritative group. This kind of degradation could happen during a DDoS attack, or perhaps while a worm is moving around the internet.

In practice, everyone can find a solution that works for them. However, synchronous replication is not perfect, and there are many failure scenarios that need to be resolved in a way that fits your business.
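To make possibility (1) concrete, here is a toy model in Python. It is only an illustration under stated assumptions, not how Postgres-R's Group Communication System works: it assumes a hypothetical five-node cluster with three nodes at site A and two at site B, and compares a hypothetical naive failover rule ("keep accepting writes if you can't see the others") with a strict-majority quorum rule. It ignores failure detection, timeouts and slow links entirely.

```python
# Toy model (hypothetical names and layout): each "group" is a set of live
# nodes that can still reach one another after a failure. Nodes that are
# down appear in no group.


def naive_rule(group: set[str], cluster_size: int) -> bool:
    # Naive failover: any group of live nodes assumes the others died
    # and keeps accepting writes.
    return len(group) > 0


def majority_rule(group: set[str], cluster_size: int) -> bool:
    # Quorum: accept writes only with a strict majority of all configured nodes.
    return 2 * len(group) > cluster_size


def writable_groups(groups, cluster_size, rule):
    # Return the (sorted) groups that would accept writes under `rule`.
    return [sorted(g) for g in groups if rule(g, cluster_size)]


SITE_A = {"a1", "a2", "a3"}  # hypothetical larger site
SITE_B = {"b1", "b2"}        # hypothetical smaller site
N = len(SITE_A | SITE_B)     # 5 configured nodes

# Scenario 1: the link between the sites fails; both sites stay up.
link_failure = [SITE_A, SITE_B]
# Scenario 2: site A loses power; only site B's nodes are alive.
site_a_down = [SITE_B]

for name, groups in [("link failure", link_failure), ("site A down", site_a_down)]:
    print(name,
          "naive:", writable_groups(groups, N, naive_rule),
          "majority:", writable_groups(groups, N, majority_rule))
```

Under the link failure, the naive rule lets both sites accept writes (the two-writer situation from the failover scenario), while the majority rule allows only site A. When site A goes down, the majority rule accepts no writes at all even though both nodes at site B are healthy, which is the sense in which the cluster is only as reliable as the location holding more servers.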
I think synchronous replication is inherently less available than asynchronous replication.

Regards,
Jeff Davis