Re: Replication Ideas
From | Chris Travers |
---|---|
Subject | Re: Replication Ideas |
Date | |
Msg-id | 3F4A420E.6090604@travelamericas.com |
In reply to | Re: Replication Ideas (Ron Johnson <ron.l.johnson@cox.net>) |
Responses | Re: Replication Ideas (Ron Johnson <ron.l.johnson@cox.net>); Re: Replication Ideas (Alvaro Herrera <alvherre@dcc.uchile.cl>) |
List | pgsql-general |
Ron Johnson wrote:

>This is vaguely similar to Two Phase Commit, which is a sine qua
>non of distributed transactions, which is the s.q.n. of multi-master
>replication.

I may be wrong, but if I recall correctly, one of the problems with a standard 2-phase commit is that if one server goes down, the other masters cannot commit their transactions. This would give a clustered database server a downtime equal to the combined downtime of all of its nodes. This is a real problem. Of course my understanding of Two Phase Commit may be incorrect, in which case I would appreciate it if someone could point out where I am wrong.

It had occurred to me that the issue was one of failure handling more than one of concept, i.e. the problem is how one node's failure is handled rather than the fundamental structure of Two Phase Commit. If a single node fails, we don't want that to take down the whole cluster, and I have actually revised my logic a bit more (to make it even safer). In this I assume that:

1: General failures on any one node are rare.

2: A failure is more likely to prevent a transaction from being committed than to allow one to be committed.

This hot-failover solution requires transparency from the client's perspective, i.e. the client should not have to choose a different server should one go down, and should not need to know when a server comes back up. This also means that we need to assume that a load balancing solution can be a part of the clustering solution. I would assume that this would require a shared IP address for the public interface of the server and a private communications channel where each node has a separate IP address (similar to Microsoft's implementation of Network Load Balancing). Also, different transactions within a single connection should be able to be handled by different nodes, so if one node goes down, users don't have to reconnect.
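To put rough numbers on the downtime concern with a standard two-phase commit, here is a minimal sketch. It assumes that node failures are independent and that a commit is only possible while every node is up; the function name and the 99% figure are illustrative and do not come from any real system.

```python
def availability_all_nodes_required(node_availabilities):
    """Fraction of time a cluster can commit if EVERY node must be up.

    Assumes independent failures (an assumption made for this sketch).
    Each entry is the fraction of time one node is up, e.g. 0.99.
    """
    result = 1.0
    for availability in node_availabilities:
        result *= availability
    return result


# Hypothetical example: three nodes, each up 99% of the time.
single_node = 0.99
cluster = availability_all_nodes_required([single_node] * 3)

print(f"single node downtime: {1 - single_node:.4f}")
print(f"cluster downtime:     {1 - cluster:.4f}")
```

With three such nodes the cluster downtime comes out at just under three times the downtime of one node, which is the "total downtime of all of its nodes" effect described above.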
So here is my suggested logic for high availability/load balanced clustering:

1: All nodes recognize each user connection and delegate transactions rather than connections.

2: At the beginning of a transaction, nodes decide who will take it. Any operation which does not change the information or schema of the database is handled exclusively on that node. Other operations are distributed across nodes.

3: When the transaction is committed, the nodes "vote" on whether the commitment of the transaction is valid. Majority rules, and the minority must remove themselves from the cluster until they can synchronize their databases with the existing masters. If the vote is split 50/50 (i.e. one node fails in a 2 node cluster), success is considered more likely to be valid than failure, and the node(s) which failed to commit the transaction must remove themselves from the cluster until they can recover.

Best Wishes,
Chris Travers
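The voting rule in step 3 can be sketched roughly as follows. This is only an illustration of the majority and tie-break logic under stated assumptions, not an implementation of anything in PostgreSQL: the function name, the node names, and the representation of votes as a dictionary are all hypothetical.

```python
def tally_commit_votes(votes):
    """Decide the outcome of a commit vote.

    `votes` maps a (hypothetical) node name to True if that node
    committed the transaction locally, or False if it failed to.

    Returns (committed, nodes_to_remove), where nodes_to_remove lists
    the nodes that disagreed with the outcome and must leave the
    cluster until they have resynchronized.
    """
    if not votes:
        raise ValueError("at least one node must vote")

    succeeded = [node for node, ok in votes.items() if ok]
    failed = [node for node, ok in votes.items() if not ok]

    # Majority rules. A 50/50 split is resolved in favour of success,
    # since success is considered more likely to be valid than failure.
    committed = len(succeeded) >= len(failed)

    nodes_to_remove = failed if committed else succeeded
    return committed, sorted(nodes_to_remove)


# Two-node cluster where one node fails: the tie goes to success.
print(tally_commit_votes({"node_a": True, "node_b": False}))

# Three-node cluster where two nodes fail: the transaction is rejected
# and the lone node that committed must leave and resynchronize.
print(tally_commit_votes({"node_a": True, "node_b": False, "node_c": False}))
```

The sketch leaves out everything that makes the real problem hard, such as how votes are collected over the private channel and what happens if a node fails during the vote itself.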