Re: Built-in Raft replication
From | Konstantin Osipov |
---|---|
Subject | Re: Built-in Raft replication |
Date | |
Msg-id | Z_9-BR89w-DLeFv3@ark |
In reply to | Re: Built-in Raft replication (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
List | pgsql-hackers |
* Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> [25/04/16 11:06]:

> > My view is what Konstantin wants is automatic replication topology management. For some reason this technology is called HA, DCS, Raft, Paxos and many other scary words. But basically it manages primary_conn_info of some nodes to provide some fault-tolerance properties. I'd start to design from here, not from the Raft paper.
>
> In my experience, the load of managing hundreds of replicas which all
> participate in the RAFT protocol becomes more than the regular transaction
> load. So making every replica a RAFT participant will affect the
> ability to deploy hundreds of replicas.

I think this experience needs to be detailed. There are implementations in the field that are less efficient than others. Early etcd-raft didn't have pre-voting and had a "bastardized" (their own definition) implementation of configuration changes which didn't use joint consensus.

Then there is a liveness issue if leader election is implemented in a straightforward way in large clusters. But this is addressed: scaling up the randomized election timeout with the cluster size, and converting most of the participants to non-voters in large clusters.

Raft replication, again, if implemented in a naive way, would require O(outstanding transactions × number of replicas) RAM. But it doesn't have to be naive.

To sum up, I am not aware of any principal limitations in this area.

--
Konstantin Osipov, Moscow, Russia
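The election-timeout mitigation mentioned above can be illustrated with a minimal sketch. This is not from any particular Raft implementation; the function name, base timeout, and scaling rule are assumptions chosen to show the idea that widening the randomization window with cluster size makes simultaneous candidacies (and thus split votes) less likely.

```python
import random

# Hypothetical base timeout; the Raft paper suggests values around 150-300 ms.
BASE_TIMEOUT_MS = 150

def election_timeout_ms(cluster_size: int, base: int = BASE_TIMEOUT_MS) -> int:
    """Pick a randomized election timeout whose spread grows with cluster size.

    With more voters, more nodes can time out concurrently, so the window
    from which the timeout is drawn is widened proportionally to reduce
    the probability of split votes (illustrative scaling rule, not a
    scheme from any specific implementation).
    """
    spread = base * max(1, cluster_size)
    # Timeout lies in [base, base + spread): never below the base, and
    # the upper bound grows linearly with the number of voters.
    return base + random.randrange(spread)
```

For example, a 3-node cluster draws timeouts from [150, 600) ms, while a 100-node cluster draws from [150, 15150) ms, spreading candidacies out in time. Combined with demoting most members to non-voters, this keeps elections live even in large clusters.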