Horizontal Write Scaling

Поиск

Список

Период

Сортировка

От	Eliot Gable
Тема	Horizontal Write Scaling
Дата	23 ноября 2010 г. 19:43:33
Msg-id	AANLkTinxTiuuDy8Up-pZVh=1YDa89ga_cDarGZ3usg3L@mail.gmail.com обсуждение исходный текст
Ответы	Re: Horizontal Write Scaling (Eliot Gable <egable+pgsql-hackers@gmail.com>) Re: Horizontal Write Scaling ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) Re: Horizontal Write Scaling (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>) Re: Horizontal Write Scaling (Greg Smith <greg@2ndquadrant.com>) Re: Horizontal Write Scaling (Markus Wanner <markus@bluegap.ch>)
Список	pgsql-hackers

Дерево обсуждения

I know there has been a lot of talk about replication getting built into Postgres and I know of many projects that aim to fill the role. However, I have not seen much in the way of a serious attempt at multi-master write scaling. I understand the fundamental problem with write scaling across multiple nodes is Disk I/O and inter-node communication latency and that in the conventional synchronous, multi-master replication type setup you would be limited to the speed of the slowest node, plus the communication protocol overhead and latency. However, it occurs to me that if you had a shared disk system via either iSCSI, Fiber Channel, NFS, or whatever (which also had higher I/O capabilities than a single server could utilize), if you used a file system that supported locks on a particular section (extent) of a file, it should theoretically be possible for multiple Postgres instances on multiple systems sharing the database to read and write to the database without causing corruption. Obviously, it would take some carefully designed code in terms of file extent locking, but it seems like it would also eliminate the need entirely for any type of replication system. Lustre seems to support the type of file locking required. Of course, I am assuming the disk system would be RAID 1, RAID 10, RAID 5, or RAID 6 for reliability purposes and that it is sufficiently redundant that you don't have to worry about an outage of your storage system.

Has anyone put any thought into what it would take to do this in Postgres? Is it simply a matter of making the database file interaction code aware of extent locking, or is it considerably more involved than that? It also occurs to me that you probably need some form of transaction ordering mechanism across the nodes based on synchronized timestamps, but it seems Postgres-R has the required code to do that portion already written. That may not even be needed since all nodes would actually be working on the same database files and the locks would ensure a strict ordering of queries. The only place I see that possibly causing a problem is someone load balancing across the servers and sending a delete and insert really close to each other to different nodes such that if the insert executes first, it would be deleted by the delete even though the intent was to have the delete run first. Timestamp ordering of the queries or just some shared transaction ID across the nodes would eliminate that possibility. Other than that, is there anything else I am missing? Wouldn't this type of setup be far simpler to implement and provide better scalability than trying to do multi-master replication using log shipping or binary object shipping or any other techniques? Wouldn't it also be far more efficient since you don't need to have a copy of your data on each master node and therefor also don't have to ship your data to each node and have each node process it?

I am mostly asking for educational purposes, and I would appreciate technical (and hopefully specific) explanations as to what in Postgres would need to change to support this.

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Peter Tanski
Дата: 23 ноября 2010 г., 18:54:25
Сообщение: Re: GiST seems to drop left-branch leaf tuples

Следующее

От: Peter Eisentraut
Дата: 23 ноября 2010 г., 19:53:47
Сообщение: Re: ALTER TYPE recursion to typed tables

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Horizontal Write Scaling

Предыдущее

Следующее