Re: I'd like to discuss scaleout at PGCon
From | Michael Paquier
Subject | Re: I'd like to discuss scaleout at PGCon
Date | |
Msg-id | 20180606030608.GI1442@paquier.xyz
In response to | Re: I'd like to discuss scaleout at PGCon ("MauMau" <maumau307@gmail.com>)
Responses | Re: I'd like to discuss scaleout at PGCon ("MauMau" <maumau307@gmail.com>)
List | pgsql-hackers
On Wed, Jun 06, 2018 at 01:14:04AM +0900, MauMau wrote:
> I don't think an intermediate server like the coordinators in XL is
> necessary.  That extra hop can be eliminated by putting both the
> coordinator and the data node roles in the same server process.  That
> is, the node to which an application connects communicates with other
> nodes only when it does not have the necessary data.

Yes, I agree with that.  This was actually a concern I had over the
original XC design after a couple of years working on it.  The fewer
nodes, the easier the HA, even if applying any PITR logic across N
nodes instead of N*2 nodes with 2PC checks and cleanups is far from
trivial either.  It happens that the code split between coordinator
and datanode was simpler to maintain than a merge of both, at the cost
of operational maintenance and complexity in running the thing.

> Furthermore, an extra hop and double parsing/planning could matter for
> analytic queries, too.  For example, SAP HANA boasts of scanning 1
> billion rows in one second.  In HANA's scaleout architecture, an
> application can connect to any worker node and the node communicates
> with other nodes only when necessary (there's one special node called
> "master", but it manages the catalog and transactions; it's not an
> extra hop like the coordinator in XL).  Vertica is an MPP analytics
> database, but it doesn't have a node like the coordinator, either.  To
> achieve maximum performance for real-time queries, the scaleout
> architecture should avoid an extra hop when possible.

Greenplum's ORCA planner (and Citus?) have such facilities if I recall
correctly; just mentioning that pushing compiled plans ready for
execution down to remote nodes exists here and there (that's not the
case for XC/XL).  For queries whose planning time is far shorter than
their execution time, like analytical work, that would not matter
much.  But it does for OLTP and short-transaction workloads.
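The idea discussed above, namely merging the coordinator and datanode roles into one process so the connected node hops to a peer only when the data is remote, can be sketched roughly as follows. This is a hypothetical illustration, not XC/XL or HANA code; the class and the modulo-based shard distribution are assumptions for the example.

```python
class MergedNode:
    """One process holding both the coordinator and datanode roles
    (hypothetical sketch of the no-extra-hop design discussed above)."""

    def __init__(self, node_id, n_nodes, local_store):
        self.node_id = node_id
        self.n_nodes = n_nodes
        self.local_store = local_store  # {key: row} shards owned locally

    def execute(self, key, fetch_remote):
        """Look up `key`, hopping to a peer only when the data is remote."""
        owner = key % self.n_nodes  # illustrative hash distribution
        if owner == self.node_id:
            return self.local_store.get(key)  # served locally, no hop
        return fetch_remote(owner, key)       # exactly one hop, to the owner
```

The point of the sketch is the routing decision: a query landing on the owning node never pays the extra network round trip that a separate coordinator tier imposes on every query.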
>> Using a central coordinator also allows multi-node transaction
>> control, global deadlock detection etc..
>
> VoltDB does not have an always-pass hop like the coordinator in XL.

Greenplum also uses a single-coordinator, multi-datanode instance.
That looks similar, right?

> Our proprietary RDBMS named Symfoware, which is not based on
> PostgreSQL, also doesn't have an extra hop, and can handle distributed
> transactions and deadlock detection/resolution without any special
> node like GTM.

Interesting to know.  This is an area with difficult problems.  The
closer you stay merged with Postgres HEAD, the more fun (?) you get
trying to support new SQL features, and sometimes you finish with hard
ERRORs or extra GUC switches to prevent any kind of inconsistent
operations.
--
Michael
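For readers unfamiliar with the "global deadlock detection" mentioned above: the standard approach is to union the per-node waits-for graphs and look for a cycle that spans nodes. The sketch below is a hypothetical illustration of that idea only; it is not Symfoware, GTM, or PostgreSQL code, and the function name and edge format are assumptions.

```python
def find_deadlock(edge_lists):
    """Detect a distributed deadlock from per-node waits-for edges.

    edge_lists: one list per node of (waiter_txn, holder_txn) pairs.
    A cycle in the union of all nodes' edges means two or more
    transactions are blocking each other across the cluster.
    """
    graph = {}
    for edges in edge_lists:  # merge every node's local waits-for graph
        for waiter, holder in edges:
            graph.setdefault(waiter, set()).add(holder)

    def has_cycle_from(txn, path):
        if txn in path:  # revisited a transaction on the current chain
            return True
        path.add(txn)
        found = any(has_cycle_from(n, path) for n in graph.get(txn, ()))
        path.discard(txn)
        return found

    return any(has_cycle_from(txn, set()) for txn in graph)
```

A deadlock invisible to any single node (t1 waits on t2 at node A, t2 waits on t1 at node B) is only found once the edges are combined, which is exactly why designs without a central coordinator must still exchange or collect this wait information somewhere.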