Re: The plan for FDW-based sharding
From | Petr Jelinek |
---|---|
Subject | Re: The plan for FDW-based sharding |
Date | |
Msg-id | 56D5E5FA.3070809@2ndquadrant.com |
In reply to | Re: The plan for FDW-based sharding (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: The plan for FDW-based sharding |
List | pgsql-hackers |
On 27/02/16 04:54, Robert Haas wrote:
> On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> We do not have formal prove that proposed XTM is "general enough" to handle
>> all possible transaction manager implementations.
>> But there are two general ways of dealing with isolation: snapshot based and
>> CSN based.
>
> I don't believe that for a minute. For example, consider this article:
>
> https://en.wikipedia.org/wiki/Global_serializability
>
> I think the neutrality of that article is *very* debatable, but it
> certainly contradicts the idea that snapshots and CSNs are the only
> methods of achieving global serializability.
>
> Or consider this lecture:
>
> http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf
>
> That's a great introduction to the problem we're trying to solve here,
> but again, snapshots are not mentioned, and CSNs certainly aren't
> mentioned.
>
> This write-up goes further, explaining three different methods for
> ensuring global serializability, none of which mention snapshots or
> CSNs:
>
> http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html
>
> Actually, I think the second approach is basically a snapshot/CSN-type
> approach, but it doesn't use that terminology and the connection to
> what you are proposing is very unclear.
>
> I think you're approaching this problem from a viewpoint that is
> entirely too focused on the code that exists in PostgreSQL today.
> Lots of people have done lots of academic research on how to solve
> this problem, and you can't possibly say that CSNs and snapshots are
> the only solution to this problem unless you haven't read any of those
> papers. The articles above aren't exceptional in mentioning neither
> of the approaches that you are advocating - they are typical of the
> literature in this area.
> How can it be that the only solutions to
> this problem are ones that are totally different from the approaches
> that university professors who spend time doing research on
> concurrency have spent time exploring?
>
> I think we need to back up here and examine our underlying design
> assumptions. The goal here shouldn't necessarily be to replace
> PostgreSQL's current transaction management with a distributed version
> of the same thing. We might want to do that, but I think the goal is
> or should be to provide ACID semantics in a multi-node environment,
> and specifically the I in ACID: transaction isolation. Making the
> existing transaction manager into something that can be spread across
> multiple nodes is one way of accomplishing that. Maybe the best one.
> Certainly one that's been experimented within Postgres-XC. But it is
> often the case that an algorithm that works tolerably well on a single
> machine starts performing extremely badly in a distributed
> environment, because the latency of communicating between multiple
> systems is vastly higher than the latency of communicating between
> CPUs or cores on the same system. So I don't think we should be
> assuming that's the way forward.
>

I have a similar problem with the FDW approach, though. It seems to me that because we have something that solves access to external tables, somebody decided it should be used as the base for the whole sharding solution, but there is no real concept of how it will all fit together, no idea what it will be usable for, and not even a simple prototype that would prove that the idea is sound (although again, I am not clear on what the actual idea is beyond "we will use FDWs").

Don't get me wrong, I agree that the current FDW enhancements are useful; I am just worried about them being presented as the future of sharding in Postgres when nobody has sketched what that future might look like.
And once we get to the more interesting parts like consistency, distributed query planning, p2p connections, etc. (and I am really concerned about these, as FDWs abstract away some knowledge that the coordinator and/or data nodes might need to do these well), we might very well find ourselves painted into a corner and have to start from the beginning. If we had some idea of what the whole thing might look like, we could identify this early and not postpone built-in sharding by several years just because somebody said we will use FDWs and that's what we worked on in those years.

Note that I am not saying that the other discussed approaches are any better; I am saying that we should know approximately what we actually want, and not just beat FDWs with a hammer, hope sharding will eventually emerge, and call that the plan.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services