Re: The plan for FDW-based sharding

Поиск

Список

Период

Сортировка

От	Petr Jelinek
Тема	Re: The plan for FDW-based sharding
Дата	1 марта 2016 г. 18:57:10
Msg-id	56D5E5FA.3070809@2ndquadrant.com обсуждение исходный текст
Ответ на	Re: The plan for FDW-based sharding (Robert Haas <robertmhaas@gmail.com>)
Ответы	Re: The plan for FDW-based sharding
Список	pgsql-hackers

Дерево обсуждения

On 27/02/16 04:54, Robert Haas wrote:
> On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> We do not have formal prove that proposed XTM is "general enough" to handle
>> all possible transaction manager implementations.
>> But there are two general ways of dealing with isolation: snapshot based and
>> CSN  based.
>
> I don't believe that for a minute.  For example, consider this article:
>
> https://en.wikipedia.org/wiki/Global_serializability
>
> I think the neutrality of that article is *very* debatable, but it
> certainly contradicts the idea that snapshots and CSNs are the only
> methods of achieving global serializability.
>
> Or consider this lecture:
>
> http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf
>
> That's a great introduction to the problem we're trying to solve here,
> but again, snapshots are not mentioned, and CSNs certainly aren't
> mentioned.
>
> This write-up goes further, explaining three different methods for
> ensuring global serializability, none of which mention snapshots or
> CSNs:
>
> http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html
>
> Actually, I think the second approach is basically a snapshot/CSN-type
> approach, but it doesn't use that terminology and the connection to
> what you are proposing is very unclear.
>
> I think you're approaching this problem from a viewpoint that is
> entirely too focused on the code that exists in PostgreSQL today.
> Lots of people have done lots of academic research on how to solve
> this problem, and you can't possibly say that CSNs and snapshots are
> the only solution to this problem unless you haven't read any of those
> papers.  The articles above aren't exceptional in mentioning neither
> of the approaches that you are advocating - they are typical of the
> literature in this area.  How can it be that the only solutions to
> this problem are ones that are totally different from the approaches
> that university professors who spend time doing research on
> concurrency have spent time exploring?
>
> I think we need to back up here and examine our underlying design
> assumptions.  The goal here shouldn't necessarily be to replace
> PostgreSQL's current transaction management with a distributed version
> of the same thing.  We might want to do that, but I think the goal is
> or should be to provide ACID semantics in a multi-node environment,
> and specifically the I in ACID: transaction isolation.  Making the
> existing transaction manager into something that can be spread across
> multiple nodes is one way of accomplishing that.  Maybe the best one.
> Certainly one that's been experimented within Postgres-XC.  But it is
> often the case that an algorithm that works tolerably well on a single
> machine starts performing extremely badly in a distributed
> environment, because the latency of communicating between multiple
> systems is vastly higher than the latency of communicating between
> CPUs or cores on the same system.  So I don't think we should be
> assuming that's the way forward.
>

I have similar problem with the FDW approach though. It seems to me like 
because we have something that solves access to external tables somebody 
decided that it should be used as base for the whole sharding solution 
but there is no real concept of how it will look like together, no ideas 
what it will be usable for and not even simple prototype that would 
prove that the idea is sound (although again, I am not clear on what the 
actual idea is beyond "we will use FDWs").

Don't get me wrong, I agree that the current FDW enhancements are 
useful, I am just worried about them being presented as future of 
sharding in Postgres when nobody has sketched how the future might look 
like. And  once we get to more interesting parts like consistency, 
distributed query planning, p2p connections (and I am really concerned 
about these as FDWs abstract some knowledge that coordinator and or data 
nodes might need to do these well), etc we might very well find 
ourselves painted in the corner and have to start from beginning, while 
if we had some idea on how the whole thing might look like we could 
identify this early and not postpone built-in sharding by several years 
just because somebody said we will use FDWs and that's what we worked on 
in those years.

Note that I am not saying that other discussed approaches are any 
better, I am saying that we should know approximately what we actually 
want and not just beat FDWs with a hammer and hope sharding will 
eventually emerge and call that the plan.

--   Petr Jelinek                  http://www.2ndQuadrant.com/  PostgreSQL Development, 24x7 Support, Training &
Services

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: The plan for FDW-based sharding