Re: The plan for FDW-based sharding

From: Konstantin Knizhnik
Subject: Re: The plan for FDW-based sharding
Date:
Msg-id: 56D15059.7080403@postgrespro.ru
In response to: Re: The plan for FDW-based sharding  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: The plan for FDW-based sharding  (Robert Haas <robertmhaas@gmail.com>)
           Re: The plan for FDW-based sharding  (Craig Ringer <craig@2ndquadrant.com>)
List: pgsql-hackers
On 02/27/2016 06:57 AM, Robert Haas wrote:
> On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> pg_tsdtm is based on another approach: it uses system time as CSN and
>> doesn't require an arbiter. In theory there is no limit to scalability. But
>> differences in system time and the need for more rounds of communication
>> have a negative impact on performance.
> How do you prevent clock skew from causing serialization anomalies?

If a node receives a message from the "future", it just needs to wait until this future arrives.
In practice we just "adjust" the system time in this case, moving it forward (the system time is not actually
changed; we just keep a correction value which is added to the system time).
 
This approach was discussed in the article:
http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
I hope the algorithm is explained in this article much better than I could do here.
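
To illustrate the idea, here is a rough sketch of keeping such a correction value; this is not the actual
pg_tsdtm code and all names are made up:

/* Sketch only (not pg_tsdtm): keep a correction value so that the local
 * CSN clock never runs behind CSNs received from other nodes. */
#include <stdint.h>
#include <sys/time.h>

typedef uint64_t csn_t;                 /* CSN expressed in microseconds */

static csn_t clock_correction = 0;      /* offset added to system time   */

/* Current CSN = system time plus the accumulated correction. */
static csn_t
get_local_csn(void)
{
    struct timeval tv;

    gettimeofday(&tv, NULL);
    return (csn_t) tv.tv_sec * 1000000 + tv.tv_usec + clock_correction;
}

/* Called when a message carrying a remote CSN arrives: if the remote CSN
 * is "from the future", move our clock forward instead of waiting. */
static void
observe_remote_csn(csn_t remote_csn)
{
    csn_t local = get_local_csn();

    if (remote_csn > local)
        clock_correction += remote_csn - local;
}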

A few notes:
1. I cannot prove that our pg_tsdtm implements the approach described in this article absolutely correctly.
2. I have not tried to formally prove that our implementation cannot cause serialization anomalies.
3. We just ran various synchronization tests (including the simplest debit-credit test, which breaks the old version of
Postgres-XL) during several days, and we did not get any inconsistencies (a sketch of such a test is shown after this
list).
4. We have tested pg_tsdtm on a single node, on a blade cluster and on geographically distributed nodes (more than a
thousand kilometers apart: one server was in Vladivostok, another in Kaliningrad). Ping between these two servers takes
about 100 msec.

Performance of our benchmark drops about 100 times, but there were no inconsistencies.
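
For clarity, here is a rough sketch of the kind of debit-credit invariant check I mean, written with libpq;
the table name, connection string and loop are purely illustrative, this is not our actual test suite. While
concurrent clients transfer money between accounts spread over the shards, the total balance seen in any
snapshot must never change:

/* Sketch of a debit-credit invariant check (illustrative names only). */
#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    /* Any node of the cluster; connection string is just an example. */
    PGconn *conn = PQconnectdb("dbname=bank host=node1");

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* While transfer clients run on other connections, the sum over all
     * shards must stay constant if snapshots are globally consistent. */
    for (int i = 0; i < 1000; i++)
    {
        PGresult *res = PQexec(conn, "SELECT sum(balance) FROM accounts");

        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        printf("total balance: %s\n", PQgetvalue(res, 0, 0));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}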

Also, I want to point out once again that pg_tsdtm was not the primary idea of the proposed patch.
There are well-known limitations of pg_tsdtm which we will try to address in the future.
What we want is to include the XTM API in PostgreSQL, so that we can continue our experiments with different transaction
managers and implement multimaster on top of it (our first practical goal) without affecting the PostgreSQL core.

If the XTM patch is included in 9.6, then we can propose our multimaster as a PostgreSQL extension and everybody can use
it.
Otherwise we have to offer our own fork of Postgres, which significantly complicates using and maintaining it.

>> So there is no ideal solution which can work well for all clusters. This is
>> why it is not possible to develop just one GTM, propose it as a patch for
>> review and then (hopefully) commit it to the Postgres core. IMHO it will never
>> happen. And I do not think that it is actually needed. What we need is a way
>> to be able to create our own transaction managers as Postgres extensions
>> without affecting its core.
> This seems rather defeatist.  If the code is good and reliable, why
> should it not be committed to core?

Two reasons:
1. There is no ideal implementation of DTM which fits all possible needs and is efficient for all clusters.
2. Even if such an implementation existed, the right way to integrate it would still be for Postgres to use some kind of TM API.
I hope everybody will agree that doing it this way:

#ifdef PGXC
        /* In Postgres-XC, stop timestamp has to follow the timeline of GTM */
        xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
#else
        xlrec.xact_time = xactStopTimestamp;
#endif

or in this way:
        xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp : xactStopTimestamp;

is a very, very bad idea.
In OO programming we would have an abstract TM interface and several implementations of this interface, for example
MVCC_TM, 2PL_TM, Distributed_TM...
This is actually what can be done with our XTM API.
Since Postgres is implemented in C, not C++, we have to emulate interfaces using structures of function pointers.
And please notice that there is no need at all to include a DTM implementation in core, since it is not needed by
everybody.
It can easily be distributed as an extension.
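
Just to illustrate the idea of such a function-pointer interface, here is a rough sketch; the names are made up
and this is not the actual XTM API:

/* Sketch of emulating an abstract transaction-manager interface in C
 * with a structure of function pointers (illustrative names only). */
#include <stdbool.h>

typedef unsigned int XidType;                /* stand-in for TransactionId */

typedef struct
{
    void (*begin_transaction)(void);
    bool (*is_visible)(XidType xid);         /* tuple visibility check     */
    void (*commit_transaction)(void);
    void (*abort_transaction)(void);
} TransactionManagerAPI;

/* Default, purely local implementation (stubs here). */
static void local_begin(void)        { }
static bool local_visible(XidType x) { (void) x; return true; }
static void local_commit(void)       { }
static void local_abort(void)        { }

static TransactionManagerAPI LocalTM = {
    local_begin, local_visible, local_commit, local_abort
};

/* The core always calls through this pointer... */
static TransactionManagerAPI *CurrentTM = &LocalTM;

/* ...and an extension loaded via shared_preload_libraries can install its
 * own, e.g. distributed, implementation at load time. */
void
SetTransactionManager(TransactionManagerAPI *tm)
{
    CurrentTM = tm;
}

With such an interface the core would simply call CurrentTM->commit_transaction() and so on, instead of the
#ifdef shown above.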

I hope that quite soon we can propose a multimaster extension which provides functionality similar to MySQL Galera.
But even right now we have integrated pg_dtm and pg_tsdtm with pg_shard and postgres_fdw, allowing us to provide
distributed consistency for them.


>
>> All arguments against XTM can be applied to any other extension API in
>> Postgres, for example FDW.
>> Is it general enough? There are many useful operations which currently are
>> not handled by this API, for example performing aggregation and grouping at
>> the foreign server side. But still it is a very useful and flexible mechanism,
>> allowing many wonderful things to be implemented.
> That is true.  And everybody is entitled to an opinion on each new
> proposed hook, as to whether that hook is general or not.  We have
> both accepted and rejected proposed hooks in the past.
>


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



