Re: The plan for FDW-based sharding

From: Oleg Bartunov
Subject: Re: The plan for FDW-based sharding
Msg-id: CAF4Au4zx5zC1Zt12mM8KWiECAiDm=mv4P+RhM4jqofeZFLdm3Q@mail.gmail.com
In response to: Re: The plan for FDW-based sharding  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Responses: Re: The plan for FDW-based sharding  (Bruce Momjian <bruce@momjian.us>)
Re: The plan for FDW-based sharding  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers


On Wed, Feb 24, 2016 at 12:17 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
Hi, Bruce!

The important point for me is to distinguish between different kinds of plans: an implementation plan and a research plan.
If we're talking about an implementation plan, then it should be proven that the proposed approach works in this case, i.e. the research should already be done.
If we're talking about a research plan, then we should realize that the result is unpredictable, and we would probably need to change our approach dramatically.

These two things would work with FDW:
1) Pull data from data nodes to the coordinator.
2) Push down computations from the coordinator to data nodes: joins, aggregates etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is good too.

However, the FDW model assumes that communication happens only between the coordinator and a data node. But a full-fledged distributed optimizer can't be built under this restriction, because it requires every node to be able to communicate with every other node whenever that makes a distributed query faster. And as I understand it, the FDW approach currently has no research and no particular plan for that.
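
To illustrate why redistribution strains a coordinator-only model, here is a minimal Python sketch of rehashing rows by join key. Everything in it is an assumption made for illustration: the `target_node` placement rule, the `redistribute` helper, and the sample `orders` rows are hypothetical and are not part of any FDW API. It only counts how many rows change node, and assumes that a row relayed through the coordinator crosses the network twice, while a direct node-to-node transfer crosses it once.

```python
def target_node(join_key, n_nodes):
    # Hypothetical placement rule: equal join keys always land on the same node.
    return join_key % n_nodes


def redistribute(nodes, key_index):
    """Rehash rows by join key.

    Returns the new per-node placement and the number of rows that
    had to leave the node they were originally stored on.
    """
    n = len(nodes)
    placed = [[] for _ in range(n)]
    moved = 0
    for src, rows in enumerate(nodes):
        for row in rows:
            dst = target_node(row[key_index], n)
            placed[dst].append(row)
            if dst != src:
                moved += 1
    return placed, moved


# Rows are (order_id, customer_id), initially stored by order_id % 2.
orders = [[(2, 1), (4, 2)], [(1, 1), (3, 2)]]

# A join on customer_id needs the rows placed by customer_id instead.
placed, moved = redistribute(orders, key_index=1)

# Assumption: node-to-node transfer costs one network hop per moved row,
# while relaying through the coordinator costs two (node -> coordinator -> node).
hops_direct = moved
hops_via_coordinator = 2 * moved
```

In this toy setup half of the rows change node, and relaying them through the coordinator doubles the traffic and funnels all of it through a single machine.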

Before we consider repartitioning joins, we should probably get everything previously discussed working first.
– Join Pushdown For Parallelism, FDWs
– PartialAggregate/FinalizeAggregate
– Aggregate Pushdown For Parallelism, FDWs
– Declarative Partitioning
– Parallel-Aware Append

So, as I understand it, we never considered the possibility of data redistribution using FDW. Perhaps something has changed since then, but I haven't heard about it.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones.  For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.
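
As a rough illustration of why aggregates keep the returned row count small, the following Python sketch compares pulling every row to the coordinator with pushing a partial aggregate down to each shard. The `Shard` class, `finalize_avg`, and the sample data are hypothetical stand-ins, not postgres_fdw code; the (sum, count) state merely mimics the partial/finalize aggregate split in spirit.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical stand-in for a data node reached through an FDW."""
    rows: list  # list of (region, amount) tuples

    def scan(self):
        # Pull model: every row travels to the coordinator.
        return list(self.rows)

    def partial_avg(self):
        # Pushdown model: the shard returns one (sum, count) state per group.
        state = {}
        for region, amount in self.rows:
            s, c = state.get(region, (0, 0))
            state[region] = (s + amount, c + 1)
        return state


def finalize_avg(partials):
    """Coordinator side: merge per-shard (sum, count) states, then divide."""
    merged = {}
    for state in partials:
        for region, (s, c) in state.items():
            ms, mc = merged.get(region, (0, 0))
            merged[region] = (ms + s, mc + c)
    return {region: s / c for region, (s, c) in merged.items()}


shards = [
    Shard([("eu", 10), ("eu", 30), ("us", 5)]),
    Shard([("eu", 20), ("us", 15), ("us", 25)]),
]

rows_pulled = sum(len(s.scan()) for s in shards)   # rows sent without pushdown
partials = [s.partial_avg() for s in shards]
rows_pushed = sum(len(p) for p in partials)        # states sent with pushdown
result = finalize_avg(partials)
```

With pushdown, the number of transferred states depends on the number of groups per shard rather than on the number of rows, which is why this use case is the least demanding on the coordinator.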

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

FDW would work for queries that fit the pull/pushdown model. I see no plan to make other queries work.
 
3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global transaction manager.

At this point, it is unclear why you don't refer to the work done on a distributed transaction manager (which is also a distributed snapshot manager in your terminology).
 
In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5.  Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.
 
This is the point I agree with. I'm not objecting to any single FDW advance, because each is useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

This comparison doesn't seem correct to me. Postgres XC/XL supports data redistribution between nodes, and I haven't heard a single idea for supporting this in FDW. You are comparing unequal things.
 
Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres.  I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

I have nothing against particular FDW advances. However, it's unclear to me that FDW should be the only sharding approach.
It's unproven that FDW can do the work that Postgres XC/XL does. With FDW we can pick some low-hanging fruit. That's good.
But it's unclear whether we can reach the high-hanging fruit (like data redistribution) with the FDW approach. And if we can, it's unclear that it would be easier than with other approaches.
Let's just not call this the community-chosen plan for implementing sharding.
Until we have the full picture, we can't select one way and reject the others.

I have already pointed out several times that we need XTM to be able to continue development in different directions, since there is no clear winner. Moreover, I think there is no one-size-fits-all solution, and while I agree we need one built into core, other approaches should be able to exist without patching.

 

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
