Thread: The plan for FDW-based sharding


The plan for FDW-based sharding

From: Bruce Momjian
Date:
There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

First, let me point out that, unlike pg_upgrade and the Windows port,
which either worked or didn't work, sharding is going to be implemented and
useful in stages.  It will take several years to complete, similar to
parallelism, streaming replication, and logical replication.

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones.  For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager or global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.  (A
minimal postgres_fdw sketch of this kind of setup follows this list.)

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.
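
To make this concrete, here is a minimal sketch of what such an
FDW-based setup could look like with postgres_fdw.  The server names,
columns, and the split of rows across two shards are purely
hypothetical, and nothing below implies that aggregate pushdown
already happens -- the final query is simply the kind of statement
use case #1 is about:

    -- On the coordinator: one foreign server and user mapping per shard.
    CREATE EXTENSION postgres_fdw;

    CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard1.example.com', dbname 'sales', port '5432');
    CREATE SERVER shard2 FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'shard2.example.com', dbname 'sales', port '5432');

    CREATE USER MAPPING FOR CURRENT_USER SERVER shard1
        OPTIONS (user 'app', password 'secret');
    CREATE USER MAPPING FOR CURRENT_USER SERVER shard2
        OPTIONS (user 'app', password 'secret');

    -- Each shard holds a disjoint slice of the same table.
    CREATE FOREIGN TABLE orders_s1
        (id int, customer_id int, region_id int, amount numeric, created date)
        SERVER shard1 OPTIONS (table_name 'orders');
    CREATE FOREIGN TABLE orders_s2
        (id int, customer_id int, region_id int, amount numeric, created date)
        SERVER shard2 OPTIONS (table_name 'orders');

    -- Use case #1: a read-only aggregate spanning both shards.  Today the
    -- coordinator pulls the rows and aggregates them itself; with aggregate
    -- pushdown only per-shard partial results would cross the network.
    SELECT date_trunc('month', created) AS month, sum(amount)
    FROM (SELECT * FROM orders_s1
          UNION ALL
          SELECT * FROM orders_s2) o
    GROUP BY 1
    ORDER BY 1;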

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5.  Also, we might have better partitioning syntax in 9.6.
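
Whether the new join and sort pushdown actually fires can be checked
with EXPLAIN.  This is a hedged sketch reusing the hypothetical shard1
server and orders_s1 table from above; the plan shape described in the
comments is what one would hope to see, not captured output:

    -- A second foreign table on the *same* foreign server; the join is
    -- only considered for pushdown when both sides live on one server
    -- (and use the same user mapping).
    CREATE FOREIGN TABLE customers_s1 (id int, name text)
        SERVER shard1 OPTIONS (table_name 'customers');

    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT c.name, o.amount
    FROM orders_s1 o JOIN customers_s1 c ON c.id = o.customer_id
    ORDER BY o.amount DESC;
    -- Hoped-for shape: a single Foreign Scan whose "Remote SQL" contains
    -- the JOIN (and, with sort pushdown, the ORDER BY), rather than two
    -- separate foreign scans joined and sorted on the coordinator.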

We need things like parallel partition access and replicated lookup
tables for more join pushdown.
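
One hedged sketch of what a "replicated lookup table" could mean here:
the small dimension table exists, with identical contents, on every
shard, so a join against it involves only one foreign server per shard
and is therefore a candidate for pushdown (all names are hypothetical):

    -- Run on every shard: an identical copy of the small lookup table,
    -- kept in sync by whatever mechanism is convenient.
    CREATE TABLE regions (region_id int PRIMARY KEY, region_name text);

    -- On the coordinator, point a per-shard foreign table at that copy.
    CREATE FOREIGN TABLE regions_s1 (region_id int, region_name text)
        SERVER shard1 OPTIONS (table_name 'regions');

    -- The shard-local join below can then, in principle, be pushed to
    -- the data node instead of pulling both tables to the coordinator.
    SELECT r.region_name, sum(o.amount)
    FROM orders_s1 o JOIN regions_s1 r ON r.region_id = o.region_id
    GROUP BY r.region_name;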

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres.  I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

From: "David G. Johnston"
Date:
On Tue, Feb 23, 2016 at 9:43 AM, Bruce Momjian <bruce@momjian.us> wrote:
4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

Probably meant "global transaction manager"

David J.

Re: The plan for FDW-based sharding

From: Bruce Momjian
Date:
On Tue, Feb 23, 2016 at 09:54:46AM -0700, David G. Johnston wrote:
> On Tue, Feb 23, 2016 at 9:43 AM, Bruce Momjian <bruce@momjian.us> wrote:
> 
>     4. Cross-node read-write queries:
> 
>     This will require a global snapshot manager and global snapshot manager.
> 
> 
> Probably meant "global transaction manager"

Oops, yes, it should be:

    4. Cross-node read-write queries:

    This will require a global snapshot manager and global transaction
    manager.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

From: Simon Riggs
Date:
On 23 February 2016 at 16:43, Bruce Momjian <bruce@momjian.us> wrote:
There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

I think we need to be very careful to understand that "FDWs and Sharding" is one tentative proposal amongst others, not a statement of direction for the PostgreSQL project since there is not yet any universal agreement.

We know Postgres XC/XL works, and scales

Agreed. 

In contrast, the FDW/sharding approach is as yet unproven and, significantly, comes without any detailed technical discussion of the exact approach and how it would work, even after more than 6 months since we first heard of it openly. Since we don't know how it will work, we have no idea how long it will take either, or even if it ever will.

I'd like to see discussion of the details in presentation/wiki form and an initial prototype, with measurements. Without these things we are still just at the speculation stage. Some alternate proposals are also at that stage.
 
, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

It's never been our policy to try to include major projects in single code drops. Any move of XL/XC code into PostgreSQL core would need to be done piece by piece across many releases. XL is definitely too big for the elephant to eat in one mouthful.
 
If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres. 

And if the FDW approach doesn't work, that won't be part of PostgreSQL core either...
 
I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

This is exactly the wrong time to discuss this, since we are days away from the final deadline for PostgreSQL 9.6 and the community should be focusing on that for next few months, not futures.

What I notice is that when Greenplum announced it would publish its modified version of Postgres as open source, there was some scary noise made immediately about that concerning patents, etc.

Now, Postgres-XL 9.5 has recently been announced and we see another scary-sounding pronouncement that *maybe* it won't be included in core. While the comments made are true, they do not solely apply to XC/XL; in fact, the uncertainty applies to all approaches equally, since notably we have approximately five proposals for future designs.

These comments, given their timing and nature, could easily cause "Fear, Uncertainty and Doubt" in people seeing this. FUD is also the name of a sales technique designed to undermine proposals. I hope and presume it was not the intention and reason for discussing uncertainty now and earlier.

I'm glad to see that the viability of the XC/XL approach is recognized. The fact we have a working solution now is important for users, who don't want to wait the 3-5 years while we work out and implement a longer term strategy. Future upgrade support is certain, however.

What eventually gets into PostgreSQL core is as yet uncertain, as is the timescale, but my hope is that we recognize that multiple use cases can be supported rather than a single fixed architecture. It seems likely to me that the PostgreSQL project will do what it does best - take multiple comments and merge those into a combined system that is better than any of the individual single proposals.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: The plan for FDW-based sharding

From: Bruce Momjian
Date:
On Wed, Feb 24, 2016 at 01:08:29AM +0000, Simon Riggs wrote:
> On 23 February 2016 at 16:43, Bruce Momjian <bruce@momjian.us> wrote:
> 
>     There was discussion at the FOSDEM/PGDay Developer Meeting
>     (https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
>     about sharding so I wanted to outline where I think we are going with
>     sharding and FDWs.
> 
> I think we need to be very careful to understand that "FDWs and Sharding" is
> one tentative proposal amongst others, not a statement of direction for the
                                                            --------------

What other directions are proposed to add sharding to the existing
Postgres code?  If there are, I have not heard of them.  Or are they
only (regularly updated?) forks of Postgres?

> PostgreSQL project since there is not yet any universal agreement.

As I stated clearly, we are going in the FDW direction because the FDW
improvements have uses beyond sharding, and once that work is done we
can see how well it works for sharding.

>     We know Postgres XC/XL works, and scales
> 
> 
> Agreed. 
> 
> In contrast, the FDW/sharding approach is as-yet unproven, and significantly
> without any detailed technical discussion of the exact approach and how it
> would work, even after more than 6 months since we first heard of it openly.
> Since we don't know how it will work, we have no idea how long it will take
> either, or even if it ever will.

Yep.

> I'd like to see discussion of the details in presentation/wiki form and an
> initial prototype, with measurements. Without these things we are still just at
> the speculation stage. Some alternate proposals are also at that stage.

Uh, what "alternate proposals"?

My point was that we know XC/XL works, but there is too much code change
for us, so maybe FDWs will make built-in sharding possible/easier.

>     , but we also know they require
>     too many code changes to be merged into Postgres (at least based on
>     previous discussions).  The FDW sharding approach is to enhance the
>     existing features of Postgres to allow as much sharding as possible.
> 
>     Once that is done, we can see what workloads it covers and
>     decide if we are willing to copy the volume of code necessary
>     to implement all supported Postgres XC or XL workloads.
>     (The Postgres XL license now matches the Postgres license,
>     http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
>     Postgres XC has always used the Postgres license.)
> 
> 
> It's never been our policy to try to include major projects in single code
> drops. Any move of XL/XC code into PostgreSQL core would need to be done piece
> by piece across many releases. XL is definitely too big for the elephant to eat
> in one mouthful.

Is there any plan to move the XL/XC code into Postgres?  If so, I have
not heard of it.  I thought everyone agreed it was too much code change,
which is why it is a separate code tree.  Is that incorrect?

>     If we are not willing to add code for the missing Postgres XC/XL
>     features, Postgres XC/XL will probably remain a separate fork of
>     Postgres. 
> 
> 
> And if the FDW approach doesn't work, that won't be part of PostgreSQL core
> either...

Uh, duh.  Yeah, that's what I said.  What is your point?  I said we
don't know if it will work, as you quoted below:

>     I don't think anyone knows the answer to this question, and I
>     don't know how to find the answer except to keep going with our current
>     FDW sharding approach.
> 
> 
> This is exactly the wrong time to discuss this, since we are days away from the
> final deadline for PostgreSQL 9.6 and the community should be focusing on that
> for next few months, not futures.

I posted this because of the discussion at the FOSDEM meeting, and to
address the questions you asked in that meeting.  I even told you last
week on IM that I was going to post this for that stated purpose.  I
didn't pick the time at random.

> What I notice is that when Greenplum announced it would publish as open source
> its modified version of Postgres, there was some scary noise made immediately
> about that concerning patents etc..

> Now, Postgres-XL 9.5 is recently announced and we see another scary sounding
> pronouncement about that *maybe* it won't be included in core. While the
> comments made are true, they do not solely apply to XC/XL, in fact the
> uncertainty applies to all approaches equally since notably we have
> approximately five proposals for future designs.
> 
> These comments, given their timing and nature could easily cause "Fear,
> Uncertainty and Doubt" in people seeing this. FUD is also the name of a sales
> technique designed to undermine proposals. I hope and presume it was not the
> intention and reason for discussing uncertainty now and earlier.

Oh, I absolutely did this as a way to undermine what _everyone_ else is
doing?  Is there another way to behave?

I find this insulting.  Others made the same remarks when I questioned
the patents, and earlier when I questioned if we would integrate the
Greenplum code after their press release.  And you know what, we didn't
want the Greenplum code (yet), and I explained how open source code with
patents is riskier than closed-source code with patents, and I think
people finally understood that, including you.

When people don't like what I have to say, they figure there must be
some other motive, because I certainly couldn't think this on my own? 
Really?  Have I not been around long enough for people to realize that
is not the case!

If you _presume_ I did not have some undermining motivation for posting
this, why did you mention it?  You obviously _do_ think I have some
external motivation for talking about FDWs now or you wouldn't have
mentioned it.  (I can't even think of what the motivation would be.)

Let me come out and say what people might be thinking:  I realize it is
unfortunate that _if_ FDWs succeed in sharding, the value of the work
done on Postgres XC/XL will be diminished.  I personally think that
Postgres needs a built-in sharding solution, just like I thought we
needed a native Windows port, in-place upgrade, and parallelism.  I was
hopeful XC/XL could be integrated into Postgres, but based on
discussions, it seems that is not acceptable, so the FDW/sharding
approach is the only built-in one I can think of.  Are there other
possibilities?

I talk about it and try to get people excited about it.  I make no
apologies for that.  I will talk about this forever, or as long as
people will listen, so you can expect to hear about it.  I am sure I
will think of other "crazy" things to talk about too because the other
items I mentioned above were also considered odd/crazy at the time I
proposed them.

> I'm glad to see that the viability of the XC/XL approach is recognized. The
> fact we have a working solution now is important for users, who don't want to
> wait the 3-5 years while we work out and implement a longer term strategy.
> Future upgrade support is certain, however.

Yes, no question.  The benchmarks of XC/XL looked amazing.  Can you
remind me of the URLs for that?  Do you have any new ones?

In a way, I don't see any need for an FDW sharding prototype because, as
I said, we already know XC/XL work, so copying what they do doesn't
help.  What we need to know is if we can get near the XC/XL benchmarks
with an acceptable addition of code, which is what I thought I already
said.  Perhaps this can be done with FDWs, or some other approach I have
not heard of yet.

> What eventually gets into PostgreSQL core is as yet uncertain, as is the
> timescale, but my hope is that we recognize that multiple use cases can be
> supported rather than a single fixed architecture. It seems likely to me that
> the PostgreSQL project will do what it does best - take multiple comments and
> merge those into a combined system that is better than any of the individual
> single proposals.

Agreed.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

From: Alexander Korotkov
Date:
Hi, Bruce!

The important point for me is to distinguish between two different kinds of plans: an implementation plan and a research plan.
If we're talking about an implementation plan, then it should already be proven that the proposed approach works; i.e., the research should already be done.
If we're talking about a research plan, then we should realize that the result is unpredictable, and we would probably need to change our approach dramatically.

These two things would work with FDWs:
1) Pulling data from the data nodes to the coordinator.
2) Pushing computations down from the coordinator to the data nodes: joins, aggregates, etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is good too.
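
To make the pull/pushdown distinction concrete, here is a hedged sketch against a hypothetical orders_s1 foreign table on one shard; exactly which quals and expressions end up in the Remote SQL depends on the server version and the FDW:

    -- Pushdown: the WHERE clause is shippable, so the data node filters
    -- and only the matching rows travel to the coordinator.
    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT id, amount FROM orders_s1
    WHERE created >= date '2016-01-01';

    -- Pull: without aggregate pushdown the remote side still returns
    -- every qualifying row, and the coordinator computes sum() locally.
    EXPLAIN (VERBOSE, COSTS OFF)
    SELECT sum(amount) FROM orders_s1
    WHERE created >= date '2016-01-01';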

However, the FDW model assumes that communication happens only between the coordinator and a data node. But full-weight distributed optimization can't be done under this restriction, because it requires every node to communicate with every other node whenever that makes the distributed query faster. And as I understand it, the FDW approach currently has no research and no particular plan for that.

As I understand it from Robert Haas's talk (https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0):

Before we consider repartitioning joins, we should probably get everything previously discussed working first.
– Join Pushdown For Parallelism, FDWs
– PartialAggregate/FinalizeAggregate
– Aggregate Pushdown For Parallelism, FDWs
– Declarative Partitioning
– Parallel-Aware Append

So, as I understand it, we haven't ever thought about the possibility of data redistribution using FDWs. Probably something has changed since that time, but I haven't heard about it.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones.  For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

FDWs would work for queries which fit the pull/pushdown model. I see no plan to make other kinds of queries work.
 
3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

At this point, it's unclear why you don't refer to the work already done in the direction of a distributed transaction manager (which is also a distributed snapshot manager in your terminology):
http://www.postgresql.org/message-id/56BB7880.4020604@postgrespro.ru
 
In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5.  Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.
 
This is the point on which I agree. I'm not objecting to any single FDW advance, because each is useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

This comparison doesn't seem correct to me. Postgres XC/XL supports data redistribution between nodes, and I haven't heard a single idea for supporting this with FDWs. You are comparing unequal things.
 
Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres.  I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

I have nothing against particular FDW advances. However, it's unclear to me that FDW should be the only sharding approach.
It's unproven that FDWs can do the work that Postgres XC/XL does. With FDWs we can pick some low-hanging fruit. That's good.
But it's unclear whether we can reach the high-hanging fruit (like data redistribution) with the FDW approach. And if we can, it's unclear that it would be easier than with other approaches.
So let's just not call this the community's chosen plan for implementing sharding.
Until we have the full picture we can't select one way and reject the others.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: The plan for FDW-based sharding

From: Konstantin Knizhnik
Date:
Sorry, but based on this plan one could conclude that there are only
two possible cluster solutions for Postgres: XC/XL and FDW-based.  From
my point of view there are many more possible alternatives.
Our main idea with XTM (the eXtensible Transaction Manager API) was to
make it possible to develop cluster solutions for Postgres as
extensions, without patching the Postgres core code.  And FDW is one of
the mechanisms which makes it possible to reach this goal.

IMHO it will be hard to implement efficient execution of complex OLAP
queries (including cross-node joins and aggregation) within the FDW
paradigm. It will be necessary to build a distributed query execution
plan and coordinate its execution across the cluster nodes. And we
definitely need a specialized optimizer for distributed queries. Right
now solutions to this problem are provided by XL and Greenplum, but
both are forks of Postgres with a lot of changes in the Postgres core.
The challenge is to provide similar functionality, but at the extension
level (using custom nodes, a pluggable transaction manager, ...).

But, as you noticed, complex OLAP is just one of the scenarios and not
the only possible way of using clusters. In some cases FDW-based
sharding can be quite efficient; so can the pg_shard approach, which
also adds sharding at the extension level and in some respects is more
flexible than the FDW-based solution. Not all scenarios require a
global transaction manager. But if one needs global consistency, then
the XTM API makes it possible to provide ACID for both approaches (and
not only for them).

We have just added to the commitfest our XTM patch, together with a
postgres_fdw patch integrating a timestamp-based DTM implementation
into postgres_fdw. It illustrates how global consistency can be reached
for FDW-based sharding.
If this XTM patch is committed, then in 9.6 we will have wide
flexibility to experiment with different distributed transaction
managers, and it can be used for many cluster solutions.

IMHO it would be very useful to extend your classification of cluster
use cases, formulate the demands in each case more precisely, and
investigate how they can be covered by existing cluster solutions for
Postgres and which niches are still vacant. We are currently continuing
work on "multimaster" - a more convenient alternative to hot-standby
replication. It looks like PostgreSQL is missing a product providing
functionality similar to Oracle RAC or MySQL Galera. That is yet
another direction of cluster development for PostgreSQL.  Let's be more
open and flexible.


On 23.02.2016 19:43, Bruce Momjian wrote:
> There was discussion at the FOSDEM/PGDay Developer Meeting
> (https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
> about sharding so I wanted to outline where I think we are going with
> sharding and FDWs.
>
> First, let me point out that, unlike pg_upgrade and the Windows port,
> which either worked or didn't work, sharding is going be implemented and
> useful in stages.  It will take several years to complete, similar to
> parallelism, streaming replication, and logical replication.
>
> Second, as part of this staged implementation, there are several use
> cases that will be shardable at first, and then only later, more complex
> ones.  For example, here are some use cases and the technology they
> require:
>
> 1. Cross-node read-only queries on read-only shards using aggregate
> queries, e.g. data warehouse:
>
> This is the simplest to implement as it doesn't require a global
> transaction manager, global snapshot manager, and the number of rows
> returned from the shards is minimal because of the aggregates.
>
> 2. Cross-node read-only queries on read-only shards using non-aggregate
> queries:
>
> This will stress the coordinator to collect and process many returned
> rows, and will show how well the FDW transfer mechanism scales.
>
> 3. Cross-node read-only queries on read/write shards:
>
> This will require a global snapshot manager to make sure the shards
> return consistent data.
>
> 4. Cross-node read-write queries:
>
> This will require a global snapshot manager and global snapshot manager.
>
> In 9.6, we will have FDW join and sort pushdown
> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
> calability.html).  Unfortunately I don't think we will have aggregate
> pushdown, so we can't test #1, but we might be able to test #2, even in
> 9.5.  Also, we might have better partitioning syntax in 9.6.
>
> We need things like parallel partition access and replicated lookup
> tables for more join pushdown.
>
> In a way, because these enhancements are useful independent of sharding,
> we have not tested to see how well an FDW sharding setup will work and
> for which workloads.
>
> We know Postgres XC/XL works, and scales, but we also know they require
> too many code changes to be merged into Postgres (at least based on
> previous discussions).  The FDW sharding approach is to enhance the
> existing features of Postgres to allow as much sharding as possible.
>
> Once that is done, we can see what workloads it covers and
> decide if we are willing to copy the volume of code necessary
> to implement all supported Postgres XC or XL workloads.
> (The Postgres XL license now matches the Postgres license,
> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> Postgres XC has always used the Postgres license.)
>
> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres.  I don't think anyone knows the answer to this question, and I
> don't know how to find the answer except to keep going with our current
> FDW sharding approach.
>

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

From: Oleg Bartunov
Date:


On Wed, Feb 24, 2016 at 12:17 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
Hi, Bruce!

The important point for me is to distinguish different kind of plans: implementation plan and research plan.
If we're talking about implementation plan then it should be proven that proposed approach works in this case. I.e research should be already done.
If we're talking about research plan then we should realize that result is unpredictable. And we would probably need to dramatically change our way.

This two things would work with FDW:
1) Pull data from data nodes to coordinator.
2) Pushdown computations from coordinator to data nodes: joins, aggregates etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is good too.

However, the model of FDW assumes that communication happen only between coordinator and data node. But full-weight distributed optimized can't be done under this restriction, because it requires every node to communicate every other node if it makes distributed query faster. And as I get, FDW approach currently have no research and no particular plan for that.

Before we consider repartitioning joins, we should probably get everything previously discussed working first.
– Join Pushdown For Parallelism, FDWs
– PartialAggregate/FinalizeAggregate
– Aggregate Pushdown For Parallelism, FDWs
– Declarative Partitioning
– Parallel-Aware Append

So, as I get we didn't ever think about possibility of data redistribution using FDW. Probably, something changed since that time. But I haven't heard about it.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones.  For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

FDW would work for queries which fits pull-pushdown model. I see no plan to make other queries work.
 
3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

At this point, it unclear why don't you refer work done in the direction of distributed transaction manager (which is also distributed snapshot manager in your terminology)
 
In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
calability.html
).  Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5.  Also, we might have better partitioning syntax in 9.6.

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.
 
This is the point I agree. I'm not objecting against any single FDW advance, because it's useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

This comparison doesn't seems correct to me. Postgres XC/XL supports data redistribution between nodes. And I haven't heard any single idea of supporting this in FDW. You are comparing not equal things.
 
Once that is done, we can see what workloads it covers and
decide if we are willing to copy the volume of code necessary
to implement all supported Postgres XC or XL workloads.
(The Postgres XL license now matches the Postgres license,
http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
Postgres XC has always used the Postgres license.)

If we are not willing to add code for the missing Postgres XC/XL
features, Postgres XC/XL will probably remain a separate fork of
Postgres.  I don't think anyone knows the answer to this question, and I
don't know how to find the answer except to keep going with our current
FDW sharding approach.

I have nothing against particular FDW advances. However, it's unclear for me that FDW should be the only sharding approach.
It's unproven that FDW can do work that Postgres XC/XL does. With FDW we can have some low-hanging fruits. That's good.
But it's unclear we can have high-hanging fruits (like data redistribution) with FDW approach. And if we can it's unclear that it would be easier than with other approaches.
Just let's don't call this community chosen plan for implementing sharding.
Until we have full picture we can't select one way and reject others.

I have already pointed out several times that we need XTM to be able to continue development in different directions, since there is no clear winner.  Moreover, I think there is no one-size-fits-all solution, and while I agree we need one built into the core, other approaches should have the ability to exist without patching.

 

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: The plan for FDW-based sharding

From: Bruce Momjian
Date:
On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:
> Hi, Bruce!
> 
> The important point for me is to distinguish different kind of plans:
> implementation plan and research plan.
> If we're talking about implementation plan then it should be proven that
> proposed approach works in this case. I.e research should be already done.
> If we're talking about research plan then we should realize that result is
> unpredictable. And we would probably need to dramatically change our way.

Yes, good point.  I would say FDW-based sharding is certainly still a
research approach, but an odd one because we are adding code even while
in research mode.  I think that is possible because the FDW improvements
have other uses beyond sharding.

I think another aspect is that we already know that modifying the
Postgres source code can produce a useful sharding solution --- XC, XL,
Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
So, we know that with unlimited code changes, it is possible.  What we
don't know is whether it is possible with acceptable code changes, and
how much of the feature-set can be supported this way.

We had a similar case with the Windows port, where SRA (my employer at
the time) and Nusphere both had native Windows ports of Postgres, and
they supplied source code to help with the port.  So, in that case also,
we knew a native Windows port was possible, and we (or at least I) could
see the code that was required to do it.  The big question was whether a
native Windows port could be added in a community-acceptable way, and
the community agreed we could try if we didn't make the code messier ---
that was a success.

For pg_upgrade, I had code from EDB (my employer at the time) that kind
of worked, but needed lots of polish, and again, I could do it in
contrib as long as I didn't mess up the backend code --- that worked
well too.

So, I guess I am saying, the FDW/sharding thing is a research project,
but one that is implementing code because of existing proven solutions
and because the improvements are benefiting other use-cases beyond
sharding.

Also, in the big picture, the existence of many Postgres forks, all
doing sharding, indicates that there is demand for this capability, and
if we can get some of this capability into Postgres we will increase the
number of people using native Postgres.  We might also be able to reduce
the amount of duplicate work being done in all these forks and allow
them to more easily focus on more advanced use-cases.

> This two things would work with FDW:
> 1) Pull data from data nodes to coordinator.
> 2) Pushdown computations from coordinator to data nodes: joins, aggregates etc.
> It's proven and clear. This is good.
> Another point is that these FDW advances are useful by themselves. This is good
> too.
> 
> However, the model of FDW assumes that communication happen only between
> coordinator and data node. But full-weight distributed optimized can't be done
> under this restriction, because it requires every node to communicate every
> other node if it makes distributed query faster. And as I get, FDW approach
> currently have no research and no particular plan for that.

This is very true.  I imagine cross-node connections will certainly
complicate the implementation and lead to significant code changes,
which might be unacceptable.  I think we need to go with a
non-cross-node implementation first, then if that is accepted, we can
start to think what cross-node code changes would look like.  It
certainly would require FDW knowledge to exist on every shard.  Some
have suggested that FDWs wouldn't work well for cross-node connections
or wouldn't scale and we shouldn't be using them --- I am not sure what
to think of that.

> As I get from Robert Haas's talk (https://docs.google.com/viewer?a=v&pid=sites&
> srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)
> 
>     Before we consider repartitioning joins, we should probably get everything
>     previously discussed working first.
>     – Join Pushdown For Parallelism, FDWs
>     – PartialAggregate/FinalizeAggregate
>     – Aggregate Pushdown For Parallelism, FDWs
>     – Declarative Partitioning
>     – Parallel-Aware Append
> 
> 
> So, as I get we didn't ever think about possibility of data redistribution
> using FDW. Probably, something changed since that time. But I haven't heard
> about it.

No, you didn't miss it.  :-(  We just haven't gotten to studying that
yet.  One possible outcome is that built-in Postgres has non-cross-node
sharding, and forks of Postgres have cross-node sharding, again assuming
cross-node sharding requires an unacceptable amount of code change.  I
don't think anyone knows the answer yet.

> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
> 
>     Second, as part of this staged implementation, there are several use
>     cases that will be shardable at first, and then only later, more complex
>     ones.  For example, here are some use cases and the technology they
>     require:
> 
>     1. Cross-node read-only queries on read-only shards using aggregate
>     queries, e.g. data warehouse:
> 
>     This is the simplest to implement as it doesn't require a global
>     transaction manager, global snapshot manager, and the number of rows
>     returned from the shards is minimal because of the aggregates.
> 
>     2. Cross-node read-only queries on read-only shards using non-aggregate
>     queries:
> 
>     This will stress the coordinator to collect and process many returned
>     rows, and will show how well the FDW transfer mechanism scales.
> 
> 
> FDW would work for queries which fits pull-pushdown model. I see no plan to
> make other queries work.

Yep, see above.

>     3. Cross-node read-only queries on read/write shards:
> 
>     This will require a global snapshot manager to make sure the shards
>     return consistent data.
> 
>     4. Cross-node read-write queries:
> 
>     This will require a global snapshot manager and global snapshot manager.
> 
> 
> At this point, it unclear why don't you refer work done in the direction of
> distributed transaction manager (which is also distributed snapshot manager in
> your terminology)
> http://www.postgresql.org/message-id/56BB7880.4020604@postgrespro.ru

Yes, there is certainly great work being done on that.  I should have
included a URL for that --- glad you did.  I wasn't aware it also was a
distributed snapshot manager.  :-)  And again, as you said earlier, it
is useful for more things than just FDW sharding.

>     In 9.6, we will have FDW join and sort pushdown
>     (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
>     calability.html).  Unfortunately I don't think we will have aggregate
>     pushdown, so we can't test #1, but we might be able to test #2, even in
>     9.5.  Also, we might have better partitioning syntax in 9.6.
> 
>     We need things like parallel partition access and replicated lookup
>     tables for more join pushdown.
> 
>     In a way, because these enhancements are useful independent of sharding,
>     we have not tested to see how well an FDW sharding setup will work and
>     for which workloads.
> 
>  
> This is the point I agree. I'm not objecting against any single FDW advance,
> because it's useful by itself.
> 
> 
>     We know Postgres XC/XL works, and scales, but we also know they require
>     too many code changes to be merged into Postgres (at least based on
>     previous discussions).  The FDW sharding approach is to enhance the
>     existing features of Postgres to allow as much sharding as possible.
> 
> 
> This comparison doesn't seems correct to me. Postgres XC/XL supports data
> redistribution between nodes. And I haven't heard any single idea of supporting
> this in FDW. You are comparing not equal things.

Well, as far as I know XC doesn't support data redistribution between
nodes and I saw good benchmarks of that, as well as XL.  We didn't merge
in the XC code, so I assume the XL implementation of non-cross-node
sharding also would be too much code to digest, which is why we are
trying FDW sharding.  As I said, we will see how much of the Postgres
XC/XL workload can be accomplished with FDWs.

>     Once that is done, we can see what workloads it covers and
>     decide if we are willing to copy the volume of code necessary
>     to implement all supported Postgres XC or XL workloads.
>     (The Postgres XL license now matches the Postgres license,
>     http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
>     Postgres XC has always used the Postgres license.)
> 
>     If we are not willing to add code for the missing Postgres XC/XL
>     features, Postgres XC/XL will probably remain a separate fork of
>     Postgres.  I don't think anyone knows the answer to this question, and I
>     don't know how to find the answer except to keep going with our current
>     FDW sharding approach.
> 
> 
> I have nothing against particular FDW advances. However, it's unclear for me
> that FDW should be the only sharding approach.
> It's unproven that FDW can do work that Postgres XC/XL does. With FDW we can
> have some low-hanging fruits. That's good.
> But it's unclear we can have high-hanging fruits (like data redistribution)
> with FDW approach. And if we can it's unclear that it would be easier than with
> other approaches.
> Just let's don't call this community chosen plan for implementing sharding.
> Until we have full picture we can't select one way and reject others.

I agree.  I think the FDW approach is the only existing approach for
built-in sharding though.  The forks of Postgres doing sharding are
just that: forks, and Postgres community ecosystem projects.  (Yes,
they are open source.)  If the forks were community-chosen plans we
hopefully would not have 5+ of them.  If FDW works, it has the potential
to be the community-chosen plan, at least for the workloads it supports,
because it is built into community Postgres in a way the others cannot.

That doesn't mean the forks go away, but rather their value is in doing
things the FDW approach can't, but there are a lot of "if's" in there.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

From: Bruce Momjian
Date:
On Wed, Feb 24, 2016 at 12:35:15PM +0300, Oleg Bartunov wrote:
>     I have nothing against particular FDW advances. However, it's unclear for
>     me that FDW should be the only sharding approach.
>     It's unproven that FDW can do work that Postgres XC/XL does. With FDW we
>     can have some low-hanging fruits. That's good.
>     But it's unclear we can have high-hanging fruits (like data redistribution)
>     with FDW approach. And if we can it's unclear that it would be easier than
>     with other approaches.
>     Just let's don't call this community chosen plan for implementing sharding.
>     Until we have full picture we can't select one way and reject others.
> 
> 
> I already several times pointed, that we need XTM to be able to continue
> development in different directions, since there is no clear winner.  Moreover,
> I think there is no fits-all  solution and while I agree we need one built-in
> in the core, other approaches should have ability to exists without patching.

Yep.  I think much of what we eventually add to core will be either
copied from an existing solution, which then doesn't need to be
maintained anymore, or used by existing solutions.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

From: Bruce Momjian
Date:
On Wed, Feb 24, 2016 at 12:22:20PM +0300, Konstantin Knizhnik wrote:
> Sorry, but based on this plan it is possible to make a conclusion
> that there are only two possible cluster solutions for Postgres:
> XC/XL and FDW-based.  From my point of view there are  much more
> possible alternatives.
> Our main idea with XTM (eXtensible Transaction Manager API) was to
> make it possible to develop cluster solutions for Postgres as
> extensions without patching code of Postgres core. And FDW is one of
> the mechanism which makes it possible to reach this goal.

Yes, this is a good example of code reuse.

> IMHO it will be hard to implement efficient execution of complex
> OLAP queries (including cross-node joins  and aggregation) within
> FDW paradigm. It will be necessary to build distributed query
> execution plan and coordinate it execution at cluster nodes. And
> definitely we need specialized optimizer for distributed queries.
> Right now solution of the problem are provided by XL and Greenplum,
> but both are forks of Posrgres with a lot of changes in Postgres
> core. The challenge is to provide the similar functionality, but at
> extension level (using custom nodes, pluggable transaction manager,
> ...).

Agreed.

> But, as you noticed,  complex OLAP is just one of the scenarios and
> this is not the only possible way of using clusters. In some cases
> FDW-based sharding can be quite efficient. Or pg_shard approach
> which also adds sharding at extension level and in some aspects is
> more flexible than FDW-based solution. Not all scenarios require
> global transaction manager. But if one need global consistency, then
> XTM API allows to provide ACID for both approaches (and not only for
> them).

Yep.

> We currently added to commitfest our XTM patch together with
> postgres_fdw patch integrating timestamp-based DTM implementation in
> postgres_fdw. It illustrates how global consistency canbe reached
> for FDW-based sharding.
> If this XTM patch will be committed, then in 9.6 we will have wide
> flexibility to play with different distributed transaction managers.
> And it can be used for many cluster solutions.
> 
> IMHO it will be very useful to extend your classification of cluster
> use cases, more precisely  formulate demands in all cases,
> investigate  how them can be covered by existed cluster solutions
> for Postgres and which niches are still vacant. We are currently
> continue work on "multimaster" - some more convenient alternative to
> hot-standby replication. Looks like PostgreSQL is missing some
> product providing functionality similar to Oracle RAC or MySQL
> Gallera. It is yet another direction of cluster development for
> PostgreSQL.  Let's be more open and flexible.

Yes, I listed only the workloads I could think of.  It would be helpful
to list more workloads and start to decide what can be accomplished with
each approach.  I don't even know all the workloads supported by the
sharding forks of Postgres.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

From: Bruce Momjian
Date:
On Wed, Feb 24, 2016 at 09:34:37AM -0500, Bruce Momjian wrote:
> > I have nothing against particular FDW advances. However, it's unclear for me
> > that FDW should be the only sharding approach.
> > It's unproven that FDW can do work that Postgres XC/XL does. With FDW we can
> > have some low-hanging fruits. That's good.
> > But it's unclear we can have high-hanging fruits (like data redistribution)
> > with FDW approach. And if we can it's unclear that it would be easier than with
> > other approaches.
> > Just let's don't call this community chosen plan for implementing sharding.
> > Until we have full picture we can't select one way and reject others.
> 
> I agree.  I think the FDW approach is the only existing approach for
> built-in sharding though.  The forks of Postgres doing sharding are,
> just that, forks and just Postgres community ecosystem projects.   (Yes,
> they are open source.)  If the forks were community-chosen plans we
> hopefully would not have 5+ of them.  If FDW works, it has the potential
> to be the community-chosen plan, at least for the workloads it supports,
> because it is built into community Postgres in a way the others cannot.
> 
> That doesn't mean the forks go away, but rather their value is in doing
> things the FDW approach can't, but there are a lot of "if's" in there.

Actually, this seems similar to how we handled replication.  For years
we had multiple external replication solutions.  When we implemented
streaming replication, we knew it would become the default for workloads
it supports.  The external solutions didn't go away, but their value was
in handling workloads that streaming replication didn't support.

I think the only difference is that we knew streaming replication would
have this effect before we implemented it, while with FDW-based
sharding, we don't know.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

From: Alvaro Herrera
Date:
Bruce Momjian wrote:
> On Wed, Feb 24, 2016 at 01:08:29AM +0000, Simon Riggs wrote:

> > It's never been our policy to try to include major projects in single code
> > drops. Any move of XL/XC code into PostgreSQL core would need to be done piece
> > by piece across many releases. XL is definitely too big for the elephant to eat
> > in one mouthful.
> 
> Is there any plan to move the XL/XC code into Postgres?  If so, I have
> not heard of it.  I thought everyone agreed it was too much code change,
> which is why it is a separate code tree.  Is that incorrect?

Yes, I think that's incorrect.

What was said, as I understood it, is that Postgres-XL is too big to
merge in a single commit -- just like merging BDR would have been.
Indulge me while I make a parallel with BDR for a bit.
2ndQuadrant never pushed for merging BDR in a single commit; what was
done was to split it, and propose individual pieces for commit.  Many of
these pieces are now already committed (event triggers, background
workers, logical decoding, replication slots, and many others).  The
"BDR patch" is now much smaller, and it's quite possible that we will
see it merged someday.  Will it be different from what it was when the
BDR project started, all those years ago?  You bet.  Having the
prototype BDR initially was what allowed the whole plan to make sense,
because it showed that the pieces interacted in the right ways to make
it work as a whole.

(I'm not saying 2ndQuadrant is so wise to do things this way.  I'm
pretty sure you can see the same thing in parallel query development,
for instance.)

In the same way, Postgres-XL is far too big to merge in a single commit.
But that doesn't mean it will never be merged.  What is more likely to
happen instead is that some pieces of it are going to be submitted
separately for consideration.  It is a slow process, but progress is
real and tangible.  We know this process will yield a useful outcome,
because the architecture has already been proven by the existence of
Postgres-XL itself.  It's the prototype that proves the overall design,
even if the pieces change shape during the process.  (Really, it's way
more than merely a prototype at this point because of how long it has
matured.)

In contrast, we don't have a prototype for FDW-based sharding; as you
admitted, there is no actual plan, other than "let's push FDWs in this
direction and hope that sharding will emerge".  We don't really know
what pieces we need or how they will interact with each other; we have a
vague idea of a direction but there's no clear path forward.  As the
saying goes, if you don't know where you're going, you will probably end
up somewhere else.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: The plan for FDW-based sharding

From: Bruce Momjian
Date:
On Wed, Feb 24, 2016 at 01:02:21PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Wed, Feb 24, 2016 at 01:08:29AM +0000, Simon Riggs wrote:
> 
> > > It's never been our policy to try to include major projects in single code
> > > drops. Any move of XL/XC code into PostgreSQL core would need to be done piece
> > > by piece across many releases. XL is definitely too big for the elephant to eat
> > > in one mouthful.
> > 
> > Is there any plan to move the XL/XC code into Postgres?  If so, I have
> > not heard of it.  I thought everyone agreed it was too much code change,
> > which is why it is a separate code tree.  Is that incorrect?
> 
> Yes, I think that's incorrect.
> 
> What was said, as I understood it, is that Postgres-XL is too big to
> merge in a single commit -- just like merging BDR would have been.
> Indulge me while I make a parallel with BDR for a bit.
> 2ndQuadrant never pushed for merging BDR in a single commit; what was
> done was to split it, and propose individual pieces for commit.  Many of
> these pieces are now already committed (event triggers, background
> workers, logical decoding, replication slots, and many others).  The
> "BDR patch" is now much smaller, and it's quite possible that we will
> see it merged someday.  Will it be different from what it was when the
> BDR project started, all those years ago?  You bet.  Having the
> prototype BDR initially was what allowed the whole plan to make sense,
> because it showed that the pieces interacted in the right ways to make
> it work as a whole.

Yes, that is my understanding too.

> (I'm not saying 2ndQuadrant is so wise to do things this way.  I'm
> pretty sure you can see the same thing in parallel query development,
> for instance.)
> 
> In the same way, Postgres-XL is far too big to merge in a single commit.
> But that doesn't mean it will never be merged.  What is more likely to
> happen instead is that some pieces of it are going to be submitted
> separately for consideration.  It is a slow process, but progress is
> real and tangible.  We know this process will yield a useful outcome,

I was not aware there was any process to merge XC/XL into Postgres, at
least from the XC/XL side.  I know there is desire to take code from
XC/XL on the FDW-sharding side.

I think the most conservative merge approach is to try to enhance
existing Postgres features first (FDWs, partitioning, parallelism),
perhaps features that didn't exist at the time XC/XL were designed. If
they work, keep them and add the XC/XL-specific parts.  If the
enhance-features approach doesn't work, we then have to consider how
much additional code will be needed.  We have to evaluate this for the
FDW-based approach too, but it is likely to be smaller, which is its
attraction.

> because the architecture has already proven by the existence of
> Postgres-XL itself.  It's the prototype that proves the overall design,
> even if the pieces change shape during the process.  (Really, it's way
> more than merely a prototype at this point because of how long it has
> matured.)

True, it is beyond a prototype.

> In contrast, we don't have a prototype for FDW-based sharding; as you
> admitted, there is no actual plan, other than "let's push FDWs in this
> direction and hope that sharding will emerge".  We don't really know
> what pieces we need or how will they interact with each other; we have a
> vague idea of a direction but there's no clear path forward.  As the
> saying goes, if you don't know where you're going, you will probably end
> up somewhere else.

I think I have covered that already.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Michael Paquier
Дата:
On Wed, Feb 24, 2016 at 11:34 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:
>> Hi, Bruce!
>>
>> The important point for me is to distinguish different kind of plans:
>> implementation plan and research plan.
>> If we're talking about implementation plan then it should be proven that
>> proposed approach works in this case. I.e research should be already done.
>> If we're talking about research plan then we should realize that result is
>> unpredictable. And we would probably need to dramatically change our way.
>
> Yes, good point.  I would say FDW-based sharding is certainly still a
> research approach, but an odd one because we are adding code even while
> in research mode.  I think that is possible because the FDW improvements
> have other uses beyond sharding.
>
> I think another aspect is that we already know that modifying the
> Postgres source code can produce a useful sharding solution --- XC, XL,
> Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
> So, we know that with unlimited code changes, it is possible.  What we
> don't know is whether it is possible with acceptable code changes, and
> how much of the feature-set can be supported this way.
>
> We had a similar case with the Windows port, where SRA (my employer at
> the time) and Nusphere both had native Windows ports of Postgres, and
> they supplied source code to help with the port.  So, in that case also,
> we knew a native Windows port was possible, and we (or at least I) could
> see the code that was required to do it.  The big question was whether a
> native Windows port could be added in a community-acceptable way, and
> the community agreed we could try if we didn't make the code messier ---
> that was a success.
>
> For pg_upgrade, I had code from EDB (my employer at the time) that kind
> of worked, but needed lots of polish, and again, I could do it in
> contrib as long as I didn't mess up the backend code --- that worked
> well too.
>
> So, I guess I am saying, the FDW/sharding thing is a research project,
> but one that is implementing code because of existing proven solutions
> and because the improvements are benefiting other use-cases beyond
> sharding.
>
> Also, in the big picture, the existence of many Postgres forks, all
> doing sharding, indicates that there is demand for this capability, and
> if we can get some of this capability into Postgres we will increase the
> number of people using native Postgres.  We might also be able to reduce
> the amount of duplicate work being done in all these forks and allow
> them to more easily focus on more advanced use-cases.
>
>> These two things would work with FDW:
>> 1) Pull data from data nodes to coordinator.
>> 2) Pushdown computations from coordinator to data nodes: joins, aggregates etc.
>> It's proven and clear. This is good.
>> Another point is that these FDW advances are useful by themselves. This is good
>> too.
>>
>> However, the model of FDW assumes that communication happens only between
>> coordinator and data node. But full-weight distributed optimization can't be done
>> under this restriction, because it requires every node to communicate with every
>> other node if that makes the distributed query faster. And as I get it, the FDW
>> approach currently has no research and no particular plan for that.
>
> This is very true.  I imagine cross-node connections will certainly
> complicate the implementation and lead to significant code changes,
> which might be unacceptable.  I think we need to go with a
> non-cross-node implementation first, then if that is accepted, we can
> start to think what cross-node code changes would look like.  It
> certainly would require FDW knowledge to exist on every shard.  Some
> have suggested that FDWs wouldn't work well for cross-node connections
> or wouldn't scale and we shouldn't be using them --- I am not sure what
> to think of that.
>
>> As I get from Robert Haas's talk (https://docs.google.com/viewer?a=v&pid=sites&
>> srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)
>>
>>     Before we consider repartitioning joins, we should probably get everything
>>     previously discussed working first.
>>     – Join Pushdown For Parallelism, FDWs
>>     – PartialAggregate/FinalizeAggregate
>>     – Aggregate Pushdown For Parallelism, FDWs
>>     – Declarative Partitioning
>>     – Parallel-Aware Append
>>
>>
>> So, as I get we didn't ever think about possibility of data redistribution
>> using FDW. Probably, something changed since that time. But I haven't heard
>> about it.
>
> No, you didn't miss it.  :-(  We just haven't gotten to studying that
> yet.  One possible outcome is that built-in Postgres has non-cross-node
> sharding, and forks of Postgres have cross-node sharding, again assuming
> cross-node sharding requires an unacceptable amount of code change.  I
> don't think anyone knows the answer yet.
>
>> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>>     Second, as part of this staged implementation, there are several use
>>     cases that will be shardable at first, and then only later, more complex
>>     ones.  For example, here are some use cases and the technology they
>>     require:
>>
>>     1. Cross-node read-only queries on read-only shards using aggregate
>>     queries, e.g. data warehouse:
>>
>>     This is the simplest to implement as it doesn't require a global
>>     transaction manager, global snapshot manager, and the number of rows
>>     returned from the shards is minimal because of the aggregates.
>>
>>     2. Cross-node read-only queries on read-only shards using non-aggregate
>>     queries:
>>
>>     This will stress the coordinator to collect and process many returned
>>     rows, and will show how well the FDW transfer mechanism scales.
>>
>>
>> FDW would work for queries which fit the pull-pushdown model. I see no plan to
>> make other queries work.
>
> Yep, see above.
>
>>     3. Cross-node read-only queries on read/write shards:
>>
>>     This will require a global snapshot manager to make sure the shards
>>     return consistent data.
>>
>>     4. Cross-node read-write queries:
>>
>>     This will require a global snapshot manager and global snapshot manager.
>>
>>
>> At this point, it is unclear why you don't refer to work done in the direction of a
>> distributed transaction manager (which is also a distributed snapshot manager in
>> your terminology)
>> http://www.postgresql.org/message-id/56BB7880.4020604@postgrespro.ru
>
> Yes, there is certainly great work being done on that.  I should have
> included a URL for that --- glad you did.  I wasn't aware it also was a
> distributed snapshot manager.  :-)  And again, as you said earlier, it
> is useful for more things than just FDW sharding.
>
>>     In 9.6, we will have FDW join and sort pushdown
>>     (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-s
>>     calability.html).  Unfortunately I don't think we will have aggregate
>>     pushdown, so we can't test #1, but we might be able to test #2, even in
>>     9.5.  Also, we might have better partitioning syntax in 9.6.
>>
>>     We need things like parallel partition access and replicated lookup
>>     tables for more join pushdown.
>>
>>     In a way, because these enhancements are useful independent of sharding,
>>     we have not tested to see how well an FDW sharding setup will work and
>>     for which workloads.
>>
>>
>> This is the point I agree. I'm not objecting against any single FDW advance,
>> because it's useful by itself.
>>
>>
>>     We know Postgres XC/XL works, and scales, but we also know they require
>>     too many code changes to be merged into Postgres (at least based on
>>     previous discussions).  The FDW sharding approach is to enhance the
>>     existing features of Postgres to allow as much sharding as possible.
>>
>>
>> This comparison doesn't seem correct to me. Postgres XC/XL supports data
>> redistribution between nodes. And I haven't heard any single idea of supporting
>> this in FDW. You are comparing things that are not equal.
>
> Well, as far as I know XC doesn't support data redistribution between
> nodes and I saw good benchmarks of that, as well as XL.

XC does support that in 1.2 with a very basic approach (coded that
years ago), though it takes an exclusive lock on the table involved.
And actually I think what I did in this case really sucked, the effort
was centralized on the Coordinator to gather and then redistribute the
tuples, at least tuples that do not need to move were not moved at
all.

>>     Once that is done, we can see what workloads it covers and
>>     decide if we are willing to copy the volume of code necessary
>>     to implement all supported Postgres XC or XL workloads.
>>     (The Postgres XL license now matches the Postgres license,
>>     http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
>>     Postgres XC has always used the Postgres license.)

Postgres-XC used the GPL license first, and has moved to PostgreSQL
license exactly to allow Postgres core to reuse it later on if needed.
--
Michael



Re: The plan for FDW-based sharding

От
Bruce Momjian
Дата:
On Thu, Feb 25, 2016 at 01:53:12PM +0900, Michael Paquier wrote:
> > Well, as far as I know XC doesn't support data redistribution between
> > nodes and I saw good benchmarks of that, as well as XL.
> 
> XC does support that in 1.2 with a very basic approach (coded that
> years ago), though it takes an exclusive lock on the table involved.
> And actually I think what I did in this case really sucked, the effort
> was centralized on the Coordinator to gather and then redistribute the
> tuples, at least tuples that do not need to move were not moved at
> all.

Yes, there is a lot of complexity involved in sending results between
nodes.

> >>     Once that is done, we can see what workloads it covers and
> >>     decide if we are willing to copy the volume of code necessary
> >>     to implement all supported Postgres XC or XL workloads.
> >>     (The Postgres XL license now matches the Postgres license,
> >>     http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> >>     Postgres XC has always used the Postgres license.)
> 
> Postgres-XC used the GPL license first, and has moved to PostgreSQL
> license exactly to allow Postgres core to reuse it later on if needed.

Ah, yes, I remember that now.  Thanks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
> I already several times pointed, that we need XTM to be able to continue
> development in different directions, since there is no clear winner.
> Moreover, I think there is no fits-all  solution and while I agree we need
> one built-in in the core, other approaches should have ability to exists
> without patching.

I don't think I necessarily agree with that.  Transaction management
is such a fundamental part of the system that I think making it
pluggable is going to be really hard.  I understand that you've done
several implementations based on your proposed API, and that's good as
far as it goes, but how do we know that's really going to be general
enough for what other people might need?  And what makes us think we
really need multiple transaction managers, anyway?  Even writing one
good distributed transaction manager seems like a really hard project
- why would we want to write two or three or five?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Oleg Bartunov
Дата:


On Fri, Feb 26, 2016 at 3:50 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
> I already several times pointed, that we need XTM to be able to continue
> development in different directions, since there is no clear winner.
> Moreover, I think there is no fits-all  solution and while I agree we need
> one built-in in the core, other approaches should have ability to exists
> without patching.

I don't think I necessarily agree with that.  Transaction management
is such a fundamental part of the system that I think making it
pluggable is going to be really hard.  I understand that you've done
several implementations based on your proposed API, and that's good as
far as it goes, but how do we know that's really going to be general
enough for what other people might need? 

Right now the tm is hardcoded and it doesn't matter "if other people might need" at all.  We at least provide developers ("other people") the ability to work on their implementations, and the patch is safe and doesn't sacrifice anything in core.

 
And what makes us think we
really need multiple transaction managers, anyway? 


If you are brave enough to say that one tm fits all, and you are able to teach the existing tm to play well in various clustering environments during the development period, which is short, then probably we don't need multiple tms. But it's too perfect to believe, and the practical solution is to let multiple groups work on their solutions.

 
Even writing one
good distributed transaction manager seems like a really hard project
- why would we want to write two or three or five?

again, right now it's simply impossible for any bright person to work on dtms.  It's time to start working on a dtm, I believe. The fact that you don't think about distributed transaction support doesn't mean there are no "other people" who have different ideas on postgres' future.  That's why we propose this patch, let's play the game!

 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
> Right now tm is hardcoded and it's doesn't matter  "if other people might
> need" at all.  We at least provide developers ("other people")  ability to
> work on their implementations and the patch  is safe and doesn't sacrifices
> anything in core.

I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.

>> And what makes us think we
>> really need multiple transaction managers, anyway?
>
> If you brave enough to say that one tm-fits-all and you are able to teach
> existed tm to play well  in various clustering environment during
> development period, which is short, than probably we don't need  multiple
> tms. But It's too perfect to believe and practical solution is to let
> multiple groups to work on their solutions.

Nobody's preventing multiple groups from working on their solutions.
That's not the question.  The question is why we should install hooks
in core at this early stage without waiting to see which
implementations prove to be best and whether those hooks are actually
general enough to cater to everything people want to do.  There is
talk of integrating XC/XL work into PostgreSQL; it has a GTM.
Postgres Pro has several GTMs.  Maybe there will be others.

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.  We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.

> dtms.  It's time to start working on dtm, I believe. The fact you don't
> think about distributed transactions support doesn't mean there no "other
> people", who has different ideas on postgres future.  That's why we propose
> this patch, let's play the game !

I don't like to play games with the architecture of PostgreSQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
"Joshua D. Drake"
Дата:
On 02/26/2016 08:06 AM, Robert Haas wrote:
> On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>> Right now tm is hardcoded and it's doesn't matter  "if other people might
>> need" at all.  We at least provide developers ("other people")  ability to
>> work on their implementations and the patch  is safe and doesn't sacrifices
>> anything in core.
>
> I don't believe that.  When we install APIs into core, we're
> committing to keep those APIs around.  And I think that we're far too
> early in the development of transaction managers for PostgreSQL to
> think that we know what APIs we want to commit to over the long term.

Correct.

[snip]

>
> Frankly, I'd like to see a GTM in core at some point because I'd like
> everybody who uses PostgreSQL to have access to a GTM.  What I don't
> want is for every PostgreSQL company to develop its own GTM and
> distribute it separately from everybody else's.  IIUC, MySQL kinda did
> that with storage engines and it resulted in the fragmentation of the
> community.

No it didn't. It allowed MySQL people to use the tool that best fit 
their needs.

> We've had the same thing happen with replication tools -
> every PostgreSQL company develops their own set.  It would have been
> better to have ONE set that was distributed by the core project so
> that we didn't all do the same work over again.

The reason people developed a bunch of external replication tools (and 
continue to) is because .Org has shown a unique lack of leadership in 
providing solutions for the problem. Historically speaking .Org was anti 
replication in core. It wasn't about who was going to be best. It was 
who was going to be best for what problem. The inclusion of the 
replication tools we have now speaks very loudly to that lack of 
leadership.

The moment .Org showed leadership and developed a reasonable solution to 
80% of the problem, a great majority of people moved to hot standby and 
streaming replication. It is easy. It does not answer all the questions 
but it is default, in core, and that gives people peace of mind. This is 
also why once PgLogical is up to -core quality and in -core, the great 
majority of people will work to dump Slony/Londiste/Insertproghere and 
use PgLogical.

If .Org was interested in showing leadership in this area, a few hackers 
would get together with a few other hackers from XL and XC (although as 
I understand it XL is further along), have a few heart to heart, mind to 
mind meetings and determine:

* Are either of these two solutions worth it?
  Yes? Then let's start working on an integration plan and get it done.
  No? Then let's start working on a .Org plan to solve that problem.

But that likely won't happen because NIH.

>
> I don't understand the argument that without these hooks in core,
> people can't continue to work on this.  It isn't hard to work on GTM
> without any core changes at all.  You just patch your copy of
> PostgreSQL.  We do this all the time, for every patch.  We don't add
> hooks for every patch.
>
>> dtms.  It's time to start working on dtm, I believe. The fact you don't
>> think about distributed transactions support doesn't mean there no "other
>> people", who has different ideas on postgres future.  That's why we propose
>> this patch, let's play the game !
>
> I don't like to play games with the architecture of PostgreSQL.
>

Robert, this is all a game. It is a game of who wins the intellectual 
prize to whatever problem. Who gets the market or mind share and who 
gets to pretend they win the Oscar for coolest design.

Sincerely,

jD

-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Fri, Feb 26, 2016 at 10:00 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> Robert, this is all a game. It is a game of who wins the intellectual prize
> to whatever problem. Who gets the market or mind share and who gets to
> pretend they win the Oscar for coolest design.

JD, I don't have a horse in this race.  I am not developing a GTM and
I would be quite happy never to have to develop a GTM.  That doesn't
mean I think we should add these proposed hooks.  I think that's just
freezing the way that potential GTMs have to interact with the rest of
the system before we actually have a solution that the community is
willing to endorse.  I don't know what problem that solves.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
We do not have formal proof that the proposed XTM is "general enough" to 
handle all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot based 
and CSN  based.
pg_dtm and pg_tsdtm prove that both of them can be implemented using XTM.
If you know some approach to distributed transaction manager 
implementation, please let us know.
Otherwise your statement "is not general enough" is not concrete enough.
The Postgres-XL GTM can in principle be implemented as an extension based on XTM.

This API is based on existing PostgreSQL TM functions: we do not 
introduce any new abstractions.
Is it possible that some other TM function has to be encapsulated? Yes, 
it is.
But I do not see much of a problem with adding this function to XTM in 
the future if it is actually needed.
It happens with most APIs. It is awful when API functions are changed, 
breaking applications based on this API.
But since the functions encapsulated in XTM are in any case present in 
the PostgreSQL core, I do not think
that they will be changed in the future unless there are plans to 
completely rewrite the Postgres transaction manager...

Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
But it cause big problems both for developers, which have to permanently 
synchronize their branch with master,
and, what is more important, for customers, which can not use standard 
version of PostgreSQL.
It may cause problems with system certification, with running Postgres 
in cloud,...
Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it 
is wrong direction.



On 26.02.2016 19:06, Robert Haas wrote:
> On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
>> Right now tm is hardcoded and it's doesn't matter  "if other people might
>> need" at all.  We at least provide developers ("other people")  ability to
>> work on their implementations and the patch  is safe and doesn't sacrifices
>> anything in core.
> I don't believe that.  When we install APIs into core, we're
> committing to keep those APIs around.  And I think that we're far too
> early in the development of transaction managers for PostgreSQL to
> think that we know what APIs we want to commit to over the long term.
>
>>> And what makes us think we
>>> really need multiple transaction managers, anyway?
>> If you brave enough to say that one tm-fits-all and you are able to teach
>> existed tm to play well  in various clustering environment during
>> development period, which is short, than probably we don't need  multiple
>> tms. But It's too perfect to believe and practical solution is to let
>> multiple groups to work on their solutions.
> Nobody's preventing multiple groups for working on their solutions.
> That's not the question.  The question is why we should install hooks
> in core at this early stage without waiting to see which
> implementations prove to be best and whether those hooks are actually
> general enough to cater to everything people want to do.  There is
> talk of integrating XC/XL work into PostgreSQL; it has a GTM.
> Postgres Pro has several GTMs.  Maybe there will be others.
>
> Frankly, I'd like to see a GTM in core at some point because I'd like
> everybody who uses PostgreSQL to have access to a GTM.  What I don't
> want is for every PostgreSQL company to develop its own GTM and
> distribute it separately from everybody else's.  IIUC, MySQL kinda did
> that with storage engines and it resulted in the fragmentation of the
> community.  We've had the same thing happen with replication tools -
> every PostgreSQL company develops their own set.  It would have been
> better to have ONE set that was distributed by the core project so
> that we didn't all do the same work over again.
>
> I don't understand the argument that without these hooks in core,
> people can't continue to work on this.  It isn't hard to work on GTM
> without any core changes at all.  You just patch your copy of
> PostgreSQL.  We do this all the time, for every patch.  We don't add
> hooks for every patch.
>
>> dtms.  It's time to start working on dtm, I believe. The fact you don't
>> think about distributed transactions support doesn't mean there no "other
>> people", who has different ideas on postgres future.  That's why we propose
>> this patch, let's play the game !
> I don't like to play games with the architecture of PostgreSQL.
>

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Alvaro Herrera
Дата:
Konstantin Knizhnik wrote:

> Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
> But it cause big problems both for developers, which have to permanently
> synchronize their branch with master,
> and, what is more important, for customers, which can not use standard
> version of PostgreSQL.
> It may cause problems with system certification, with running Postgres in
> cloud,...
> Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
> wrong direction.

That's not the point, though.  I don't think a Postgres clone with a GTM
solves any particular problem that's not already solved by the existing
forks.  However, if you have a clone at home and you make a GTM work on
it, then you take the GTM as a patch and post it for discussion.
There's no need for hooks for that.  Just make sure your GTM solves the
problem that it is supposed to solve.

Excuse me if I've missed the discussion elsewhere -- why does
PostgresPro have *two* GTMs instead of a single one?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: The plan for FDW-based sharding

От
Bruce Momjian
Дата:
On Fri, Feb 26, 2016 at 03:30:29PM -0300, Alvaro Herrera wrote:
> That's not the point, though.  I don't think a Postgres clone with a GTM
> solves any particular problem that's not already solved by the existing
> forks.  However, if you have a clone at home and you make a GTM work on
> it, then you take the GTM as a patch and post it for discussion.
> There's no need for hooks for that.  Just make sure your GTM solves the
> problem that it is supposed to solve.
> 
> Excuse me if I've missed the discussion elsewhere -- why does
> PostgresPro have *two* GTMs instead of a single one?

I think the issue is that a GTM that works for a low-latency network
doesn't work well for a high-latency network, so the high-latency GTM
has fewer features and guarantees.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
On 02/26/2016 09:30 PM, Alvaro Herrera wrote:
> Konstantin Knizhnik wrote:
>
>> Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
>> But it cause big problems both for developers, which have to permanently
>> synchronize their branch with master,
>> and, what is more important, for customers, which can not use standard
>> version of PostgreSQL.
>> It may cause problems with system certification, with running Postgres in
>> cloud,...
>> Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
>> wrong direction.
> That's not the point, though.  I don't think a Postgres clone with a GTM
> solves any particular problem that's not already solved by the existing
> forks.  However, if you have a clone at home and you make a GTM work on
> it, then you take the GTM as a patch and post it for discussion.
> There's no need for hooks for that.  Just make sure your GTM solves the
> problem that it is supposed to solve.
>
> Excuse me if I've missed the discussion elsewhere -- why does
> PostgresPro have *two* GTMs instead of a single one?
>
There are many different clusters which require different approaches for managing distributed transactions.
Some clusters do not need distributed transactions at all: if you are executing OLAP queries on a read-only database,
a GTM will just add extra overhead.

pg_dtm uses a centralized arbiter. It is similar to the Postgres-XL DTM. The presence of a single arbiter significantly
simplifies all distributed algorithms: failure detection, global deadlock elimination, ... But at the same time the
arbiter is a SPOF and the main factor limiting cluster scalability.

pg_tsdtm is based on another approach: it uses system time as the CSN and doesn't require an arbiter. In theory there is
no limit to scalability. But differences in system time and the necessity to use more rounds of communication have a
negative impact on performance.

So there is no ideal solution which can work well for every cluster. This is why it is not possible to develop just one
GTM, propose it as a patch for review and then (hopefully) commit it to the Postgres core. IMHO it will never happen.
And I do not think that it is actually needed. What we need is a way to create our own transaction managers as
Postgres extensions without affecting its core.

All arguments against XTM can be applied to any other extension API in Postgres, for example FDW.
Is it general enough? There are many useful operations which are currently not handled by this API, for example
performing aggregation and grouping on the foreign server side. But it is still a very useful and flexible mechanism,
allowing many wonderful things to be implemented.
From my point of view a good system should be as open and customizable as possible, as long as it doesn't affect performance.
Replacing direct function calls with indirect function calls, as well as adding hooks, in almost all cases cannot hurt performance.
So without any extra price we get better flexibility. What's wrong with it?
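
To make the cost argument concrete, here is a minimal sketch (with made-up names, not the actual XTM patch) of the
usual PostgreSQL-style hook: a global function pointer checked at the call site, falling back to the built-in routine
when no extension has installed anything. The only extra price is one predictable branch and, when a hook is
installed, one indirect call.

/* Hypothetical hook around a core routine; illustration only. */
#include <stdio.h>

typedef void (*commit_hook_type)(unsigned int xid);

/* NULL means "no extension loaded"; an extension's _PG_init() would set it,
 * remembering the previous value so hooks can be chained. */
commit_hook_type commit_hook = NULL;

static void
standard_commit(unsigned int xid)
{
    printf("built-in commit of transaction %u\n", xid);
}

static void
CommitTransaction(unsigned int xid)
{
    if (commit_hook)
        commit_hook(xid);          /* extension-provided behaviour */
    else
        standard_commit(xid);      /* unchanged built-in behaviour */
}

int
main(void)
{
    CommitTransaction(42);         /* nothing installed: built-in path */
    return 0;
}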






-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Kevin Grittner
Дата:
On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

> pg_tsdtm  is based on another approach: it is using system time
> as CSN

Which brings up an interesting point, if we want logical
replication to be free of serialization anomalies for those using
serializable transactions, we need to support applying transactions
in an order which may not be the same as commit order -- CSN (as
such) would be the wrong thing.  If serializable transaction 1 (T1)
modifies a row and concurrent serializable transaction 2 (T2) reads
the old version of the row, and modifies something based on that,
T2 must be applied to a logical replica first even if T1 commits
before it; otherwise the logical replica could see a state not
consistent with business rules and which could not have been seen
(due to SSI) on the source database.  Any DTM API which does not
support some mechanism to rearrange the order of transactions from
commit order to some other order (based on, for example, read-write
dependencies) is not complete.  If it does support that, it gives
us a way forward for presenting consistent data on logical
replicas.

To avoid confusion, it might be best to reserve CSN for actual
commit sequence numbers, or at least values which increase
monotonically with each commit.  The term of art for what I
described above is "apparent order of execution", so maybe we want
to use AOE or AOoE for the order we choose to use in a particular
implementation.  It doesn't seem to me to be outright inaccurate
for cases where the system time on the various systems is used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Simon Riggs
Дата:
On 26 February 2016 at 22:48, Kevin Grittner <kgrittn@gmail.com> wrote:
On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

> pg_tsdtm  is based on another approach: it is using system time
> as CSN

Which brings up an interesting point, if we want logical
replication to be free of serialization anomalies for those using
serializable transactions, we need to support applying transactions
in an order which may not be the same as commit order -- CSN (as
such) would be the wrong thing.  If serializable transaction 1 (T1)
modifies a row and concurrent serializable transaction 2 (T2) reads
the old version of the row, and modifies something based on that,
T2 must be applied to a logical replica first even if T1 commits
before it; otherwise the logical replica could see a state not
consistent with business rules and which could not have been seen
(due to SSI) on the source database. 

How would SSI allow that commit order?

Surely there is a read-write dependency that would cause T2 to be aborted?
 
Any DTM API which does not
support some mechanism to rearrange the order of transactions from
commit order to some other order (based on, for example, read-write
dependencies) is not complete.  If it does support that, it gives
us a way forward for presenting consistent data on logical
replicas.

You appear to be saying that SSI allows transactions to commit in a non-serializable order.

Do you have a test case?

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> We do not have formal prove that proposed XTM is "general enough" to handle
> all possible transaction manager implementations.
> But there are two general ways of dealing with isolation: snapshot based and
> CSN  based.

I don't believe that for a minute.  For example, consider this article:

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here shouldn't necessarily be to replace
PostgreSQL's current transaction management with a distributed version
of the same thing.  We might want to do that, but I think the goal is
or should be to provide ACID semantics in a multi-node environment,
and specifically the I in ACID: transaction isolation.  Making the
existing transaction manager into something that can be spread across
multiple nodes is one way of accomplishing that.  Maybe the best one.
Certainly one that's been experimented with in Postgres-XC.  But it is
often the case that an algorithm that works tolerably well on a single
machine starts performing extremely badly in a distributed
environment, because the latency of communicating between multiple
systems is vastly higher than the latency of communicating between
CPUs or cores on the same system.  So I don't think we should be
assuming that's the way forward.

For example, consider a table with a million rows spread across any
number of servers.  Consider also a series of update transactions each
of which reads exactly one row and then writes that row.  If we adopt
any solution that involves a central coordinator to arbitrate commit
ordering, this is going to require at least one and probably two
million network round trips, one per transaction to get a snapshot and
a second to commit.  But all of this is completely unnecessary.
Because each transaction touches only a single node, a perfect global
transaction manager doesn't really need to do anything at all in this
case.  The existing PostgreSQL mechanisms - snapshot isolation, and SSI
if you have it turned on - will provide just as much transaction
isolation on this workload as they would on a workload that only
touched a single node.  If we design a GTM that does two million
network round trips in this scenario, we have just wasted two million
network round trips.

Now consider another workload where each transaction reads a row on
one server, reads a row on another server, and then updates the second
row.  Here, the GTM has a job to do.  If T1 reads R1, reads R2, writes
R2; and T2 concurrently reads R2, reads R1, and then writes R1, it
could happen that both transactions see the pre-update values of the
row they read first and yet both transactions go on to commit.  That's
not equivalent to any serial history, so transaction isolation is
broken.  A GTM which aims to provide true cluster-wide serializability
must do something to keep that from happening.  If all of this were
happening on a single node, those transactions would succeed if run at
READ COMMITTED but SSI would roll one of them back at SERIALIZABLE.
So maybe the goal for the GTM isn't to provide true serializability
across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?

I have seen zero discussion of any of this.  What I think we ought to
be doing here is describing precisely what might break, and then
deciding which of those problems we want to fix, and then deciding how
we can do that with the least amount of network traffic.  Jumping to
"let's make the transaction API pluggable" is presupposing the answer
to the first two questions without any discussion, and I'm afraid that
it's not going to lead to a very agreeable solution to the third one.

> Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
> But it cause big problems both for developers, which have to permanently
> synchronize their branch with master,
> and, what is more important, for customers, which can not use standard
> version of PostgreSQL.
> It may cause problems with system certification, with running Postgres in
> cloud,...
> Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
> wrong direction.

I think the history of Postgres-XC/XL shows that developing technology
outside of the PostgreSQL community is a risky business.  You might
end up developing something that is not widely used or adopted, and
the lack of community review might cause that technology to be less
good than it would have been had it been done through the community
process. It seems to me that installing a bunch of hooks here and then
having you go off and develop outside the community has those same
perils. (Of course, in that case and this one, working outside the
community also lets you go faster and do things the community
doesn't like, which are sometimes advantages.)

Also, what you are proposing solves problems for you while maybe
creating them for other people.  You're saying that we should have
hooks so that you don't have to merge with master.  But that's just
transferring the maintenance burden from you to core.  Instead of you
having to merge when things change, core has got to maintain the hooks
as things change so that things are easy for you.  If there are no
code changes in the relevant area anyway, then merging is trivial and
you shouldn't need to worry about it.  I could submit a patch adding
hooks to core to enable all of the things (or even just some of the
things) that EnterpriseDB has changed in Advanced Server, and that
patch would be rejected so fast it would make your head spin, because
of course the core project doesn't want to be burdened with
maintaining a whole bunch of hooks for the convenience of
EnterpriseDB.  Which is understandable.  I think it's fine for you to
ask whether PostgreSQL will accept a certain set of hooks, but we've
all got to understand that there is a difference between what is
convenient for us or our employers and what is actually best for the
project.  I am not under any illusions that those two things are the
same, and while I do a lot of things that I hope will benefit my
employer, when I am writing to this mailing list I do not do things
unless they are in the interest of PostgreSQL.  When those two things
intersect, great; when they don't, and the work is community work,
PostgreSQL wins.  I see very clearly that what you are proposing here
will benefit your customers, but unless it will also benefit the
PostgreSQL community in general, it's not a good submission.

But I don't really want to spend a lot of time arguing about politics
here.  The real issue is whether this is a good approach.  If it is,
then it's the right thing to do for PostgreSQL and we should commit
it.  If it's not, then we should reject it.  Let's focus on the
technical concerns I wrote about in the first part of the email rather
than wrangling about business interests.  I'm not blind to the fact
that we work for different companies and I realize that can create
some tension, but if we want to *have* a PostgreSQL community we've
got to try to get past that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> pg_tsdtm  is based on another approach: it is using system time as CSN and
> doesn't require arbiter. In theory there is no limit for scalability. But
> differences in system time and necessity to use more rounds of communication
> have negative impact on performance.

How do you prevent clock skew from causing serialization anomalies?

> So there is no ideal solution which can work well for all cluster. This is
> why it is not possible to develop just one GTM, propose it as a patch for
> review and then (hopefully) commit it in Postgres core. IMHO it will never
> happen. And I do not think that it is actually needed. What we need is a way
> to be able to create own transaction managers as Postgres extension not
> affecting its  core.

This seems rather defeatist.  If the code is good and reliable, why
should it not be committed to core?

> All arguments against XTM can be applied to any other extension API in
> Postgres, for example FDW.
> Is it general enough? There are many useful operations which currently are
> not handled by this API. For example performing aggregation and grouping at
> foreign server side.  But still it is very useful and flexible mechanism,
> allowing to implement many wonderful things.

That is true.  And everybody is entitled to an opinion on each new
proposed hook, as to whether that hook is general or not.  We have
both accepted and rejected proposed hooks in the past.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
On 02/27/2016 06:57 AM, Robert Haas wrote:
> On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> pg_tsdtm  is based on another approach: it is using system time as CSN and
>> doesn't require arbiter. In theory there is no limit for scalability. But
>> differences in system time and necessity to use more rounds of communication
>> have negative impact on performance.
> How do you prevent clock skew from causing serialization anomalies?

If a node receives a message from the "future", it just needs to wait until this future arrives.
Practically, we just "adjust" the system time in this case, moving it forward (certainly the system time is not actually
changed, we just set a correction value which needs to be added to the system time).
This approach was discussed in the article:
http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
I hope the algorithm is explained in this article much better than I can do here.
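
As a rough illustration of that waiting/correction idea (a minimal sketch under the assumption of a Clock-SI-style
rule, with hypothetical names, not pg_tsdtm's actual code): each node keeps a correction offset and never lets its
adjusted clock fall behind any timestamp it has already observed from another node.

/* Sketch of timestamp-as-CSN clock adjustment; illustration only. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static int64_t clock_correction_us = 0;     /* added to every local reading */

static int64_t
adjusted_now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    return (int64_t) ts.tv_sec * 1000000 + ts.tv_nsec / 1000 + clock_correction_us;
}

/* A remote node sent a CSN from our "future": advance the correction so
 * that our adjusted clock is never behind anything we have already seen. */
static void
observe_remote_csn(int64_t remote_csn_us)
{
    int64_t now = adjusted_now_us();
    if (remote_csn_us > now)
        clock_correction_us += remote_csn_us - now;
}

/* Commit CSNs taken from the adjusted clock are then never smaller than
 * any remote CSN this node has already observed. */
static int64_t
assign_commit_csn(void)
{
    return adjusted_now_us();
}

int
main(void)
{
    observe_remote_csn(adjusted_now_us() + 500000);   /* CSN 0.5 s ahead of us */
    printf("commit CSN after adjustment: %lld\n", (long long) assign_commit_csn());
    return 0;
}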

Few notes:
1. I can not prove that our pg_tsdtm absolutely correctly implements the approach described in this article.
2. I didn't try to formally prove that our implementation can not cause some serialization anomalies.
3. We just ran various synchronization tests (including the simplest debit-credit test, which breaks an old version of
Postgres-XL) during several days and we didn't get any inconsistencies.
4. We have tested pg_tsdtm on a single node, on a blade cluster and on geographically distributed nodes (distance more
than a thousand kilometers: one server was in Vladivostok, another in Kaliningrad). Ping between these two servers takes
about 100 msec.
Performance of our benchmark dropped about 100 times, but there were no inconsistencies.

Also, I once again want to note that the primary idea of the proposed patch was not pg_tsdtm.
There are well known limitations of pg_tsdtm which we will try to address in the future.
What we want is to include the XTM API in PostgreSQL to be able to continue our experiments with different transaction
managers and to implement multimaster on top of it (our first practical goal) without affecting the PostgreSQL core.

If the XTM patch is included in 9.6, then we can propose our multimaster as a PostgreSQL extension and everybody can use
it.
Otherwise we have to propose our own fork of Postgres, which significantly complicates using and maintaining it.

>> So there is no ideal solution which can work well for all cluster. This is
>> why it is not possible to develop just one GTM, propose it as a patch for
>> review and then (hopefully) commit it in Postgres core. IMHO it will never
>> happen. And I do not think that it is actually needed. What we need is a way
>> to be able to create own transaction managers as Postgres extension not
>> affecting its  core.
> This seems rather defeatist.  If the code is good and reliable, why
> should it not be committed to core?

Two reasons:
1. There is no ideal implementation of a DTM which will fit all possible needs and be efficient for all clusters.
2. Even if such an implementation exists, the right way to integrate it is for Postgres to use some kind of TM API.
I hope that everybody will agree that doing it in this way:

#ifdef PGXC
        /* In Postgres-XC, stop timestamp has to follow the timeline of GTM */
        xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
#else
        xlrec.xact_time = xactStopTimestamp;
#endif

or in this way:
        xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp : xactStopTimestamp;

is a very, very bad idea.
In OO programming we would have an abstract TM interface and several implementations of this interface, for example
MVCC_TM, 2PL_TM, Distributed_TM...
This is actually what can be done with our XTM API.
As Postgres is implemented in C, not in C++, we have to emulate interfaces using structures with function pointers.
And please note that there is no need at all to include a DTM implementation in core, since it is not needed by
everybody.
It can easily be distributed as an extension.
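
As a rough sketch of what such an emulated interface can look like in C (all names here are illustrative, not the real
XTM structure): the core calls through a struct of function pointers whose default members are the built-in routines,
and an extension installs its own struct at load time.

/* Hypothetical transaction-manager "interface" as a struct of function
 * pointers; illustration only, not the actual XTM API. */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int TxnId;

typedef struct TransactionManager
{
    TxnId (*begin)(void);
    bool  (*is_visible)(TxnId xid, TxnId snapshot_xmax);
    void  (*commit)(TxnId xid);
} TransactionManager;

/* Built-in (local, MVCC-like) implementation used by default. */
static TxnId next_xid = 1;
static TxnId local_begin(void)                            { return next_xid++; }
static bool  local_is_visible(TxnId xid, TxnId snap_xmax) { return xid < snap_xmax; }
static void  local_commit(TxnId xid)                      { (void) xid; }

static TransactionManager local_tm = { local_begin, local_is_visible, local_commit };

/* Core always calls through this pointer; a distributed TM extension would
 * replace it with its own struct when it is loaded. */
static TransactionManager *current_tm = &local_tm;

void
SetTransactionManager(TransactionManager *tm)
{
    current_tm = tm ? tm : &local_tm;
}

int
main(void)
{
    TxnId xid = current_tm->begin();
    printf("xid %u visible under snapshot xmax 10: %d\n",
           xid, (int) current_tm->is_visible(xid, 10));
    current_tm->commit(xid);
    return 0;
}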

I hope that quite soon we can propose a multimaster extension which should provide functionality similar to MySQL
Galera. But even right now we have integrated pg_dtm and pg_tsdtm with pg_shard and postgres_fdw, allowing distributed
consistency to be provided for them.


>
>> All arguments against XTM can be applied to any other extension API in
>> Postgres, for example FDW.
>> Is it general enough? There are many useful operations which currently are
>> not handled by this API. For example performing aggregation and grouping at
>> foreign server side.  But still it is very useful and flexible mechanism,
>> allowing to implement many wonderful things.
> That is true.  And everybody is entitled to an opinion on each new
> proposed hook, as to whether that hook is general or not.  We have
> both accepted and rejected proposed hooks in the past.
>


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
On 02/27/2016 06:54 AM, Robert Haas wrote:
> On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> We do not have formal prove that proposed XTM is "general enough" to handle
>> all possible transaction manager implementations.
>> But there are two general ways of dealing with isolation: snapshot based and
>> CSN  based.
> I don't believe that for a minute.  For example, consider this article:

Well, I have to agree that I was not right in saying that there are just two ways of providing distributed isolation.
There is at least one more method: conservative locking. But it will cause a huge number of extra network messages to
be exchanged.
Also, I mostly considered solutions compatible with the PostgreSQL MVCC model.

And definitely there are other approaches, like preserving transaction commit order (as is done in Galera).
Some of them can be implemented with XTM (preserving commit order), some can not (2PL).
I have already noted that XTM does not allow implementing ANY transaction manager.
But we have considered several approaches to distributed transaction management explained in articles related to
really working systems.
Some of them are real production systems such as SAP HANA, some are just prototypes, but working prototypes for which
the authors have performed some benchmarking and comparison with other approaches. The references you have mentioned
are mostly theoretical descriptions of the problem.
Nice to know, but it is hard to build a concrete implementation based on these articles.


Briefly answering your other questions:

> For example, consider a table with a million rows spread across any number of servers.

It is a sharding scenario; pg_tsdtm will work well in this case, as it does not require sending a lot of extra messages.

> Now consider another workload where each transaction reads a row on
> one server, reads a row on another server,

It can be solved both with pg_dtm (central arbiter) and pg_tsdtm (no arbiter).
But actually your scenario just once again proves that there can not be just one ideal distributed TM.

> So maybe the goal for the GTM isn't to provide true serializability
> across the cluster but some lesser degree of transaction isolation.
> But then exactly which serialization anomalies are we trying to
> prevent, and why is it OK to prevent those and not others?

Absolutely agree. There are some theoretical discussions regarding CAP and different distributed levels of isolation.
But in practice people want to solve their tasks. Most PostgreSQL users are using the default isolation level, read
committed, although there are a lot of "wonderful" anomalies with it.
Serializable transactions in Oracle actually violate the fundamental serializability rule, and still Oracle is one of
the most popular databases in the world...
There was an isolation bug in Postgres-XL which didn't prevent commercial customers from using it...

So I do not say that discussing all these theoretical questions is not needed, nor that formally proven correctness of
a distributed algorithm is not needed.
But I do not understand why it should prevent us from providing an extensible TM API.
Yes, we can not do everything with it. But still we can implement many different approaches.
I think that this somehow proves that it is "general enough".





 




> https://en.wikipedia.org/wiki/Global_serializability
>
> I think the neutrality of that article is *very* debatable, but it
> certainly contradicts the idea that snapshots and CSNs are the only
> methods of achieving global serializability.
>
> Or consider this lecture:
>
> http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf
>
> That's a great introduction to the problem we're trying to solve here,
> but again, snapshots are not mentioned, and CSNs certainly aren't
> mentioned.
>
> This write-up goes further, explaining three different methods for
> ensuring global serializability, none of which mention snapshots or
> CSNs:
>
> http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html
>
> Actually, I think the second approach is basically a snapshot/CSN-type
> approach, but it doesn't use that terminology and the connection to
> what you are proposing is very unclear.
>
> I think you're approaching this problem from a viewpoint that is
> entirely too focused on the code that exists in PostgreSQL today.
> Lots of people have done lots of academic research on how to solve
> this problem, and you can't possibly say that CSNs and snapshots are
> the only solution to this problem unless you haven't read any of those
> papers.  The articles above aren't exceptional in mentioning neither
> of the approaches that you are advocating - they are typical of the
> literature in this area.  How can it be that the only solutions to
> this problem are ones that are totally different from the approaches
> that university professors who spend time doing research on
> concurrency have spent time exploring?
>
> I think we need to back up here and examine our underlying design
> assumptions.  The goal here shouldn't necessarily be to replace
> PostgreSQL's current transaction management with a distributed version
> of the same thing.  We might want to do that, but I think the goal is
> or should be to provide ACID semantics in a multi-node environment,
> and specifically the I in ACID: transaction isolation.  Making the
> existing transaction manager into something that can be spread across
> multiple nodes is one way of accomplishing that.  Maybe the best one.
> Certainly one that's been experimented within Postgres-XC.  But it is
> often the case that an algorithm that works tolerably well on a single
> machine starts performing extremely badly in a distributed
> environment, because the latency of communicating between multiple
> systems is vastly higher than the latency of communicating between
> CPUs or cores on the same system.  So I don't think we should be
> assuming that's the way forward.
>
> For example, consider a table with a million rows spread across any
> number of servers.  Consider also a series of update transactions each
> of which reads exactly one row and then writes that row.  If we adopt
> any solution that involves a central coordinator to arbitrate commit
> ordering, this is going to require at least one and probably two
> million network round trips, one per transaction to get a snapshot and
> a second to commit.  But all of this is completely unnecessary.
> Because each transaction touches only a single node, a perfect global
> transaction manager doesn't really need to do anything at all in this
> case.  The existing PostreSQL mechanisms - snapshot isolation, and SSI
> if you have it turned on - will provide just as much transaction
> isolation on this workload as they would on a workload that only
> touched a single node.  If we design a GTM that does two million
> network round trips in this scenario, we have just wasted two million
> network round trips.
>
> Now consider another workload where each transaction reads a row on
> one server, reads a row on another server, and then updates the second
> row.  Here, the GTM has a job to do.  If T1 reads R1, reads R2, writes
> R2; and T2 concurrently reads R2, reads R1, and then writes R1, it
> could happen that both transactions see the pre-update values of the
> row they read first and yet both transactions go on to commit.  That's
> not equivalent to any serial history, so transaction isolation is
> broken.  A GTM which aims to provide true cluster-wide serializability
> must do something to keep that from happening.  If all of this were
> happening on a single node, those transactions would succeed if run at
> READ COMMITTED but SSI would roll one of them back at SERIALIZABLE.
> So maybe the goal for the GTM isn't to provide true serializability
> across the cluster but some lesser degree of transaction isolation.
> But then exactly which serialization anomalies are we trying to
> prevent, and why is it OK to prevent those and not others?
>
> I have seen zero discussion of any of this.  What I think we ought to
> be doing here is describing precisely what might break, and then
> deciding which of those problems we want to fix, and then deciding how
> we can do that with the least amount of network traffic.  Jumping to
> "let's make the transaction API pluggable" is presupposing the answer
> to the first two questions without any discussion, and I'm afraid that
> it's not going to lead to a very agreeable solution to the third one.
>
>> Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
>> But it cause big problems both for developers, which have to permanently
>> synchronize their branch with master,
>> and, what is more important, for customers, which can not use standard
>> version of PostgreSQL.
>> It may cause problems with system certification, with running Postgres in
>> cloud,...
>> Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
>> wrong direction.
> I think the history of Postgres-XC/XL shows that developing technology
> outside of the PostgreSQL community is a risky business.  You might
> end up developing something that is not widely used or adopted, and
> the lack of community review might cause that technology to be less
> good than it would have been had it been done through the community
> process. It seems to me that installing a bunch of hooks here and then
> having you go off and develop outside the community has those same
> perils. (Of course, in that case and this one, working outside the
> community also lets you go faster and do things the community
> doesn't like, which are sometimes advantages.)
>
> Also, what you are proposing solves problems for you while maybe
> creating them for other people.  You're saying that we should have
> hooks so that you don't have to merge with master.  But that's just
> transferring the maintenance burden from you to core.  Instead of you
> having to merge when things change, core has got to maintain the hooks
> as things change so that things are easy for you.  If there are no
> code changes in the relevant area anyway, then merging is trivial and
> you shouldn't need to worry about it.  I could submit a patch adding
> hooks to core to enable all of the things (or even just some of the
> things) that EnterpriseDB has changed in Advanced Server, and that
> patch would be rejected so fast it would make your head spin, because
> of course the core project doesn't want to be burdened with
> maintaining a whole bunch of hooks for the convenience of
> EnterpriseDB.  Which is understandable.  I think it's fine for you to
> ask whether PostgreSQL will accept a certain set of hooks, but we've
> all got to understand that there is a difference between what is
> convenient for us or our employers and what is actually best for the
> project.  I am not under any illusions that those two things are the
> same, and while I do a lot of things that I hope will benefit my
> employer, when I am writing to this mailing list I do not do things
> unless they are in the interest of PostgreSQL.  When those two things
> intersect, great; when they don't, and the work is community work,
> PostgreSQL wins.  I see very clearly that what you are proposing here
> will benefit your customers, but unless it will also benefit the
> PostgreSQL community in general, it's not a good submission.
>
> But I don't really want to spend a lot of time arguing about politics
> here.  The real issue is whether this is a good approach.  If it is,
> then it's the right thing to do for PostgreSQL and we should commit
> it.  If it's not, then we should reject it.  Let's focus on the
> technical concerns I wrote about in the first part of the email rather
> than wrangling about business interests.  I'm not blind to the fact
> that we work for different companies and I realize that can create
> some tension, but if we want to *have* a PostgreSQL community we've
> got to try to get past that.
>


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Álvaro Hernández Tortosa
Дата:

On 27/02/16 09:19, Konstantin Knizhnik wrote:
> On 02/27/2016 06:54 AM, Robert Haas wrote:
>
[...]
>
>> So maybe the goal for the GTM isn't to provide true serializability
> across the cluster but some lesser degree of transaction isolation.
> But then exactly which serialization anomalies are we trying to
> prevent, and why is it OK to prevent those and not others?
>
> Absolutely agree. There are some theoretical discussion regarding CAP 
> and different distributed level of isolation.
> But at practice people want to solve their tasks. Most of PostgeSQL 
> used are using default isolation level: read committed although there 
> are alot of "wonderful" anomalies with it.
> Serialazable transaction in Oracle are actually violating fundamental 
> serializability rule and still Oracle is one of ther most popular 
> database in the world...
> The was isolation bug in Postgres-XL which doesn't prevent from using 
> it by commercial customers...
    I think this might be a dangerous line of thought. While I agree 
PostgreSQL should definitely look at the market and answer questions 
that (current and prospective) users may ask, and be more practical than 
idealist, easily ditching isolation guarantees might not be a good thing.
     That Oracle is the leader despite its isolation problems, or that
most people run PostgreSQL under read committed, is not a good argument
to cut corners and just go to the bare minimum (if any) isolation
guarantees. First, because PostgreSQL has always been trusted and
understood as a system with *strong* guarantees (whatever that means).
Second, because what we may perceive as OK from the market might change
soon. From my observations, while I agree with you that most people "don't
care" or, worse, "don't realize", this is rapidly changing. More and more
people are becoming aware of the problems of distributed systems and the
significant consequences they may have for them.
    A lot of them have been illustrated in the famous Jepsen posts. As
an example, and a good one given that you have mentioned Galera before,
is this one: https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster
which demonstrates how Galera fails to provide Snapshot Isolation, even
in a healthy state -- despite their claims.
    As of today, I would expect any distributed system to clearly state
its guarantees in the documentation, and then adhere to them, for
instance proving it with tests such as Jepsen.

>
> So I do not say that discussing all this theoretical questions is not 
> need as formally proven correctness of distributed algorithm.
    I would like to see this work move forward, so I really appreciate all
your effort here. I cannot give an opinion on whether the DTM API is good
or not, but I agree with Robert that a good technical discussion on these
issues is a good, and needed, starting point. Feedback may also help
you avoid pitfalls that might otherwise go unnoticed until tons of code
have been written.
    Academical approaches are sometimes "very academical", but studying 
them doesn't hurt either :)

    Álvaro


-- 
Álvaro Hernández Tortosa


-----------
8Kdata




Re: The plan for FDW-based sharding

От
Kevin Grittner
Дата:
On Fri, Feb 26, 2016 at 5:37 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 26 February 2016 at 22:48, Kevin Grittner <kgrittn@gmail.com> wrote:

>> if we want logical
>> replication to be free of serialization anomalies for those using
>> serializable transactions, we need to support applying transactions
>> in an order which may not be the same as commit order -- CSN (as
>> such) would be the wrong thing.  If serializable transaction 1 (T1)
>> modifies a row and concurrent serializable transaction 2 (T2) reads
>> the old version of the row, and modifies something based on that,
>> T2 must be applied to a logical replica first even if T1 commits
>> before it; otherwise the logical replica could see a state not
>> consistent with business rules and which could not have been seen
>> (due to SSI) on the source database.
>
> How would SSI allow that commit order?
>
> Surely there is a read-write dependency that would cause T2 to be
> aborted?

*A* read-write dependency does not cause an abort under SSI, it
takes a *pattern* of read-write dependencies which has been proven
to appear in any set of concurrent transactions which can cause a
serialization anomaly.  A read-only transaction can be part of that
pattern.  On a single database SSI can see whether a read has
caused such a problem.  If you replicate the transactions to
somewhere else and read them SSI cannot tell whether there is an
anomaly (at least, not without exchanging a lot of information that
isn't currently happening), so some other mechanism would probably
need to be used.  One possibility is to pass along information
about when things are in a state on the source that is known to be
free of anomalies if read; another would be to reorder the
application of transactions to match the apparent order of
execution.  The latter would not work for "physical" replication,
but should be fine for logical replication.  An implementation
might create a list in commit order, but not release the front of
the list for processing if it is a SERIALIZABLE transaction which
has written data until all overlapping SERIALIZABLE transactions
complete, so it can move any subsequently-committed SERIALIZABLE
transaction which read the "old" version of the data ahead of it.
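
For illustration only, here is a minimal sketch of the release rule described above, with invented names (this is not
PostgreSQL code): transactions sit in a commit-ordered queue, and a SERIALIZABLE writer is held at the head until its
overlapping SERIALIZABLE transactions finish.

/*
 * Hypothetical apply-ordering sketch: keep decoded transactions in commit
 * order, but do not release a SERIALIZABLE transaction that wrote data
 * while any overlapping SERIALIZABLE transaction is still open, so a
 * later-committed reader of the "old" data can be applied ahead of it.
 */
#include <assert.h>
#include <stdbool.h>

typedef struct ApplyTxn
{
    unsigned int xid;            /* transaction to replay on the replica     */
    bool         serializable;   /* ran at SERIALIZABLE on the source        */
    bool         wrote_data;     /* produced changes that need applying      */
    int          open_overlaps;  /* overlapping SERIALIZABLE txns still open */
} ApplyTxn;

/*
 * May the transaction at the head of the commit-ordered queue be applied
 * now?  If not, the replayer keeps it queued and may apply a
 * later-committed SERIALIZABLE reader first ("apparent order of execution").
 */
static bool
can_release_head(const ApplyTxn *head)
{
    if (head->serializable && head->wrote_data && head->open_overlaps > 0)
        return false;            /* hold it back until the overlaps complete */
    return true;
}

int
main(void)
{
    ApplyTxn writer = { .xid = 100, .serializable = true,
                        .wrote_data = true, .open_overlaps = 1 };

    assert(!can_release_head(&writer));   /* held back: an overlapping reader is open */
    writer.open_overlaps = 0;
    assert(can_release_head(&writer));    /* now it may be applied */
    return 0;
}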

>> Any DTM API which does not
>> support some mechanism to rearrange the order of transactions from
>> commit order to some other order (based on, for example, read-write
>> dependencies) is not complete.  If it does support that, it gives
>> us a way forward for presenting consistent data on logical
>> replicas.
>
> You appear to be saying that SSI allows transactions to commit in a
> non-serializable order.

Absolutely not.  If you want to understand this better, this paper
might be helpful:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

> Do you have a test case?

There are a couple in this section of the Wiki page of examples:

https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

Just picture the read-only transaction executing on a replica.

Thinking of commit sequence number as the right order to apply
transactions during replication seems to me to be a holdover from
the techniques initially developed for transactions in the 1960s --
specifically, strict two-phase locking (S2PL) is very easy to get
one's head around and when using it the apparent order of execution
always *does* match commit order.  Unfortunately S2PL performs so
poorly that it was ripped out of PostgreSQL years ago.  In general,
I think it is time we gave up on thinking that is based on it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
Neither pg_dtm nor pg_tsdtm supports the serializable isolation level.
We implemented distributed snapshot isolation - the repeatable-read isolation level.
We also do not support the read-committed isolation level yet.

We do not try to preserve transaction commit order on all nodes.
But in principle it can be implemented using the XTM API: it allows redefining the function which actually sets the
transaction status. pg_dtm performs 2PC here.

And in principle it is possible to enforce commits in any particular order.

Concerning CSNs, maybe you are right and it is not correct to use this notion in this case. Actually there are many
"CSNs" involved in a transaction commit.

First of all, each transaction is assigned a local CSN (timestamp) when it is ready to commit. Then the CSNs of all
nodes are exchanged and the maximal CSN is chosen.

This maximum is written as the final transaction CSN and is used in the visibility check.

On 02/27/2016 01:48 AM, Kevin Grittner wrote:
> On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>
>> pg_tsdtm  is based on another approach: it is using system time
>> as CSN
> Which brings up an interesting point, if we want logical
> replication to be free of serialization anomalies for those using
> serializable transactions, we need to support applying transactions
> in an order which may not be the same as commit order -- CSN (as
> such) would be the wrong thing.  If serializable transaction 1 (T1)
> modifies a row and concurrent serializable transaction 2 (T2) reads
> the old version of the row, and modifies something based on that,
> T2 must be applied to a logical replica first even if T1 commits
> before it; otherwise the logical replica could see a state not
> consistent with business rules and which could not have been seen
> (due to SSI) on the source database.  Any DTM API which does not
> support some mechanism to rearrange the order of transactions from
> commit order to some other order (based on, for example, read-write
> dependencies) is not complete.  If it does support that, it gives
> us a way forward for presenting consistent data on logical
> replicas.
>
> To avoid confusion, it might be best to reserve CSN for actual
> commit sequence numbers, or at least values which increase
> monotonically with each commit.  The term of art for what I
> described above is "apparent order of execution", so maybe we want
> to use AOE or AOoE for the order we choose to use in a particular
> implementation.  It doesn't seem to me to be outright inaccurate
> for cases where the system time on the various systems is used.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Kevin Grittner
Дата:
On Sat, Feb 27, 2016 at 1:14 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:

> We do not try to preserve transaction commit order at all nodes.
> But in principle it can be implemented using XTM API: it allows to redefine
> function which actually sets transaction status.  pg_dtm performs 2PC here.
> And in principle it is possible to enforce commits in any particular order.

That's encouraging.

> Concerning CSNs, may be you are right and it is not correct to use this
> notion in this case. Actually there are many "CSNs" involved in transaction
> commit.

Perhaps we should distinguish "commit sequence number" from "apply
sequence number"?  I really think we need to differentiate the
order to be applied from the order previously committed in order to
avoid long-term confusion.  Calling both "CSN" is going to cause
not only miscommunication but muddled thinking, IMO.

> First of all each transaction is assigned local CSN (timestamp) when it is
> ready to commit. Then CSNs of all nodes are exchanged and maximal CSN is
> chosen.
> This maximum is writen as final transaction CSN and is used in visibility
> check.

Is this an implementation of some particular formal technique?  If
so, do you have a reference to a paper on it?  I get the sense that
there has been a lot written about distributed transactions, and
that it would be a mistake to ignore it, but I have not (yet)
reviewed the literature for it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Simon Riggs
Дата:
On 27 February 2016 at 17:54, Kevin Grittner <kgrittn@gmail.com> wrote:
On a single database SSI can see whether a read has
caused such a problem.  If you replicate the transactions to
somewhere else and read them SSI cannot tell whether there is an
anomaly

OK, I thought you were saying something else. What you're saying is that SSI doesn't work on replicas, yet, whether that is physical or logical.
 
Row level locking (S2PL) can be used on logical standbys, so it's actually a better situation.

(at least, not without exchanging a lot of information that
isn't currently happening), so some other mechanism would probably
need to be used.  One possibility is to pass along information
about when things are in a state on the source that is known to be
free of anomalies if read; another would be to reorder the
application of transactions to match the apparent order of
execution.  The latter would not work for "physical" replication,
but should be fine for logical replication.  An implementation
might create a list in commit order, but not release the front of
the list for processing if it is a SERIALIZABLE transaction which
has written data until all overlapping SERIALIZABLE transactions
complete, so it can move any subsequently-committed SERIALIZABLE
transaction which read the "old" version of the data ahead of it.

The best way would be to pass across "anomaly barriers", since they can easily be inserted into the WAL stream. The main issue seems to be how and when to detect them.

For logical replay, applying in batches is actually a good thing since it allows parallelism. We can remove them all from the target's procarray all at once to avoid intermediate states becoming visible. So that would be the preferred mechanism.

Collecting a list of transactions that must be applied before the current one could be accumulated during SSI processing and added to the commit record. But reordering the transaction apply is something we'd need to get some real clear theory on before we considered it.

Anyway, next release. 

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: The plan for FDW-based sharding

От
Kevin Grittner
Дата:
On Sat, Feb 27, 2016 at 3:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 February 2016 at 17:54, Kevin Grittner <kgrittn@gmail.com> wrote:
>>
>> On a single database SSI can see whether a read has
>> caused such a problem.  If you replicate the transactions to
>> somewhere else and read them SSI cannot tell whether there is an
>> anomaly
>
> OK, I thought you were saying something else. What you're saying is that SSI
> doesn't work on replicas, yet, whether that is physical or logical.

Right.

> Row level locking (S2PL) can be used on logical standbys, so its actually a
> better situation.

Except that S2PL has the concurrency and performance problems that
caused us to rip out a working S2PL implementation in PostgreSQL
core.  Layering it on outside of that isn't going to offer better
concurrency or perform better than what we ripped out; but it does
work.

>> One possibility is to pass along information
>> about when things are in a state on the source that is known to be
>> free of anomalies if read; another would be to reorder the
>> application of transactions to match the apparent order of
>> execution.  The latter would not work for "physical" replication,
>> but should be fine for logical replication.  An implementation
>> might create a list in commit order, but not release the front of
>> the list for processing if it is a SERIALIZABLE transaction which
>> has written data until all overlapping SERIALIZABLE transactions
>> complete, so it can move any subsequently-committed SERIALIZABLE
>> transaction which read the "old" version of the data ahead of it.
>
> The best way would be to pass across "anomaly barriers", since they can
> easily be inserted into the WAL stream. The main issue seems to be how and
> when to detect them.

That, and how to choose whether to run right away with the last
known consistent snapshot, or wait for the next one.  There seem to
be use cases for both.  None of it seems extraordinarily hard; it's
just never been anyone's top priority.  :-/

> For logical replay, applying in batches is actually a good thing since it
> allows parallelism. We can remove them all from the target's procarray all
> at once to avoid intermediate states becoming visible. So that would be the
> preferred mechanism.

That could be part of a solution.  What I sketched out with the
"apparent order of execution" ordering of the transactions
(basically, commit order except when one SERIALIZABLE transaction
needs to be dragged in front of another due to a read-write
dependency) is possibly the simplest approach, but batching may
well give better performance.

> Collecting a list of transactions that must be applied before the current
> one could be accumulated during SSI processing and added to the commit
> record. But reordering the transaction apply is something we'd need to get
> some real clear theory on before we considered it.

Oh, there is a lot of very clear theory on it.  I even considered
whether it might work at the physical level, but that seems fraught
with potential land-mines due to the subtle ways in which we manage
race conditions at the detail level.  It's one of those things that
seems theoretically possible, but probably a really bad idea in
practice.  For logical replication, though, there is a clear way to
determine a reasonable order of applying changes that will never
yield a serialization anomaly -- if we do that, we dodge the choice
between using a "stale" safe snapshot or waiting an indeterminate
length of time for a "fresh" safe snapshot -- at the cost of
delaying logical replication itself at various points.

Anyway, we seem to be on the same page; just some minor
miscommunication at some point.  I apologize if I was unclear.

Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
On 02/27/2016 11:38 PM, Kevin Grittner wrote:
>
> Is this an implementation of some particular formal technique?  If
> so, do you have a reference to a paper on it?  I get the sense that
> there has been a lot written about distributed transactions, and
> that it would be a mistake to ignore it, but I have not (yet)
> reviewed the literature for it.

The reference to the article is on our wiki page explaining our DTM: https://wiki.postgresql.org/wiki/DTM

http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Simon Riggs
Дата:
On 27 February 2016 at 22:38, Kevin Grittner <kgrittn@gmail.com> wrote:
 
That could be part of a solution.  What I sketched out with the
"apparent order of execution" ordering of the transactions
(basically, commit order except when one SERIALIZABLE transaction
needs to be dragged in front of another due to a read-write
dependency) is possibly the simplest approach, but batching may
well give better performance.

> Collecting a list of transactions that must be applied before the current
> one could be accumulated during SSI processing and added to the commit
> record. But reordering the transaction apply is something we'd need to get
> some real clear theory on before we considered it.

Oh, there is a lot of very clear theory on it.  I even considered
whether it might work at the physical level, but that seems fraught
with potential land-mines due to the subtle ways in which we manage
race conditions at the detail level.  It's one of those things that
seems theoretically possible, but probably a really bad idea in
practice.  For logical replication, though, there is a clear way to
determine a reasonable order of applying changes that will never
yield a serialization anomaly -- if we do that, we dodge the choice
between using a "stale" safe snapshot or waiting an indeterminate
length of time for a "fresh" safe snapshot -- at the cost of
delaying logical replication itself at various points.

I think we're going to have practical difficulties with these concepts.

If an xid commits with inConflicts, those refer to transactions that may not yet have assigned xids. They may not be assigned xids for hours or days, so it's hard to know whether they will eventually become write transactions or not, making it a challenge to even know whether we should delay. And even if we did know, delaying the apply of commits for hours to allow us to reorder transactions clearly isn't practical in all cases, more so if the impact is caused by one minor table that nobody much cares about.

What I see as more practical is reducing the scope of "safe transactions" down to "safe scopes", where particular tables or sets of tables are known safe at particular times, so we know more about which things we can look at safely.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
>> How do you prevent clock skew from causing serialization anomalies?
>
> If a node receives a message from the "future" it just needs to wait until this
> future arrives.
> Practically we just "adjust" system time in this case, moving it forward
> (certainly the system time is not actually changed, we just set a correction value
> which needs to be added to the system time).
> This approach was discussed in the article:
> http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
> I hope, in this article algorithm is explained much better than I can do
> here.

Hmm, the approach in that article is very interesting, but it sounds
different than what you are describing - they do not, AFAICT, have
anything like a "correction value".

> There are well known limitations of this pg_tsdtm which we will try to
> address in the future.

How well known are those limitations?  Are they documented somewhere?
Or are they only well-known to you?

> What we want is to include XTM API in PostgreSQL to be able to continue our
> experiments with different transaction managers and implementing multimaster
> on top of it (our first practical goal) without affecting PostgreSQL core.
>
> If XTM patch will be included in 9.6, then we can propose our multimaster as
> PostgreSQL extension and everybody can use it.
> Otherwise we have to propose our own fork of Postgres which significantly
> complicates using and maintaining it.

Well I still think what I said before is valid.  If the code is good,
let it be a core submission.  If it's not ready yet, submit it to core
when it is.  If it can't be made good, forget it.

>> This seems rather defeatist.  If the code is good and reliable, why
>> should it not be committed to core?
>
> Two reasons:
> 1. There is no ideal implementation of DTM which will fit all possible needs
> and be  efficient for all clusters.

Hmm, what is the reasoning behind that statement?  I mean, it is
certainly true that there are some places where we have decided that
one-size-fits-all is not the right approach.  Indexing, for example.
But there are many other places where we have not chosen to make
> things pluggable, and I don't think it should be taken for
> granted that pluggability is always an advantage.

I fear that building a DTM that is fully reliable and also
well-performing is going to be really hard, and I think it would be
far better to have one such DTM that is 100% reliable than two or more
implementations each of which are 99% reliable.

> 2. Even if such implementation exists, still the right way of it integration
> is Postgres should use kind of TM API.

Sure, APIs are generally good, but that doesn't mean *this* API is good.

> I hope that everybody will agree that doing it in this way:
>
> #ifdef PGXC
>         /* In Postgres-XC, stop timestamp has to follow the timeline of GTM
> */
>         xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
> #else
>         xlrec.xact_time = xactStopTimestamp;
> #endif

PGXC chose that style in order to simplify merging.  I wouldn't have
picked the same thing, but I don't know why it deserves scorn.

> or in this way:
>
>         xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp
> : xactStopTimestamp;
>
> is very very bad idea.

I don't know why that is such a bad idea.  It's a heck of a lot faster
than insisting on calling some out-of-line function.  It might be a
bad idea, but I think we need to decide that, not assume it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Bruce Momjian
Дата:
On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> > Two reasons:
> > 1. There is no ideal implementation of DTM which will fit all possible needs
> > and be  efficient for all clusters.
> 
> Hmm, what is the reasoning behind that statement?  I mean, it is
> certainly true that there are some places where we have decided that
> one-size-fits-all is not the right approach.  Indexing, for example.

Uh, is that even true of indexing?  While the plug-in nature of indexing
allows for easier development and testing, does anyone create plug-in
indexing that isn't shipped by us?  I thought WAL support was something
that prevented external indexing solutions from working.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
>> > Two reasons:
>> > 1. There is no ideal implementation of DTM which will fit all possible needs
>> > and be  efficient for all clusters.
>>
>> Hmm, what is the reasoning behind that statement?  I mean, it is
>> certainly true that there are some places where we have decided that
>> one-size-fits-all is not the right approach.  Indexing, for example.
>
> Uh, is that even true of indexing?  While the plug-in nature of indexing
> allows for easier development and testing, does anyone create plug-in
> indexing that isn't shipped by us?  I thought WAL support was something
> that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
Thank you very much for your comments.

On 01.03.2016 18:19, Robert Haas wrote:
> On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>>> How do you prevent clock skew from causing serialization anomalies?
>>
>> If a node receives a message from the "future" it just needs to wait until this
>> future arrives.
>> Practically we just "adjust" system time in this case, moving it forward
>> (certainly the system time is not actually changed, we just set a correction value
>> which needs to be added to the system time).
>> This approach was discussed in the article:
>> http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
>> I hope, in this article the algorithm is explained much better than I can do
>> here.
>
> Hmm, the approach in that article is very interesting, but it sounds
> different than what you are describing - they do not, AFAICT, have
> anything like a "correction value".

In the article they use the notion of "wait":

    if T.SnapshotTime > GetClockTime()
    then wait until T.SnapshotTime < GetClockTime()

Originally we really did sleep here, but then we decided that instead of
sleeping we can just adjust the local time.
Sorry, I do not have a formal proof that it is equivalent, but... at least we have
not encountered any inconsistencies after this fix, and performance is improved.

>> There are well known limitations of this pg_tsdtm which we will try to
>> address in the future.
>
> How well known are those limitations?  Are they documented somewhere?
> Or are they only well-known to you?

Sorry, well known to us.
But they are described at the DTM wiki page.
Right now pg_tsdtm does not support correct distributed deadlock detection (it does not build a global lock graph) and
detects distributed deadlocks just based on timeouts.
It doesn't support explicit locks, but "select for update" will work correctly.

>> What we want is to include the XTM API in PostgreSQL to be able to continue our
>> experiments with different transaction managers and implementing multimaster
>> on top of it (our first practical goal) without affecting the PostgreSQL core.
>>
>> If the XTM patch is included in 9.6, then we can propose our multimaster as a
>> PostgreSQL extension and everybody can use it.
>> Otherwise we have to propose our own fork of Postgres which significantly
>> complicates using and maintaining it.
>
> Well I still think what I said before is valid.  If the code is good,
> let it be a core submission.  If it's not ready yet, submit it to core
> when it is.  If it can't be made good, forget it.

I have nothing against committing the DTM code in core. But still the best way of integrating it is to use an a-la-OO
approach. So we still need an API. Inserting if-s or switches in existing code is IMHO an ugly idea.

Also, it is not enough for the DTM code to be just "good". It should provide the expected functionality.
But which functionality is expected? From my experience of developing different cluster solutions I can say that
different customers have very different requirements. It is very hard, if at all possible, to satisfy them all.

Right now I do not feel that I can predict all possible requirements for the DTM.
This is why we want to provide some API, propose some implementations of this API, receive feedback and get a better
understanding of which functionality is actually needed by customers.
cite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com"type="cite"><pre wrap="">
 

</pre><blockquote type="cite"><blockquote type="cite"><pre wrap="">This seems rather defeatist.  If the code is good
andreliable, why
 
should it not be committed to core?
</pre></blockquote><pre wrap="">
Two reasons:
1. There is no ideal implementation of DTM which will fit all possible needs
and be  efficient for all clusters.
</pre></blockquote><pre wrap="">
Hmm, what is the reasoning behind that statement?  I mean, it is
certainly true that there are some places where we have decided that
one-size-fits-all is not the right approach.  Indexing, for example.
But there are many other places where we have not chosen to make
things pluggable, and that I don't think it should be taken for
granted that plugability is always an advantage.

I fear that building a DTM that is fully reliable and also
well-performing is going to be really hard, and I think it would be
far better to have one such DTM that is 100% reliable than two or more
implementations each of which are 99% reliable.
</pre></blockquote><br /> The question is not about it's reliability, but mostly about its functionality and
flexibility.<br/><br /><blockquote cite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com"
type="cite"><prewrap="">
 
</pre><blockquote type="cite"><pre wrap="">2. Even if such implementation exists, still the right way of it
integration
is Postgres should use kind of TM API.
</pre></blockquote><pre wrap="">
Sure, APIs are generally good, but that doesn't mean *this* API is good.</pre></blockquote><br /> Well, I do not what
tosay "better than nothing", but I find this API to be a reasonable compromise between flexibility and minimization of
changesin PostgreSQL core. If you have some suggestions how to improve it,  I will be glad to receive them.<br /><br
/><blockquotecite="mid:CA+TgmobUnzf+A_Fk_EYXTnHiwVKZsKV=gRVuzTCoibhPWhq2UA@mail.gmail.com" type="cite"><pre wrap="">
 

</pre><blockquote type="cite"><pre wrap="">I hope that everybody will agree that doing it in this way:

#ifdef PGXC       /* In Postgres-XC, stop timestamp has to follow the timeline of GTM
*/       xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
#else       xlrec.xact_time = xactStopTimestamp;
#endif
</pre></blockquote><pre wrap="">
PGXC chose that style in order to simplify merging.  I wouldn't have
picked the same thing, but I don't know why it deserves scorn.

</pre><blockquote type="cite"><pre wrap="">or in this way:
       xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp
: xactStopTimestamp;

is very very bad idea.
</pre></blockquote><pre wrap="">
I don't know why that is such a bad idea.  It's a heck of a lot faster
than insisting on calling some out-of-line function.  It might be a
bad idea, but I think we need to decide that, not assume it.

</pre></blockquote> It violates modularity, complicates code, makes it more error prone.<br /> I still prefer to
extractall DTM code in separate module.<br /> It should not necessary be an extension.<br /> But from the other side -
itis not required to put in in core.<br /> At least at this stage. As i already wrote - not just because code is not
goodenough or is not reliable enough,<br /> but because I am not sure that it is fits all (or just most) of use
cases.<br/><br /><pre class="moz-signature" cols="72">-- 
 
Konstantin Knizhnik
Postgres Professional: <a class="moz-txt-link-freetext"
href="http://www.postgrespro.com">http://www.postgrespro.com</a>
The Russian Postgres Company </pre>

Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:

On 01.03.2016 19:03, Robert Haas wrote:
> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
>>>> Two reasons:
>>>> 1. There is no ideal implementation of DTM which will fit all possible needs
>>>> and be  efficient for all clusters.
>>> Hmm, what is the reasoning behind that statement?  I mean, it is
>>> certainly true that there are some places where we have decided that
>>> one-size-fits-all is not the right approach.  Indexing, for example.
>> Uh, is that even true of indexing?  While the plug-in nature of indexing
>> allows for easier development and testing, does anyone create plug-in
>> indexing that isn't shipped by us?  I thought WAL support was something
>> that prevented external indexing solutions from working.
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.  We end up revising the index AM API pretty
> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.  I suspect that
> a transaction manager API would end up similarly situated.
>

IMHO a non-stable API is better than no API at all,
just because it makes it possible to implement features in a modular way.
And refactoring an API is not such a difficult thing...
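
For what it's worth, here is a purely illustrative sketch of what such a pluggable indirection could look like; it is
NOT the actual XTM API from the patch, and every name below is invented for the example.

/*
 * Hypothetical "pluggable TM" sketch: core calls through a table of
 * function pointers instead of #ifdef-ing or if-ing cluster-specific code,
 * and an extension may install its own table.
 */
#include <stdbool.h>

typedef unsigned int DemoXid;

typedef struct DemoTransactionManager
{
    DemoXid (*get_new_xid) (void);
    bool    (*xid_is_visible) (DemoXid xid);
    void    (*set_commit_status) (DemoXid xid, bool committed);
} DemoTransactionManager;

/* Default, single-node implementation shipped with core. */
static DemoXid next_xid = 1;

static DemoXid demo_local_get_new_xid(void) { return next_xid++; }
static bool    demo_local_xid_is_visible(DemoXid xid) { return xid < next_xid; }
static void    demo_local_set_commit_status(DemoXid xid, bool committed)
{
    (void) xid;
    (void) committed;               /* a real implementation would update commit status here */
}

static DemoTransactionManager demo_local_tm = {
    demo_local_get_new_xid,
    demo_local_xid_is_visible,
    demo_local_set_commit_status,
};

/* Single indirection point; a DTM extension could swap this in its init hook. */
static DemoTransactionManager *demo_tm = &demo_local_tm;

void
DemoInstallTransactionManager(DemoTransactionManager *tm)
{
    demo_tm = tm;
}

int
main(void)
{
    /* Core code always goes through the indirection, never through #ifdef. */
    DemoXid xid = demo_tm->get_new_xid();

    demo_tm->set_commit_status(xid, true);
    return demo_tm->xid_is_visible(xid) ? 0 : 1;
}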


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Petr Jelinek
Дата:
On 01/03/16 18:18, Konstantin Knizhnik wrote:
>
> On 01.03.2016 19:03, Robert Haas wrote:
>> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>> On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
>>>>> Two reasons:
>>>>> 1. There is no ideal implementation of DTM which will fit all
>>>>> possible needs
>>>>> and be  efficient for all clusters.
>>>> Hmm, what is the reasoning behind that statement?  I mean, it is
>>>> certainly true that there are some places where we have decided that
>>>> one-size-fits-all is not the right approach.  Indexing, for example.
>>> Uh, is that even true of indexing?  While the plug-in nature of indexing
>>> allows for easier development and testing, does anyone create plug-in
>>> indexing that isn't shipped by us?  I thought WAL support was something
>>> that prevented external indexing solutions from working.
>> True.  There is an API, though, and having pluggable WAL support seems
>> desirable too.  At the same time, I don't think we know of anyone
>> maintaining a non-core index AM ... and there are probably good
>> reasons for that.  We end up revising the index AM API pretty
>> regularly every time somebody wants to do something new, so it's not
>> really a stable API that extensions can just tap into.  I suspect that
>> a transaction manager API would end up similarly situated.
>>
>
> IMHO non-stable API is better than lack of API.
> Just because it makes it possible to implement features in modular way.
> And refactoring of API is not so difficult thing...
>

Since this thread heavily discusses the XTM, I have a question about the
XTM as proposed, because one thing is very unclear to me - what happens
when a user changes the XTM plugin on the server? I didn't see any xid
handover API, which makes me wonder if changes of a plugin (or for
example failure to load the previously used plugin due to admin error) will
send the server into a situation similar to xid wraparound.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: The plan for FDW-based sharding

От
Petr Jelinek
Дата:
On 27/02/16 04:54, Robert Haas wrote:
> On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> We do not have formal prove that proposed XTM is "general enough" to handle
>> all possible transaction manager implementations.
>> But there are two general ways of dealing with isolation: snapshot based and
>> CSN  based.
>
> I don't believe that for a minute.  For example, consider this article:
>
> https://en.wikipedia.org/wiki/Global_serializability
>
> I think the neutrality of that article is *very* debatable, but it
> certainly contradicts the idea that snapshots and CSNs are the only
> methods of achieving global serializability.
>
> Or consider this lecture:
>
> http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf
>
> That's a great introduction to the problem we're trying to solve here,
> but again, snapshots are not mentioned, and CSNs certainly aren't
> mentioned.
>
> This write-up goes further, explaining three different methods for
> ensuring global serializability, none of which mention snapshots or
> CSNs:
>
> http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html
>
> Actually, I think the second approach is basically a snapshot/CSN-type
> approach, but it doesn't use that terminology and the connection to
> what you are proposing is very unclear.
>
> I think you're approaching this problem from a viewpoint that is
> entirely too focused on the code that exists in PostgreSQL today.
> Lots of people have done lots of academic research on how to solve
> this problem, and you can't possibly say that CSNs and snapshots are
> the only solution to this problem unless you haven't read any of those
> papers.  The articles above aren't exceptional in mentioning neither
> of the approaches that you are advocating - they are typical of the
> literature in this area.  How can it be that the only solutions to
> this problem are ones that are totally different from the approaches
> that university professors who spend time doing research on
> concurrency have spent time exploring?
>
> I think we need to back up here and examine our underlying design
> assumptions.  The goal here shouldn't necessarily be to replace
> PostgreSQL's current transaction management with a distributed version
> of the same thing.  We might want to do that, but I think the goal is
> or should be to provide ACID semantics in a multi-node environment,
> and specifically the I in ACID: transaction isolation.  Making the
> existing transaction manager into something that can be spread across
> multiple nodes is one way of accomplishing that.  Maybe the best one.
> Certainly one that's been experimented within Postgres-XC.  But it is
> often the case that an algorithm that works tolerably well on a single
> machine starts performing extremely badly in a distributed
> environment, because the latency of communicating between multiple
> systems is vastly higher than the latency of communicating between
> CPUs or cores on the same system.  So I don't think we should be
> assuming that's the way forward.
>

I have a similar problem with the FDW approach though. It seems to me that
because we have something that solves access to external tables, somebody
decided that it should be used as the base for the whole sharding solution,
but there is no real concept of how it will all fit together, no idea of
what it will be usable for, and not even a simple prototype that would
prove that the idea is sound (although again, I am not clear on what the
actual idea is beyond "we will use FDWs").

Don't get me wrong, I agree that the current FDW enhancements are
useful, I am just worried about them being presented as the future of
sharding in Postgres when nobody has sketched what that future might look
like. And once we get to the more interesting parts like consistency,
distributed query planning, p2p connections (and I am really concerned
about these as FDWs abstract away some knowledge that the coordinator
and/or data nodes might need to do these well), etc., we might very well
find ourselves painted into a corner and have to start from the beginning,
while if we had some idea of what the whole thing might look like we could
identify this early and not postpone built-in sharding by several years
just because somebody said we will use FDWs and that's what we worked on
in those years.

Note that I am not saying that other discussed approaches are any
better, I am saying that we should know approximately what we actually
want and not just beat FDWs with a hammer, hope sharding will
eventually emerge, and call that the plan.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: The plan for FDW-based sharding

От
Bruce Momjian
Дата:
On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> Note that I am not saying that other discussed approaches are any
> better, I am saying that we should know approximately what we
> actually want and not just beat FDWs with a hammer and hope sharding
> will eventually emerge and call that the plan.

I will say it again --- FDWs are the only sharding method I can think of
that has a chance of being accepted into Postgres core.  It is a plan,
and if it fails, it fails.  If it succeeds, that's good.  What more do
you want me to say?  I know of no other way to answer the questions you
asked above.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Bruce Momjian
Дата:
On Tue, Mar  1, 2016 at 02:02:44PM -0500, Bruce wrote:
> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > Note that I am not saying that other discussed approaches are any
> > better, I am saying that we should know approximately what we
> > actually want and not just beat FDWs with a hammer and hope sharding
> > will eventually emerge and call that the plan.
> 
> I will say it again --- FDWs are the only sharding method I can think of
> that has a chance of being accepted into Postgres core.  It is a plan,
> and if it fails, it fails.  If it succeeds, that's good.  What more do
> you want me to say?  I know of no other way to answer the questions you
> asked above.

I guess all I can say is that if FDWs existed when Postgres XC/XL were
being developed, they likely would have been used or at least
considered.  I think we are basically making that attempt now.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
On 03/01/2016 09:19 PM, Petr Jelinek wrote:
>
> Since this thread heavily discusses the XTM, I have a question about the XTM as proposed, because one thing is very
> unclear to me - what happens when a user changes the XTM plugin on the server? I didn't see any xid handover API,
> which makes me wonder if changes of a plugin (or for example failure to load the previously used plugin due to admin
> error) will send the server into a situation similar to xid wraparound.
>

The transaction manager is a very "intimate" part of a DBMS, and certainly bugs and problems in a custom TM
implementation can break the server.
So if you are providing a custom TM implementation, you should take full responsibility for system integrity.
The XTM API itself doesn't enforce any XID handling policy. As far as we do not want to change the tuple header format,
the XID is still a 32-bit integer.

In the case of pg_dtm, global transactions at all nodes are assigned the same XID by the arbiter. The arbiter handles
XID wraparound.
In pg_tsdtm each node maintains its own XIDs; actually pg_tsdtm doesn't change the way XIDs are assigned by Postgres, so
wraparound in this case is handled in the standard way. Instead of assigning its own global XIDs, pg_tsdtm provides a
mapping between local XIDs and global CSNs. The visibility checking rules look at CSNs, not at XIDs.

In both cases, if the system is for some reason restarted and the DTM plugin fails to be loaded, you can still access
the database locally. No data can be lost.
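
To illustrate the mapping just described, here is a hypothetical sketch (not the pg_tsdtm source; names invented): the
node keeps its ordinary 32-bit XIDs, records the agreed global CSN for each committed distributed transaction, and
answers visibility questions by comparing CSNs.

/*
 * Hypothetical local-XID to global-CSN map: visibility is decided by
 * comparing the creator's commit CSN against the snapshot CSN, not by
 * comparing XIDs.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xid_t;
typedef uint64_t csn_t;

#define INVALID_CSN   UINT64_MAX     /* transaction not (yet) committed */
#define MAX_LOCAL_XID 1024

static csn_t xid_to_csn[MAX_LOCAL_XID];   /* per-node map, indexed by local xid */

static void
record_commit(xid_t xid, csn_t global_csn)
{
    xid_to_csn[xid] = global_csn;         /* agreed CSN from the commit protocol */
}

static bool
tuple_visible(xid_t xmin, csn_t snapshot_csn)
{
    csn_t commit_csn = xid_to_csn[xmin];

    if (commit_csn == INVALID_CSN)
        return false;                     /* creator is still in progress */
    return commit_csn <= snapshot_csn;    /* committed before our snapshot */
}

int
main(void)
{
    for (int i = 0; i < MAX_LOCAL_XID; i++)
        xid_to_csn[i] = INVALID_CSN;

    record_commit(42, 1004);              /* local xid 42 committed with CSN 1004 */

    assert(tuple_visible(42, 1010));
    assert(!tuple_visible(42, 1000));
    assert(!tuple_visible(43, 1010));     /* xid 43 never committed */
    return 0;
}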
 


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Tomas Vondra
Дата:
Hi,

On 03/01/2016 08:02 PM, Bruce Momjian wrote:
> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
>> Note that I am not saying that other discussed approaches are any
>> better, I am saying that we should know approximately what we
>> actually want and not just beat FDWs with a hammer and hope sharding
>> will eventually emerge and call that the plan.
>
> I will say it again --- FDWs are the only sharding method I can think
> of that has a chance of being accepted into Postgres core.

I don't quite see why that would be the case. Firstly, it assumes that
the FDW-based approach is going to work, but given the lack of a prototype
or even a technical analysis discussing the missing pieces, that's very
difficult to judge.

I find it a bit annoying that there are objections from people who
implemented (or attempted to implement) sharding on PostgreSQL, yet no
reasonable analysis of their arguments and how the FDW approach will
address them. My understanding is they deem FDWs a bad foundation for
sharding because they were designed for a different purpose, and the
abstractions are a bad fit for sharding (which assumes isolated nodes, a
certain form of execution, etc.)

> It is a plan, and if it fails, it fails. If it succeeds, that's
> good. What more do you want me to say? I know of no other way to
> answer the questions you asked above.

Well, wouldn't it be great if we could do the decision based on some 
facts and not mere belief that it'll help. That's exactly what Petr is 
talking about - the fear that we'll spend a few years working on 
sharding based on FDWs, only to find out that it does not work too well. 
That'd be a pretty bad outcome, wouldn't it?

My other worry is that we'll eventually mess up the FDW infrastructure,
making it harder to use for its original purpose. Granted, most of the
improvements proposed so far look sane and useful for FDWs in general,
but sooner or later that ceases to be the case - there will be changes
needed merely for the sharding. Those will be tough decisions.

While I disagree with Simon on various things, I absolutely understand 
why he was asking about a prototype, and some sort of analysis of what 
usecases we expect to support initially/later/never, and what pieces are 
missing to get the sharding working. IIRC at the FOSDEM Dev Meeting 
you've claimed you're essentially working on a prototype - once we have 
the missing FDW pieces, we'll know if it works. I disagree with that - it's 
not a prototype if it takes several years to find the outcome.

Also, in another branch of this thread you've said this (I don't want to 
sprinkle the thread with responses, so I'll just respond here):

> In a way, I don't see any need for an FDW sharding prototype
> because, as I said, we already know XC/XL work, so copying what they
> do doesn't help. What we need to know is if we can get near the XC/XL
>  benchmarks with an acceptable addition of code, which is what I
> thought I already said. Perhaps this can be done with FDWs, or some
> other approach I have not heard of yet.

I don't quite understand the reasoning presented here. XC/XL are not 
based on FDWs at all, therefore the need for a prototype of FDW-based 
sharding is entirely independent of the fact that these solutions seem 
to work quite well.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: The plan for FDW-based sharding

От
Oleg Bartunov
Дата:


On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
>> > Two reasons:
>> > 1. There is no ideal implementation of DTM which will fit all possible needs
>> > and be  efficient for all clusters.
>>
>> Hmm, what is the reasoning behind that statement?  I mean, it is
>> certainly true that there are some places where we have decided that
>> one-size-fits-all is not the right approach.  Indexing, for example.
>
> Uh, is that even true of indexing?  While the plug-in nature of indexing
> allows for easier development and testing, does anyone create plug-in
> indexing that isn't shipped by us?  I thought WAL support was something
> that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty

We'd love to develop a new special index AM; that's why we are all for pluggable WAL. I think there will be other AM developers once we open the door for that.
 
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.

I don't expect many other TM developers, so there is no problem with improving the API. We started from practical needs and analyses of many academic papers. We spent a year playing with several prototypes to prove our proposed API (expect more in several months). Everybody can download and test them. I wish we could do that with an FDW-based sharding solution.

Of course, we can fork Postgres as the XC/XL people did, and we certainly will eventually if the community doesn't accept our proposal, since it's very difficult to work on cross-release projects. But then there will be no winners, so why are we all so aggressively refusing to understand each other? I watched XC/XL for years and decided I didn't want to go that way of isolation from the community, so we chose to make the TM pluggable, stay with the community, and let everybody prove their concepts. If you have ideas for how to improve the TM API, we are open; if you know it's broken by design, help us fix it.

I have my own understanding of FDWs, but I deliberately don't participate in some of the very hot discussions, simply because I don't feel committed to working on them. Your group is very enthusiastic about FDWs, and that's fine as long as you improve FDWs in a general way; I'm very happy with the current work. But I would prefer that you show a prototype of the sharding solution that convinces us on functionality and performance. I agree with Tomas Vondra that we don't want to wait years to see the result; we want results based on a prototype, which could be built between releases. If you don't have enough resources for this, let's do it together with the community. As far as I've seen, nobody is against FDW sharding; people complained about it being presented as "the only sharding solution" for Postgres, without proof.
 


 

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Oleg Bartunov
Дата:


On Wed, Mar 2, 2016 at 4:36 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:

Hi,

On 03/01/2016 08:02 PM, Bruce Momjian wrote:
On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
Note that I am not saying that other discussed approaches are any
better, I am saying that we should know approximately what we
actually want and not just beat FDWs with a hammer and hope sharding
will eventually emerge and call that the plan.

I will say it again --- FDWs are the only sharding method I can think
of that has a chance of being accepted into Postgres core.



While I disagree with Simon on various things, I absolutely understand why he was asking about a prototype, and some sort of analysis of what usecases we expect to support initially/later/never, and what pieces are missing to get the sharding working. IIRC at the FOSDEM Dev Meeting you've claimed you're essentially working on a prototype - once we have the missing FDW pieces, we'll know if it works. I disagree that - it's not a prototype if it takes several years to find the outcome.


Fully agree. Probably we all need to help build a prototype in the between-releases period. I see no legal way to resolve the situation.
 

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: The plan for FDW-based sharding

От
Alexander Korotkov
Дата:
On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
>> > Two reasons:
>> > 1. There is no ideal implementation of DTM which will fit all possible needs
>> > and be  efficient for all clusters.
>>
>> Hmm, what is the reasoning behind that statement?  I mean, it is
>> certainly true that there are some places where we have decided that
>> one-size-fits-all is not the right approach.  Indexing, for example.
>
> Uh, is that even true of indexing?  While the plug-in nature of indexing
> allows for easier development and testing, does anyone create plug-in
> indexing that isn't shipped by us?  I thought WAL support was something
> that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.

It's because we didn't offer a legal mechanism for pluggable AMs.
 
We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.

I can't buy this argument. One may say this about any single API. Thinking so will lead you to reject any extensibility, and that would be in direct contradiction to the original Postgres concept.
During the last 5 years we added 2 new AMs: SP-GiST and BRIN. And BRIN is very different from any AM we had before.
And I wouldn't say that the AM API had dramatic changes during that time. There were some changes, but it would be normal work for extension maintainers to adapt to these changes, as they do for other API changes.

There is a simple example where we suffer from the lack of extensible AMs: fast full-text search. We can't provide it with the current GIN, because it lacks positional information. And we can't push these advances into core because the current implementation does not have a perfect design. The ideal design would be to push all required functionality into btree, then make GIN a wrapper over btree, then add the required functionality. But that is a roadmap for 5-10 years, and for those 5-10 years users will suffer from having third-party solutions for fast FTS instead of an in-core one. Our design questions are actually not something that users care about; they are not reliability questions. Having pluggable AMs would be a real chance in this situation: users could use an extension right now, and then, when after many years we finally implement the right design, they could migrate to the in-core solution. But 5-10 years of fast FTS does matter.
 
I suspect that
a transaction manager API would end up similarly situated.

I disagree with you about the AM API. But I agree that a TM API would end up in a similar situation to the AM API.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: The plan for FDW-based sharding

От
Alexander Korotkov
Дата:
On Tue, Mar 1, 2016 at 10:11 PM, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar  1, 2016 at 02:02:44PM -0500, Bruce wrote:
> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > Note that I am not saying that other discussed approaches are any
> > better, I am saying that we should know approximately what we
> > actually want and not just beat FDWs with a hammer and hope sharding
> > will eventually emerge and call that the plan.
>
> I will say it again --- FDWs are the only sharding method I can think of
> that has a chance of being accepted into Postgres core.  It is a plan,
> and if it fails, it fails.  If it succeeds, that's good.  What more do
> you want me to say?  I know of no other way to answer the questions you
> asked above.

I guess all I can say is that if FDWs existed when Postgres XC/XL were
being developed, that they likely would have been used or at least
considered.  I think we are basically making that attempt now.

If FDWs had existed when Postgres XC/XL were being developed, then I believe they would have tried to build a full-featured prototype of FDW-based sharding. If that prototype succeeded, then we could make a full roadmap.
For now, we don't have a full roadmap; we have only some pieces. This is why people have doubts. When you're speaking about advances that are natural to FDWs, then no problem; nobody is against FDW advances. However, other things are unclear.
You can try to build a full-featured prototype to convince people. Although it would take some resources, it would save more resources, because it would save us from errors.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:

On 01.03.2016 22:02, Bruce Momjian wrote:
> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
>> Note that I am not saying that other discussed approaches are any
>> better, I am saying that we should know approximately what we
>> actually want and not just beat FDWs with a hammer and hope sharding
>> will eventually emerge and call that the plan.
> I will say it again --- FDWs are the only sharding method I can think of
> that has a chance of being accepted into Postgres core.  It is a plan,
> and if it fails, it fails.  If it succeeds, that's good.  What more do
> you want me to say?  I know of no other way to answer the questions you
> asked above.
>
I do not understand why it would fail.
The FDW approach may not be flexible enough for building optimal
distributed query execution plans for complex OLAP queries.
But for simple queries it should work fine; simple queries correspond
to OLTP and simple OLAP.
For OLTP we definitely need a transaction manager to provide global
consistency.
And we actually have a prototype of integrating postgres_fdw with our
pg_dtm and pg_tsdtm transaction managers.
The results are, IMHO, quite promising (see attached diagram).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Вложения

Re: The plan for FDW-based sharding

От
Josh berkus
Дата:
On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:
> Sorry, but based on this plan it is possible to make a conclusion that
> there are only two possible cluster solutions for Postgres:
> XC/XL and FDW-based.  From my point of view there are  much more
> possible alternatives.

Definitely.

Currently we have five approaches to sharding inside postgres in the 
field, in chronological order:

1. Greenplum's executor-based approach with motion nodes

2. Skype's function-based approach (PL/proxy)

3. XC/XL's approach, which I believe is also query executor-based

4. CitusDB's pg_shard which is based on query hooks

5. FDW-based (currently theoretical)

One of the things which causes bad reactions and arguments, Bruce, is 
that a lot of your posts and presentations detailing plans for the FDW 
approach carry the subtext that all four of the other approaches are 
dead ends and not worth considering.  Given that the other approaches, 
whatever their limitations, have working code in the field and the FDW 
approach does not, that's more than a little offensive.

If we want to move forwards on serious work on FDW-based sharding, the 
folks working on it should stop treating it as a "fait accompli" that 
this is the Chosen Way for the PostgreSQL project.  Otherwise, you'll 
spend all of your time arguing that point instead of working on features 
that matter.

Bruce made a long comparison with built-in replication, but there's a 
big difference here.  We decided that WAL-based replication was the way 
to go for built-in as a community decision here on -hackers and at 
various conferences.  Both the plan and the implementation for 
replication transcended company backing, involving even active 
competitors, and involved discussions with maintainers of the older 
replication projects.

In contrast, this FDW plan *still* feels very much like a small group 
made up of employees of only two companies came up with it in private 
and decided that it should be the plan for the whole project.  I know 
that Bruce and others have good reasons for starting the FDW project, 
but there hasn't been much of an attempt to obtain community consensus 
around it. If Bruce and others want contributors to work on FDWs instead 
of other sharding approaches, then they need to win over those people as 
to why they should do that.  It's how this community works.

Alternately, you can just work on the individual FDW features, which 
*everyone* thinks are a good idea, and when most of them are done, 
FDW-based scaleout will be such an obvious solution that nobody will 
argue with it.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)



Re: The plan for FDW-based sharding

От
Alexander Korotkov
Дата:
On Wed, Mar 2, 2016 at 9:53 PM, Josh berkus <josh@agliodbs.com> wrote:
On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:
Sorry, but based on this plan it is possible to make a conclusion that
there are only two possible cluster solutions for Postgres:
XC/XL and FDW-based.  From my point of view there are  much more
possible alternatives.

Definitely.

Currently we have five approaches to sharding inside postgres in the field, in chronological order:

1. Greenplum's executor-based approach with motion nodes

2. Skype's function-based approach (PL/proxy)

3. XC/XL's approach, which I believe is also query executor-based

4. CitusDB's pg_shard which is based on query hooks

5. FDW-based (currently theoretical)

One of the things which causes bad reactions and arguments, Bruce, is that a lot of your posts and presentations detailing plans for the FDW approach carry the subtext that all four of the other approaches are dead ends and not worth considering.  Given that the other approaches, whatever their limitations, have working code in the field and the FDW approach does not, that's more than a little offensive.

If we want to move forwards on serious work on FDW-based sharding, the folks working on it should stop treating it as a "fait accompli" that this is the Chosen Way for the PostgreSQL project.  Otherwise, you'll spend all of your time arguing that point instead of working on features that matter.

Bruce made a long comparison with built-in replication, but there's a big difference here.  We decided that WAL-based replication was the way to go for built-in as a community decision here on -hackers and at various conferences.  Both the plan and the implementation for replication transcended company backing, involving even active competitors, and involved discussions with maintainers of the older replication projects.

In contrast, this FDW plan *still* feels very much like a small group made up of employees of only two companies came up with it in private and decided that it should be the plan for the whole project.  I know that Bruce and others have good reasons for starting the FDW project, but there hasn't been much of an attempt to obtain community consensus around it. If Bruce and others want contributors to work on FDWs instead of other sharding approaches, then they need to win over those people as to why they should do that.  It's how this community works.

Alternately, you can just work on the individual FDW features, which *everyone* thinks are a good idea, and when most of them are done, FDW-based scaleout will be such an obvious solution that nobody will argue with it.

+1

Thank you, Josh. I think this is an excellent summary of the conversation about FDW-based sharding.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 

Re: The plan for FDW-based sharding

От
Michael Paquier
Дата:
On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> If FDWs existed then Postgres XC/XL were being developed then I believe they
> would try to build full-featured prototype of FDW based sharding. If this
> prototype succeed then we could make a full roadmap.

Speaking here with my XC hat, that's actually the case. A couple of
years back when I worked on it, there were discussions about reusing
FDW routines for the purpose of XC, which would have roughly meant
reusing postgres_fdw plus the possibility to send the XID, snapshot and
transaction timestamp to the remote nodes after getting them from the
GTM (the global transaction manager ensuring global data visibility and
consistency), and having the logic for query pushdown in the FDW itself
when planning queries on what would have been roughly foreign tables
(not entering into the details here, those would not have been entirely
foreign tables). At this point the global picture was not completely
set, XC being based on 9.1~9.2, and the FDW base routines were not as
extended as they are now. As history has told, this global picture
never showed up, though it would have, should XC have been merged with 9.3.
The point is that XC would have moved toward using the FDW approach, as a
set of plugins.

This was a reason behind this email of 2013 on -hackers actually:
http://www.postgresql.org/message-id/CAB7nPqTDjf-58wuf-xZ01NKJ7WF0E+EUKgGQHd0igVsOD4hCJQ@mail.gmail.com

There were as well discussions about making the connection pooler a
background worker and plugging it into a shared memory context that all
backends connecting to this XC-like postgres_fdw would use, though
this is another story, for another time...
-- 
Michael



Re: The plan for FDW-based sharding

От
Tatsuo Ishii
Дата:
> On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
>> If FDWs existed then Postgres XC/XL were being developed then I believe they
>> would try to build full-featured prototype of FDW based sharding. If this
>> prototype succeed then we could make a full roadmap.
> 
> Speaking here with my XC hat, that's actually the case. A couple of
> years back when I worked on it, there were discussions about reusing
> FDW routines for the purpose of XC, which would have been roughly
> reusing postgres_fdw + the possibility to send XID, snapshot and
> transaction timestamp to the remote nodes after getting that from the
> GTM (global transaction manager ensuring global data visibility and
> consistency), and have the logic for query pushdown in the FDW itself
> when planning query on what would have been roughly foreign tables
> (not entering in the details here, those would have not been entirely
> foreign tables). At this point the global picture was not completely
> set, XC being based on 9.1~9.2 and the FDW base routines were not as
> extended as they are now. As history has told, this global picture has
> never showed up, though it would should XC have been merged with 9.3.
> The point is that XC would have moved as using the FDW approach, as a
> set of plugins.
> 
> This was a reason behind this email of 2013 on -hackers actually:
> http://www.postgresql.org/message-id/CAB7nPqTDjf-58wuf-xZ01NKJ7WF0E+EUKgGQHd0igVsOD4hCJQ@mail.gmail.com
> 
> There were as well discussions about making the connection pooler a
> background worker and plug in that in a shared memory context that all
> backends connecting to this XC-like-postgres_fdw would use, though
> this is another story, for another time...

Thanks for the history. Very interesting...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: The plan for FDW-based sharding

От
Oleg Bartunov
Дата:
<p dir="ltr"><br /> On Mar 3, 2016 4:47 AM, "Michael Paquier" <<a
href="mailto:michael.paquier@gmail.com">michael.paquier@gmail.com</a>>wrote:<br /> ><br /> > On Wed, Mar 2,
2016at 6:54 PM, Alexander Korotkov<br /> > <<a
href="mailto:a.korotkov@postgrespro.ru">a.korotkov@postgrespro.ru</a>>wrote:<br /> > > If FDWs existed then
PostgresXC/XL were being developed then I believe they<br /> > > would try to build full-featured prototype of
FDWbased sharding. If this<br /> > > prototype succeed then we could make a full roadmap.<br /> ><br /> >
Speakinghere with my XC hat, that's actually the case. A couple of<br /> > years back when I worked on it, there
werediscussions about reusing<br /> > FDW routines for the purpose of XC, which would have been roughly<br /> >
reusingpostgres_fdw + the possibility to send XID, snapshot and<br /> > transaction timestamp to the remote nodes
aftergetting that from the<br /> > GTM (global transaction manager ensuring global data visibility and<br /> >
consistency),and have the logic for query pushdown in the FDW itself<br /> > when planning query on what would have
beenroughly foreign tables<br /> > (not entering in the details here, those would have not been entirely<br /> >
foreigntables). At this point the global picture was not completely<br /> > set, XC being based on 9.1~9.2 and the
FDWbase routines were not as<br /> > extended as they are now. As history has told, this global picture has<br />
>never showed up, though it would should XC have been merged with 9.3.<br /> > The point is that XC would have
movedas using the FDW approach, as a<br /> > set of plugins.<br /> ><br /> > This was a reason behind this
emailof 2013 on -hackers actually:<br /> > <a
href="http://www.postgresql.org/message-id/CAB7nPqTDjf-58wuf-xZ01NKJ7WF0E+EUKgGQHd0igVsOD4hCJQ@mail.gmail.com">http://www.postgresql.org/message-id/CAB7nPqTDjf-58wuf-xZ01NKJ7WF0E+EUKgGQHd0igVsOD4hCJQ@mail.gmail.com</a><p
dir="ltr">Goodto remember!  <br /><p dir="ltr">> Michael<br /> ><br /> ><br /> > --<br /> > Sent via
pgsql-hackersmailing list (<a href="mailto:pgsql-hackers@postgresql.org">pgsql-hackers@postgresql.org</a>)<br /> >
Tomake changes to your subscription:<br /> > <a
href="http://www.postgresql.org/mailpref/pgsql-hackers">http://www.postgresql.org/mailpref/pgsql-hackers</a><br/> 

Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Wed, Mar 2, 2016 at 1:53 PM, Josh berkus <josh@agliodbs.com> wrote:
> One of the things which causes bad reactions and arguments, Bruce, is that a
> lot of your posts and presentations detailing plans for the FDW approach
> carry the subtext that all four of the other approaches are dead ends and
> not worth considering.  Given that the other approaches, whatever their
> limitations, have working code in the field and the FDW approach does not,
> that's more than a little offensive.

Yeah, I agree with that.  I am utterly mystified by why Bruce keeps
beating this drum, and am frankly pretty annoyed about it.  In the
first place, he seems to think that he invented the idea of using FDWs
for sharding in PostgreSQL, but I don't think that's true.  I think it
was partly my idea, and partly something that the NTT folks have been
working on for years (cf, e.g.,
cb1ca4d800621dcae67ca6c799006de99fa4f0a5).  As far as I understand it,
Bruce came in near the end of that conversation and now wants to claim
credit for something that doesn't really exist yet and, to the extent
that it does exist, wasn't even his idea.  In the second place, the
only thing that these repeated emails and development meeting
discussions of the topic actually accomplish is to piss people off.
I do believe that enhancing the foreign data wrapper interface can be
part of a horizontal scalability story for PostgreSQL, but as long as
nobody is objecting to the individual enhancements, which I don't see
anybody doing, then why the heck do we have to keep arguing about this
big picture story?  It doesn't matter at all, and it doesn't even
really exist, yet somehow Bruce keeps bringing it up, which I think
serves no useful purpose whatsoever.

> If we want to move forwards on serious work on FDW-based sharding, the folks
> working on it should stop treating it as a "fait accompli" that this is the
> Chosen Way for the PostgreSQL project.  Otherwise, you'll spend all of your
> time arguing that point instead of working on features that matter.

The only person treating it that way is Bruce.

> In contrast, this FDW plan *still* feels very much like a small group made
> up of employees of only two companies came up with it in private and decided
> that it should be the plan for the whole project.  I know that Bruce and
> others have good reasons for starting the FDW project, but there hasn't been
> much of an attempt to obtain community consensus around it. If Bruce and
> others want contributors to work on FDWs instead of other sharding
> approaches, then they need to win over those people as to why they should do
> that.  It's how this community works.

There hasn't been much of an attempt to obtain community consensus
about it because there isn't actually some grand plan, private or
otherwise, much as Bruce's emails might make you think otherwise.
EnterpriseDB *does* have a plan to try to continue enhancing foreign
data wrappers so that you can run queries against foreign tables and
get reasonable plans, something that currently isn't true.  I haven't
heard anybody objecting to that, and I don't expect to hear anybody
objecting to that, because it's hard to imagine why you wouldn't want
queries against foreign data wrappers to produce better plans than
they do today.  At worst, you might think it doesn't matter either
way, but actually, I think there are a substantial number of people
who are pretty happy about join pushdown and I expect that when and if
we get aggregate pushdown working there will be even more people who
are happy about that.

The only other ongoing work that EnterpriseDB has that at all touches
on this area is Ashutosh Bapat's work on 2PC for FDWs.  I'm not
convinced that's fully baked, and it conflicts with the XTM stuff the
Postgres Pro guys are doing, which I *also* don't think is fully
baked, so I'm not real keen on pressing forward aggressively with
either approach right now.  I think we (eventually) need a solution to
the problem of consistent cross-node consistency, but I am deeply
unconvinced that anything currently on the table is going to get us
there.  I did recommend the 2PC for FDW project, but I'm not amazingly
happy with how it came out, and I think we need to think harder about
other approaches before adopting something.

> Alternately, you can just work on the individual FDW features, which
> *everyone* thinks are a good idea, and when most of them are done, FDW-based
> scaleout will be such an obvious solution that nobody will argue with it.

That's exactly what the people at EnterpriseDB who are actually doing
work in this area are attempting to do.  Meanwhile, there's also
Bruce, who is neither doing nor planning to do any work in this area,
nor advising either EnterpriseDB or the PostgreSQL community to
undertake any particular project, but who *is* making it sound like
there is a super sekret plan that nobody else gets to see.  However,
as the guy who actually wrote the plan that EnterpriseDB is following,
I happen to know that there's nothing more to it than what I wrote
above.

Argh!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Tue, Mar 1, 2016 at 12:07 PM, Konstantin Knizhnik
<k.knizhnik@postgrespro.ru> wrote:
> In the article they used a notion of "wait":
>
> if T.SnapshotTime>GetClockTime()
> then wait until T.SnapshotTime<GetClockTime()
>
> Originally we really did sleep here, but then we thought that instead of
> sleeping we can just adjust the local time.
> Sorry, I do not have a formal proof that it is equivalent but... at least we have
> not encountered any inconsistencies after this fix and performance is
> improved.

I think that those things are probably not equivalent.  They would be
if you could cause the adjustment to advance in lock-step on every
node at the same time, but you probably can't.  And I think it is
extremely unwise to assume that the fact that nothing obviously broke
means that you got it right.  This is the sort of work where formal
proofs of correctness are, IMHO, extremely wise.
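
For concreteness, the two variants under discussion look roughly like this (a hedged sketch; GetClockTime() comes from the quoted pseudocode, everything else here is an invented name, not pg_tsdtm source):

    #include <stdint.h>
    #include <unistd.h>
    #include <sys/time.h>

    typedef uint64_t DtmTimestamp;              /* microseconds, illustrative */

    static DtmTimestamp local_clock_skew = 0;   /* per-node logical adjustment */

    static DtmTimestamp
    GetClockTime(void)
    {
        struct timeval tv;

        gettimeofday(&tv, NULL);
        return (DtmTimestamp) tv.tv_sec * 1000000 + tv.tv_usec;
    }

    /* Variant 1: wait, as in the cited paper -- block until the local
     * clock passes the snapshot timestamp before proceeding. */
    static void
    wait_for_snapshot(DtmTimestamp snapshot_time)
    {
        while (GetClockTime() <= snapshot_time)
            usleep(1000);
    }

    /* Variant 2: adjust, as described in the quoted mail -- advance a
     * per-node offset instead of sleeping.  Whether this is equivalent to
     * variant 1 is exactly the open question, since the adjustment happens
     * on one node only. */
    static void
    adjust_for_snapshot(DtmTimestamp snapshot_time)
    {
        DtmTimestamp now = GetClockTime() + local_clock_skew;

        if (snapshot_time > now)
            local_clock_skew += snapshot_time - now;
    }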

> I fear that building a DTM that is fully reliable and also
> well-performing is going to be really hard, and I think it would be
> far better to have one such DTM that is 100% reliable than two or more
> implementations each of which are 99% reliable.
>
> The question is not about its reliability, but mostly about its
> functionality and flexibility.

Well, *my* concern is about reliability.  A lot of code can be made
faster at the price of less reliability, but that usually doesn't work
out well in the end.  Performance matters too, of course, but the way
to get there is to start with a good algorithm, write reliable code to
implement it, and then optimize.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
"Joshua D. Drake"
Дата:
On 03/04/2016 04:41 PM, Robert Haas wrote:
> As far as I understand it,
> Bruce came in near the end of that conversation and now wants to claim
> credit for something that doesn't really exist yet and, to the extent
> that it does exist, wasn't even his idea.

Robert,

This does not sound like Bruce at all. Bruce is a lot of things - 
stubborn, sometimes temperamental, a lot of times (like you) a hot head - 
but he does not take credit for other people's work, in my experience.

> get reasonable plans, something that currently isn't true.  I haven't
> heard anybody objecting to that, and I don't expect to hear anybody
> objecting to that, because it's hard to imagine why you wouldn't want
> queries against foreign data wrappers to produce better plans than
> they do today.  At worst, you might think it doesn't matter either
> way, but actually, I think there are a substantial number of people
> who are pretty happy about join pushdown and I expect that when and if
> we get aggregate pushdown working there will be even more people who
> are happy about that.

Agreed.

> That's exactly what the people at EnterpriseDB who are actually doing
> work in this area are attempting to do.  Meanwhile, there's also
> Bruce, who is neither doing nor planning to do any work in this area,
> nor advising either EnterpriseDB or the PostgreSQL community to
> undertake any particular project, but who *is* making it sound like
> there is a super sekret plan that nobody else gets to see.  However,

I don't see this Robert. I don't see some secret hidden plan. I don't 
see any cabal. I see a guy that has an idea, just like everyone else on 
this list.

> as the guy who actually wrote the plan that EnterpriseDB is following,
> I happen to know that there's nothing more to it than what I wrote
> above.

Even if there was, so what? IF EDB wants to have a secret plan to push a 
lot of cool features to .Org, who cares? In the end, it all has to go 
through peer review and the meritocracy anyway.

Sincerely,

JD




-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Fri, Mar 4, 2016 at 8:27 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> This does not sound like Bruce at all. Bruce is a lot of things, stubborn,
> sometimes temperamental, a lot of times like you... a hot head but he does
> not take credit for other people's work in my experience.

On the whole, Bruce is a much nicer guy than I am.  But I can't see
eye to eye with him on this.  I admit I may be being unfair to him,
but I'm telling it like I see it.  Like I do.

> Even if there was, so what? IF EDB wants to have a secret plan to push a lot
> of cool features to .Org, who cares? In the end, it all has to go through
> peer review and the meritocracy anyway.

I would just like to say that if I or my employer ever get accused of
having a nefarious plan, and somehow I get to pick *which* nefarious
plan I or my employer is to be accused of having, "a secret plan to
push a lot of cool features to .Org" sounds like a good one for me to
pick, especially since, yeah, we have that plan.  We plan to (try to)
push a lot of cool features to .Org.  We - or at least I - do not plan
to do it in a way that is anything but respectful to the community
process.  Specifically, and in no particular order, we plan to
continue contributing performance and scalability enhancements,
improvements to parallel query, and FDW-related improvements, just as
we have for 9.6.  We may also try to contribute other stuff that we
think will be cool and benefit PostgreSQL.  Suggestions are welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Craig Ringer
Дата:
On 27 February 2016 at 11:54, Robert Haas <robertmhaas@gmail.com> wrote:
 
 
I could submit a patch adding
hooks to core to enable all of the things (or even just some of the
things) that EnterpriseDB has changed in Advanced Server, and that
patch would be rejected so fast it would make your head spin, because
of course the core project doesn't want to be burdened with
maintaining a whole bunch of hooks for the convenience of
EnterpriseDB.

I can imagine that many such hooks would have little use beyond PPAS, but I'm somewhat curious as to whether any would have wider applications. It's not unusual for me to be working on something and think "gee, I wish there was a hook here".

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Craig Ringer
Дата:
On 27 February 2016 at 15:29, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
 
Two reasons:
1. There is no ideal implementation of DTM which will fit all possible needs and be  efficient for all clusters.
2. Even if such implementation exists, still the right way of it integration is Postgres should use kind of TM API.


I've got to say that this is somewhat reminiscent of the discussions around in-core pooling, where argument 1 is applied to justify excluding pooling from core/contrib.

I don't have a strong position on whether a DTM should be in core or not as I haven't done enough work in the area. I do think it's interesting to strongly require that a DTM be in core while we also reject things like pooling that are needed by a large proportion of users.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Craig Ringer
Дата:
On 28 February 2016 at 06:38, Kevin Grittner <kgrittn@gmail.com> wrote:
 

> For logical replay, applying in batches is actually a good thing since it
> allows parallelism. We can remove them all from the target's procarray all
> at once to avoid intermediate states becoming visible. So that would be the
> preferred mechanism.

That could be part of a solution.  What I sketched out with the
"apparent order of execution" ordering of the transactions
(basically, commit order except when one SERIALIZABLE transaction
needs to be dragged in front of another due to a read-write
dependency) is possibly the simplest approach, but batching may
well give better performance.

I'd be really interested in some ideas on how that information might be usefully accessed. If we could write info on when to apply commits to the xlog in serializable mode that'd be very handy, especially when looking to the future with logical decoding of in-progress transactions, parallel apply, etc.

For parallel apply I anticipated that we'd probably have workers applying xacts in parallel and committing them in upstream commit order. They'd sometimes deadlock with each other; when this happened all workers whose xacts committed after the first aborted xact would have to abort and start again. Not ideal, but safe.

Being able to avoid that by using SSI information was in the back of my mind, but with no idea how to even begin to tackle it. What you've mentioned here is helpful and I'd be interested if you could share a bit more of your experience in the area.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Craig Ringer
Дата:
On 2 March 2016 at 00:03, Robert Haas <robertmhaas@gmail.com> wrote:
 

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.

IMO that needs to be true of all hooks into the real innards.

The ProcessUtility_hook API changed a couple of times after introduction and nobody screamed. I think we just have to mark such places as having cross-version API volatility, so you should be prepared to #if PG_VERSION_NUM around them if you use them.
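
As an illustration of that pattern (the version cutoff and argument lists below are recalled from memory and shown only to demonstrate the #if PG_VERSION_NUM guard, so treat them as approximate):

    #include "postgres.h"
    #include "tcop/utility.h"

    static ProcessUtility_hook_type prev_ProcessUtility = NULL;

    #if PG_VERSION_NUM >= 90300
    static void
    my_ProcessUtility(Node *parsetree, const char *queryString,
                      ProcessUtilityContext context, ParamListInfo params,
                      DestReceiver *dest, char *completionTag)
    {
        /* ... extension-specific work, then chain to the previous hook ... */
        if (prev_ProcessUtility)
            prev_ProcessUtility(parsetree, queryString, context, params,
                                dest, completionTag);
        else
            standard_ProcessUtility(parsetree, queryString, context, params,
                                    dest, completionTag);
    }
    #else
    static void
    my_ProcessUtility(Node *parsetree, const char *queryString,
                      ParamListInfo params, bool isTopLevel,
                      DestReceiver *dest, char *completionTag)
    {
        /* ... same work, older hook signature ... */
        if (prev_ProcessUtility)
            prev_ProcessUtility(parsetree, queryString, params, isTopLevel,
                                dest, completionTag);
        else
            standard_ProcessUtility(parsetree, queryString, params, isTopLevel,
                                    dest, completionTag);
    }
    #endif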

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Craig Ringer
Дата:
On 2 March 2016 at 03:02, Bruce Momjian <bruce@momjian.us> wrote:
On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> Note that I am not saying that other discussed approaches are any
> better, I am saying that we should know approximately what we
> actually want and not just beat FDWs with a hammer and hope sharding
> will eventually emerge and call that the plan.

I will say it again --- FDWs are the only sharding method I can think of
that has a chance of being accepted into Postgres core.  It is a plan,
and if it fails, it fails.  If it succeeds, that's good.  What more do
you want me to say?

That you won't push it too hard if it works, but works badly, and will be prepared to back off on the last steps despite all the lead-up work/time/investment you've put into it.

If FDW-based sharding works, I'm happy enough, I have no horse in this race. If it doesn't work I don't much care either. What I'm worried about is if it works like partitioning using inheritance works - horribly badly, but just well enough that it serves as an effective barrier to doing anything better.

That's what I want to prevent. Sharding that only-just-works and then stops us getting anything better into core.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Kevin Grittner
Дата:
On Fri, Mar 4, 2016 at 10:10 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 28 February 2016 at 06:38, Kevin Grittner <kgrittn@gmail.com> wrote:

>> What I sketched out with the "apparent order of execution"
>> ordering of the transactions (basically, commit order except
>> when one SERIALIZABLE transaction needs to be dragged in front
>> of another due to a read-write dependency) is possibly the
>> simplest approach, but batching may well give better
>> performance.
>
> I'd be really interested in some ideas on how that information might be
> usefully accessed. If we could write info on when to apply commits to the
> xlog in serializable mode that'd be very handy, especially when looking to
> the future with logical decoding of in-progress transactions, parallel
> apply, etc.

Are you suggesting the possibility of holding off on writing the
commit record for a SERIALIZABLE transaction to WAL until it is
known that no other SERIALIZABLE transaction comes ahead of it in
the apparent order of execution?  If so, that's an interesting idea
that I hadn't given much thought to yet -- I had been assuming
current WAL writes, with adjustments to the timing of application
of the records.

> For parallel apply I anticipated that we'd probably have workers applying
> xacts in parallel and committing them in upstream commit order. They'd
> sometimes deadlock with each other; when this happened all workers whose
> xacts committed after the first aborted xact would have to abort and start
> again. Not ideal, but safe.
>
> Being able to avoid that by using SSI information was in the back of my
> mind, but with no idea how to even begin to tackle it. What you've mentioned
> here is helpful and I'd be interested if you could share a bit more of your
> experience in the area.

My thinking so far has been that reordering the application of
transaction commits on a replica would best be done as the minimal
rearrangement possible from commit order which allows the work of
transactions to become visible in an order consistent with some
one-at-a-time run of those transactions.  Partly that is because
the commit order is something that is fairly obvious to see and is
what most people intuitively look at, even when it is wrong.
Deviating from this intuitive order seems likely to introduce
confusion, even when the results are 100% correct.

The only place you *need* to vary from commit order for correctness
is when there are overlapping SERIALIZABLE transactions, one
modifies data and commits, and another reads the old version of the
data but commits later.  Due to the action of SSI on the source
machine, you know that there could not be any SERIALIZABLE
transaction which saw the inconsistent state between the two
commits, but on replicas we don't yet manage that.  The key is that
there is a read-write dependency (a/k/a rw-conflict) between the
two transactions which tells you that the second to commit has to
come before the first in any graph of apparent order of execution.
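
Stated as a hedged sketch (names invented, not code from any replication system), the pairwise apply-order rule is:

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * commit_seq_a / commit_seq_b: positions in the source's commit order.
     * a_rw_depends_on_b: true if A read the old version of data that B wrote,
     * i.e. there is a rw-conflict from A to B, so A precedes B in the
     * apparent order of execution regardless of commit order.
     *
     * Returns true if A must be applied before B on the replica.
     */
    static bool
    apply_a_before_b(uint64_t commit_seq_a, uint64_t commit_seq_b,
                     bool a_rw_depends_on_b)
    {
        if (a_rw_depends_on_b)
            return true;                     /* drag A in front of B */
        return commit_seq_a < commit_seq_b;  /* otherwise keep commit order */
    }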

The tricky part is that when there are two overlapping SERIALIZABLE
transactions and one of them has modified data and committed, and
there is an overlapping SERIALIZABLE transaction which is not READ
ONLY which has not yet reached completion (COMMIT or ROLLBACK) the
correct ordering remains in doubt -- there is no way to know which
might need to commit first, or whether it even matters.  I am
skeptical about whether in logical replication (including MMR), it
is going to be possible to manage this by finding "safe snapshots".
The only alternative I can see, though, is to suspend replication
while correct transaction ordering remains in doubt.  A big READ
ONLY transaction would not cause a replication stall, but a big
READ WRITE transaction could cause an indefinite stall.  Simon
seemed to be saying that this is unacceptable, but I tend to think
it is a viable approach for some workloads, especially if the READ
ONLY transaction property is used when possible.

There might be some wiggle room in terms of letting
non-SERIALIZABLE transactions commit while the ordering of
SERIALIZABLE transactions remain in doubt, but that would involve
allowing bigger deviations from commit order in transaction
application, which may confuse people.  The argument on the other
side is that if they use transaction isolation less strict than
SERIALIZABLE that they are vulnerable to seeing anomalies anyway,
so they must be OK with that.

Hopefully this is in some way helpful....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Peter Geoghegan
Дата:
On Fri, Mar 4, 2016 at 4:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, I agree with that.  I am utterly mystified by why Bruce keeps
> beating this drum, and am frankly pretty annoyed about it.  In the
> first place, he seems to think that he invented the idea of using FDWs
> for sharding in PostgreSQL, but I don't think that's true.  I think it
> was partly my idea, and partly something that the NTT folks have been
> working on for years (cf, e.g.,
> cb1ca4d800621dcae67ca6c799006de99fa4f0a5).  As far as I understand it,
> Bruce came in near the end of that conversation and now wants to claim
> credit for something that doesn't really exist yet and, to the extent
> that it does exist, wasn't even his idea.

I think that it's easy to have the same idea as someone else
independently. I've had that happen several times myself; ideas that
other people had that I felt I could have easily had myself, or did in
fact have. Most of the ideas that I have are fairly heavily based on
known techniques. I don't think that I've ever created a PostgreSQL
feature that was in some way truly original, except perhaps for some
aspects of how UPSERT works.

Who cares whose idea FDW sharding was? It matters not a whit. It
probably independently occurred to several people that the FDW
interface could be built to support horizontal sharding more directly.
The idea almost suggests itself.

> EnterpriseDB *does* have a plan to try to continue enhancing foreign
> data wrappers so that you can run queries against foreign tables and
> get reasonable plans, something that currently isn't true.  I haven't
> heard anybody objecting to that, and I don't expect to hear anybody
> objecting to that, because it's hard to imagine why you wouldn't want
> queries against foreign data wrappers to produce better plans than
> they do today.  At worst, you might think it doesn't matter either
> way, but actually, I think there are a substantial number of people
> who are pretty happy about join pushdown and I expect that when and if
> we get aggregate pushdown working there will be even more people who
> are happy about that.

I think that that's Bruce's point, to a large degree.

>> Alternately, you can just work on the individual FDW features, which
>> *everyone* thinks are a good idea, and when most of them are done, FDW-based
>> scaleout will be such an obvious solution that nobody will argue with it.
>
> That's exactly what the people at EnterpriseDB who are actually doing
> work in this area are attempting to do.  Meanwhile, there's also
> Bruce, who is neither doing nor planning to do any work in this area,
> nor advising either EnterpriseDB or the PostgreSQL community to
> undertake any particular project, but who *is* making it sound like
> there is a super sekret plan that nobody else gets to see.

Is he? I didn't get that impression.

I think Bruce is trying to facilitate discussion, which can sometimes
require being a bit provocative. I think you're being quite unfair,
and mischaracterizing his words. I've heard Bruce talk about
horizontal scaling on several occasions, including at a talk in San
Francisco about a year ago, and I just thought it was Bruce being
Bruce -- primarily, a facilitator. I think that he is not especially
motivated by taking credit either here or in general, and not at all
by taking credit for other people's work.

It's not hard to get agreement about something abstract, like the
general idea of a distributed transaction manager. I fear that any
particular detailed interpretation of what that phrase means will be
very hard to get accepted into PostgreSQL.

-- 
Peter Geoghegan



Re: The plan for FDW-based sharding

От
Thom Brown
Дата:
<p dir="ltr">On 6 Mar 2016 8:27 p.m., "Peter Geoghegan" <<a href="mailto:pg@heroku.com">pg@heroku.com</a>>
wrote:<br/> ><br /> > On Fri, Mar 4, 2016 at 4:41 PM, Robert Haas <<a
href="mailto:robertmhaas@gmail.com">robertmhaas@gmail.com</a>>wrote:<br /> > > Yeah, I agree with that.  I am
utterlymystified by why Bruce keeps<br /> > > beating this drum, and am frankly pretty annoyed about it.  In
the<br/> > > first place, he seems to think that he invented the idea of using FDWs<br /> > > for sharding
inPostgreSQL, but I don't think that's true.  I think it<br /> > > was partly my idea, and partly something that
theNTT folks have been<br /> > > working on for years (cf, e.g.,<br /> > >
cb1ca4d800621dcae67ca6c799006de99fa4f0a5). As far as I understand it,<br /> > > Bruce came in near the end of
thatconversation and now wants to claim<br /> > > credit for something that doesn't really exist yet and, to the
extent<br/> > > that it does exist, wasn't even his idea.<br /> ><br /> > I think that it's easy to have
thesame idea as someone else<br /> > independently. I've had that happen several times myself; ideas that<br /> >
otherpeople had that I felt I could have easily had myself, or did in<br /> > fact have. Most of the ideas that I
haveare fairly heavily based on<br /> > known techniques. I don't think that I've ever creating a PostgreSQL<br />
>feature that was in some way truly original, except perhaps for some<br /> > aspects of how UPSERT works.<p
dir="ltr">Everythingis a remix.<p dir="ltr">Thom 

Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Fri, Mar 4, 2016 at 10:23 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I can imagine that many such hooks would have little use beyond PPAS, but
> I'm somewhat curious as to if any would have wider applications. It's not
> unusual for me to be working on something and think "gee, I wish there was a
> hook here".

Well, on the whole, we've adopted an approach of "hack core and
merge", so to some extent you have to use your imagination to think
about what it would look like if it were all done using hooks.  But
we've also actually added hooks to Advanced Server in some places
where PostgreSQL doesn't have them, and it's not hard to imagine that
somebody else might find those useful, at least.  Whether they'd be
useful enough that the PostgreSQL community would accept them if
EnterpriseDB were to approve open-sourcing them is another
question....

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Fri, Mar 4, 2016 at 10:54 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> I've got to say that this is somewhat reminiscent of the discussions around
> in-core pooling, where argument 1 is applied to justify excluding pooling
> from core/contrib.
>
> I don't have a strong position on whether a DTM should be in core or not as
> I haven't done enough work in the area. I do think it's interesting to
> strongly require that a DTM be in core while we also reject things like
> pooling that are needed by a large proportion of users.

I don't remember this discussion, but I don't think I feel differently
about either of these two issues.  I'm not opposed to having some
hooks in core to make it easier to build a DTM, but I'm not convinced
that these hooks are the right hooks or that the design underlying
those hooks is correct.  And, eventually, I would like to see a DTM in
core or contrib so that it can be accessible to everyone relatively
easily.  Now, on connection pooling, I am similarly not opposed to
having some well-designed hooks, but I also think in the long run it
would be better for some improvements in this area to be part of core.
None of that means I would support any particular hook proposal, of
course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Konstantin Knizhnik
Дата:
On 03/07/2016 04:28 AM, Robert Haas wrote:
> On Fri, Mar 4, 2016 at 10:54 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> I've got to say that this is somewhat reminiscent of the discussions around
>> in-core pooling, where argument 1 is applied to justify excluding pooling
>> from core/contrib.
>>
>> I don't have a strong position on whether a DTM should be in core or not as
>> I haven't done enough work in the area. I do think it's interesting to
>> strongly require that a DTM be in core while we also reject things like
>> pooling that are needed by a large proportion of users.
> I don't remember this discussion, but I don't think I feel differently
> about either of these two issues.  I'm not opposed to having some
> hooks in core to make it easier to build a DTM, but I'm not convinced
> that these hooks are the right hooks or that the design underlying
> those hooks is correct.
What can I do to convince you that the design of the XTM API is correct?
I already wrote that we have not introduced any new abstractions.
What we have done is just encapsulate some existing Postgres functions.
The main reason was that we tried to minimize changes in the Postgres core.
It seems tempting if we can provide a sufficient level of flexibility without rewriting the core, doesn't it?

What does "enough level of flexibility" mean? We are interested in the implementation of a DTM, so if the XTM API allows us to do that for the several approaches we considered, then it is "flexible enough".

So do you agree that before rewriting/refactoring xact.c/transam.c/procarray.c it is better to first try to introduce XTM over the existing code?
And if we find out that some useful functionality is missing and cannot be overridden through this API in a convenient and efficient way, without copying substantial pieces of code, then only in that case should we consider refactoring the core transaction processing code to make it more modular and tunable.

If you agree with this statement, then the next question is which set of functions needs to be overridden by XTM.
The PostgreSQL transaction manager has many different functions; some of them do almost the same thing, but in different ways.
For example, consider TransactionIdIsInProgress, TransactionIdIsKnownCompleted, TransactionIdDidCommit, TransactionIdDidAbort and TransactionIdGetStatus.
Some of them access the clog, some the procarray, some just check a cached value. And they are scattered through different Postgres modules.

So which of them have to be included in the XTM API?
We have investigated the code and the usage of all these functions.
We found out that TransactionIdDidCommit is always called by the visibility check after TransactionIdIsInProgress,
and it in turn uses TransactionIdGetStatus to extract information about the transaction from the clog.
So we have included TransactionIdIsInProgress and TransactionIdGetStatus in XTM, but not TransactionIdDidCommit, TransactionIdDidAbort or TransactionIdIsKnownCompleted.
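
As a purely illustrative sketch (the struct and its installation are invented here; only the signatures mirror the core functions just mentioned), such a set of overridable entry points might look like:

    #include "postgres.h"
    #include "access/clog.h"        /* XidStatus */
    #include "access/xlogdefs.h"    /* XLogRecPtr */
    #include "utils/snapshot.h"     /* Snapshot */

    typedef struct TransactionManager
    {
        bool        (*IsInProgress) (TransactionId xid);
        XidStatus   (*GetStatus) (TransactionId xid, XLogRecPtr *lsn);
        void        (*SetTreeStatus) (TransactionId xid, int nsubxids,
                                      TransactionId *subxids,
                                      XidStatus status, XLogRecPtr lsn);
        bool        (*IsInSnapshot) (TransactionId xid, Snapshot snapshot);
        Snapshot    (*GetSnapshot) (Snapshot snapshot);
    } TransactionManager;

    /* The core would point these at its existing implementations; a DTM
     * extension would install its own table from _PG_init(). */
    extern TransactionManager *TM;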
 

A similar story applies to other functions, for example transaction commit.
There is once again a bundle of functions: CommitTransactionCommand, CommitTransaction, CommitSubTransaction, RecordTransactionCommit and TransactionIdSetTreeStatus.
CommitTransactionCommand is a function from the public API. It initiates a state switch of the Postgres TM finite state automaton.
We do not want to affect the logic of this automaton: it is the same for a DTM and the local TM. So we look deeper.
CommitTransaction/CommitSubTransaction are called by this FSM. We also do not want to change the logic of processing subtransactions.
One more step deeper, and we arrive at TransactionIdSetTreeStatus. This is why it is included in XTM.

Another example is the tuple visibility check. There is a family of HeapTupleSatisfies* functions in utils/time/tqual.c (IMHO a very strange place for one of the core Postgres submodules :).
Should we override all of them? No, because they are mostly built on a few other functions, such as TransactionIdIsInProgress and XidInMVCCSnapshot.
Since we do not want to change the heap tuple format, we leave all manipulation of tuple status bits as it is and redefine only the XidInMVCCSnapshot() function.
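
To make the shape of this concrete: the choices above add up to an indirection layer roughly like the following. This is only an illustrative sketch built from the functions named above; the member names and grouping here are simplified for illustration and do not exactly match the definitions in our patch, and the real API is somewhat larger.

/*
 * Illustrative sketch only: a table of function pointers wrapping the
 * existing transaction-manager entry points discussed above.  The default
 * table simply forwards to the built-in local implementations; a DTM
 * extension installs its own table (for example from _PG_init()) instead.
 */
#include "postgres.h"
#include "access/clog.h"        /* XidStatus */
#include "access/xlogdefs.h"    /* XLogRecPtr */
#include "utils/snapshot.h"     /* Snapshot */

typedef struct TransactionManager
{
    /* status checks used on the visibility path */
    bool        (*IsInProgress) (TransactionId xid);
    XidStatus   (*GetStatus) (TransactionId xid, XLogRecPtr *lsn);

    /* snapshot check used by the HeapTupleSatisfies* family */
    bool        (*XidInSnapshot) (TransactionId xid, Snapshot snapshot);

    /* commit path: the point reached below CommitTransaction() */
    void        (*SetTreeStatus) (TransactionId xid, int nsubxids,
                                  TransactionId *subxids,
                                  XidStatus status, XLogRecPtr lsn);
} TransactionManager;

/* Points at the local implementation unless an extension replaces it. */
extern TransactionManager *TM;
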
 

So, I can provide arguments for every function included in XTM: why it was included in this API and why some other related functions were not.
But I cannot prove that this is a necessary and sufficient subset of functions.
I do not see big problems in extending and refactoring this API in the future. Postgres has lived for years without custom transaction managers, and I do not expect that the presence of an XTM API will cause many different TMs to be developed. Most likely only a few people or companies will try to develop their own, so compatibility will not be a big issue here.


> And, eventually, I would like to see a DTM in
> core or contrib so that it can be accessible to everyone relatively
> easily.

So would I. But before including something in core, it is best to test it in many different scenarios.
That is especially true for a DTM, because the requirements of various cluster solutions are very different.
And the most convenient way of doing that is to ship the DTM as an extension, not as some fork of Postgres; that will greatly simplify using it.


> Now, on connection pooling, I am similarly not opposed to
> having some well-designed hooks, but I also think in the long run it
> would be better for some improvements in this area to be part of core.
> None of that means I would support any particular hook proposal, of
> course.
>


-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: The plan for FDW-based sharding

От
Craig Ringer
Дата:
On 5 March 2016 at 23:41, Kevin Grittner <kgrittn@gmail.com> wrote:

>> I'd be really interested in some ideas on how that information might be
>> usefully accessed. If we could write info on when to apply commits to the
>> xlog in serializable mode that'd be very handy, especially when looking to
>> the future with logical decoding of in-progress transactions, parallel
>> apply, etc.

> Are you suggesting the possibility of holding off on writing the
> commit record for a SERIALIZABLE transaction to WAL until it is
> known that no other SERIALIZABLE transaction comes ahead of it in
> the apparent order of execution?  If so, that's an interesting idea
> that I hadn't given much thought to yet -- I had been assuming
> current WAL writes, with adjustments to the timing of application
> of the records.

I wasn't, I simply wrote less than clearly. I intended to say "from the xlog" where I wrote "to the xlog". Nonetheless, that'd be a completely unrelated but interesting thing to explore...
 
>> For parallel apply I anticipated that we'd probably have workers applying
>> xacts in parallel and committing them in upstream commit order. They'd
>> sometimes deadlock with each other; when this happened all workers whose
>> xacts committed after the first aborted xact would have to abort and start
>> again. Not ideal, but safe.
>>
>> Being able to avoid that by using SSI information was in the back of my
>> mind, but with no idea how to even begin to tackle it. What you've mentioned
>> here is helpful and I'd be interested if you could share a bit more of your
>> experience in the area.

> My thinking so far has been that reordering the application of
> transaction commits on a replica would best be done as the minimal
> rearrangement possible from commit order which allows the work of
> transactions to become visible in an order consistent with some
> one-at-a-time run of those transactions.  Partly that is because
> the commit order is something that is fairly obvious to see and is
> what most people intuitively look at, even when it is wrong.
> Deviating from this intuitive order seems likely to introduce
> confusion, even when the results are 100% correct.
>
> The only place you *need* to vary from commit order for correctness
> is when there are overlapping SERIALIZABLE transactions, one
> modifies data and commits, and another reads the old version of the
> data but commits later.

Ah, right. So here, even though X1 commits before X2 running concurrently under SSI, the logical order in which the xacts could've occurred serially is that where xact 2 runs and commits before X1, since xact 2 doesn't depend on xact 1. X2 read the old row version before xact 1 modified it, and logically occurs before xact1 in the serial rearrangement.

I don't fully grasp how that can lead to a situation where xacts can commit in an order that's valid upstream but not valid as a downstream apply order. I presume we're looking at read-only logical replicas here (rather than multimaster), and it's only a concern for SERIALIZABLE xacts since a READ COMMITTED xact on the master and replica would both be able to see the state where X1 is committed but X2 isn't yet. But I don't see how a read-only xact in SERIALIZABLE on the replica can get different results to what it'd get with SSI on the master. It's entirely possible for a read xact on the master to get a snapshot after X1 commits and after X2 commits, same as READ COMMITTED. SSI shouldn't AFAIK come into play with no writes to create a pivot. Is that wrong?

If we applied this sequence to the downstream in commit order we'd still get correct results on the heap after applying both. We'd have an intermediate state where X1 is committed but X2 isn't, but we can have the same on the master. SSI doesn't AFAIK mask X1 from becoming visible in a snapshot until X2 commits or anything, right?
 
> Due to the action of SSI on the source
> machine, you know that there could not be any SERIALIZABLE
> transaction which saw the inconsistent state between the two
> commits, but on replicas we don't yet manage that.

OK, maybe that's what I'm missing. How exactly does SSI ensure that? (A RTFM link / hint is fine, but I didn't find it in the SSI section of TFM at least in a way I recognised).

> The key is that
> there is a read-write dependency (a/k/a rw-conflict) between the
> two transactions which tells you that the second to commit has to
> come before the first in any graph of apparent order of execution.

Yeah, I get that part. How does that stop a 3rd SERIALIZABLE xact from getting a snapshot between the two commits and reading from there?
 
> The tricky part is that when there are two overlapping SERIALIZABLE
> transactions and one of them has modified data and committed, and
> there is an overlapping SERIALIZABLE transaction which is not READ
> ONLY which has not yet reached completion (COMMIT or ROLLBACK) the
> correct ordering remains in doubt -- there is no way to know which
> might need to commit first, or whether it even matters.  I am
> skeptical about whether in logical replication (including MMR), it
> is going to be possible to manage this by finding "safe snapshots".
> The only alternative I can see, though, is to suspend replication
> while correct transaction ordering remains in doubt.  A big READ
> ONLY transaction would not cause a replication stall, but a big
> READ WRITE transaction could cause an indefinite stall.  Simon
> seemed to be saying that this is unacceptable, but I tend to think
> it is a viable approach for some workloads, especially if the READ
> ONLY transaction property is used when possible.

We already have huge replication stalls when big write xacts occur. We don't start sending any data for the xact to a peer until it commits, and once we start we don't send any other xact data until that xact is received (and probably applied) by the peer.

I'd like to address that by introducing xact streaming / interleaved xacts, where we stream big xacts on the wire as they occur and buffer them on the peer, possibly speculatively applying them too. This requires that individual row changes be tagged with subxact IDs and that subxact-to-top-level-xact mapping info be sent, so the peer can accumulate the right xacts into the right buffers. Basically offloading reorder buffering to the peer.
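
Roughly the kind of per-change framing I mean, purely as an illustration - this is not a concrete protocol proposal, and none of these names exist anywhere yet:

#include <stdint.h>

/* Purely illustrative: the kind of header each streamed row change might
 * carry so that the downstream can route it into the reorder buffer of the
 * right in-progress top-level transaction before that transaction has
 * committed upstream. */
typedef struct StreamedChangeHeader
{
    uint32_t    toplevel_xid;   /* top-level xact this change belongs to */
    uint32_t    subxid;         /* subtransaction that produced the change */
    char        action;         /* 'I' insert, 'U' update, 'D' delete */
    uint32_t    datalen;        /* length of the tuple payload that follows */
} StreamedChangeHeader;
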

That same mechanism would let replication continue while logical serializable commit-order is in-doubt, blocking only the actual commit from proceeding, and only on those xacts. I think.

That said I'm still clearly more fuzzy about the details of what SSI does, what it guarantees and how it works than I thought I was, so I may just be handwaving pointlessly at this point. I'd better read some code...

> There might be some wiggle room in terms of letting
> non-SERIALIZABLE transactions commit while the ordering of
> SERIALIZABLE transactions remain in doubt, but that would involve
> allowing bigger deviations from commit order in transaction
> application, which may confuse people.  The argument on the other
> side is that if they use transaction isolation less strict than
> SERIALIZABLE that they are vulnerable to seeing anomalies anyway,
> so they must be OK with that.

Yeah. I'd be inclined to do just that, and with that argument.

 
> Hopefully this is in some way helpful....
 
Very, thank you.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Robert Haas
Дата:
On Fri, Mar 4, 2016 at 11:17 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> If FDW-based sharding works, I'm happy enough, I have no horse in this race.
> If it doesn't work I don't much care either. What I'm worried about is if it
> works like partitioning using inheritance works - horribly badly, but just
> well enough that it's served as an effective barrier to doing anything
> better.
>
> That's what I want to prevent. Sharding that only-just-works and then stops
> us getting anything better into core.

That's a reasonable worry.  Thanks for articulating it so clearly.
I've thought about that issue and I admit it's both real and serious,
but I've sort of taken the attitude of saying, well, I don't know how
to solve that problem, but there's so much other important work that
needs to be done before we get to the point where that's the blocker
that solving that problem doesn't seem like the most important thing
right now.

The sharding discussion we had in Vienna convinced me that, in the
long run, having PostgreSQL servers talk to other PostgreSQL servers
only using SQL is not going to be a winner.  I believe Postgres-XL has
already done something about that; I think it is passing plans around
directly.  So you could look at that and say - ha, the FDW approach is
a dead end!  But from my point of view, the important thing about the
FDW interface is that it provides a pluggable interface to the
planner.  We can now push down joins and sorts; hopefully soon we will
be able to push down aggregates and limits and so on.  That's the hard
part.  The deparsing code that turns the plan we want to execute into
an SQL query that can be shipped over the wire is a detail.
Serializing some other on-the-wire representation of what we want the
remote side to do is small potatoes compared to having all of the
logic that lets you decide, in the first instance, what you want the
remote side to do.  I can imagine, in the long term, adding a new
sub-protocol (probably mediated via COPY BOTH) that uses a different
and more expressive on-the-wire representation.
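
For concreteness, this is the part of the interface I mean - an abridged sketch of the FdwRoutine callback table from foreign/fdwapi.h, with most of the members omitted:

/* Abridged sketch of the FdwRoutine callback table an FDW hands back to
 * the core planner/executor.  The planner-side callbacks are what make
 * join/sort (and, hopefully soon, aggregate) pushdown possible; the
 * deparsing to SQL happens entirely inside the FDW's own implementations. */
typedef struct FdwRoutine
{
    NodeTag     type;

    /* planner: estimate relation size, offer paths, build the plan */
    GetForeignRelSize_function      GetForeignRelSize;
    GetForeignPaths_function        GetForeignPaths;
    GetForeignPlan_function         GetForeignPlan;

    /* join pushdown: let the FDW offer paths for joins between its
     * foreign relations (how postgres_fdw ships joins to the remote) */
    GetForeignJoinPaths_function    GetForeignJoinPaths;

    /* executor: start, fetch rows from, and shut down the remote scan */
    BeginForeignScan_function       BeginForeignScan;
    IterateForeignScan_function     IterateForeignScan;
    EndForeignScan_function         EndForeignScan;

    /* ...many more callbacks (writes, EXPLAIN, ANALYZE, etc.) omitted... */
} FdwRoutine;

The planner asks the FDW for sizes and paths and lets it build the ForeignScan plan; everything about how the work is actually shipped (SQL text today, something richer tomorrow) stays inside the FDW's implementations of those callbacks.
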

Another foreseeable problem with the FDW approach is that you might
want to have a hash-partitioned table where there are multiple copies
of each piece of data and they are spread out across the shards and you
can add and remove shards and the data automatically rebalances.
Table inheritance (or table partitioning) + postgres_fdw doesn't sound
so great in this situation because when you rebalance you need to
change the partitioning constraints and that requires a full table
lock on every node and the whole thing seems likely to end up being
somewhat annoyingly manual and overly constrained by locking.  But I'd
say two things about that.  The first is that I honestly think that
this would be a pretty nice problem to have.  If we had things working
well enough that this was the kind of problem we were trying to
tackle, we'd be light-years ahead of where we are today.  Sure,
everybody hates table inheritance, but I don't think it's right to say
that partitioning work is blocked because table inheritance exists: I
think the problem is that getting true table partitioning correct is
*hard*.  And Amit Langote is working on that and hopefully we will get
there, but it's not an easy problem.  I don't think sharding is an
easy problem either, and I think getting to a point where ease-of-use
is our big limiting factor would actually be better than the current
scenario where "it doesn't work at all" is the limiting factor.  I
don't want that to *block* other approaches, BUT I also think that
anybody who tries to start over from scratch and ignore all the good
work that has been done in FDW-land is not going to have a very fun
time.

The second thing I want to say about this problem is that I don't want
to presume that it's not a *solvable* problem.  Just because we use
the FDW technology as a base doesn't mean we can't invent new and
quite different stuff along the way.  One idea I've been toying with
is trying to create some notion of a "distributed" table.  This would
be a new relkind.  You'd have a single relation at the SQL level, not
an inheritance hierarchy, but under the hood the data would be spread
across a bunch of remote servers using the FDW interface.  So then you
reuse all of the query planner work and other enhancements that have
been put into the FDW stuff, but you'd present a much cleaner user
interface.  Or, maybe better, you could create a new FDW,
sharding_fdw, that works like postgres_fdw except that instead of
putting the data on one particular foreign server, it spreads the data
out across multiple servers and manages the sharding process under the
hood.  That would, again, let you reuse a lot of the work that's been
done to improve the FDW infrastructure while creating something
significantly more powerful than what postgres_fdw is today.  I don't
know, I don't have any ideas about this.  I think your concern is
valid, and I share it.  But I just fundamentally believe that it's
better to enhance what we have than to start inventing totally new
abstractions.  The FDW API is *really* powerful, and getting more
powerful, and I just have a very hard time believing that starting
over will be better.  Somebody can do that if they like and I'm not
gonna get in the way, but if it's got problems that could have been
avoided by basing that same work on the FDW stuff we've already got, I
do plan to point that out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Kevin Grittner
Дата:
On Mon, Mar 7, 2016 at 6:13 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 5 March 2016 at 23:41, Kevin Grittner <kgrittn@gmail.com> wrote:

>> The only place you *need* to vary from commit order for correctness
>> is when there are overlapping SERIALIZABLE transactions, one
>> modifies data and commits, and another reads the old version of the
>> data but commits later.
>
> Ah, right. So here, even though X1 commits before X2 running concurrently
> under SSI, the logical order in which the xacts could've occurred serially
> is that where xact 2 runs and commits before X1, since xact 2 doesn't depend
> on xact 1. X2 read the old row version before xact 1 modified it, and
> logically occurs before xact1 in the serial rearrangement.

Right, because X2 is *seeing* data in a state that existed before X1 ran.

> I don't fully grasp how that can lead to a situation where xacts can commit
> in an order that's valid upstream but not valid as a downstream apply order.

With SSI, it can matter whether an intermediate state is *read*.

> I presume we're looking at read-only logical replicas here (rather than
> multimaster),

I have not worked out how this works with MMR.  I'm not sure that
there is one clear answer to that.

> and it's only a concern for SERIALIZABLE xacts since a READ
> COMMITTED xact on the master and replica would both be able to see the state
> where X1 is committed but X2 isn't yet.

REPEATABLE READ would allow the anomaly to be seen, too, if a
transaction acquired its snapshot between the two commits.

> But I don't see how a read-only xact
> in SERIALIZABLE on the replica can get different results to what it'd get
> with SSI on the master. It's entirely possible for a read xact on the master
> to get a snapshot after X1 commits and after X2 commits, same as READ
> COMMITTED. SSI shouldn't AFAIK come into play with no writes to create a
> pivot. Is that wrong?

As mentioned earlier in this thread, look at the examples in this
section of the Wiki page, and imagine that the READ ONLY
transaction involved did *not* run on the primary, but *did* run on
the replica:

https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

> If we applied this sequence to the downstream in commit order we'd still get
> correct results on the heap after applying both.

... eventually.

> We'd have an intermediate
> state where X1 is committed but X2 isn't, but we can have the same on the
> master. SSI doesn't AFAIK mask X1 from becoming visible in a snapshot until
> X2 commits or anything, right?

If that intermediate state is *seen* on the master, a transaction
is rolled back.

>> The key is that
>> there is a read-write dependency (a/k/a rw-conflict) between the
>> two transactions which tells you that the second to commit has to
>> come before the first in any graph of apparent order of execution.
>
> Yeah, I get that part. How does that stop a 3rd SERIALIZABLE xact from
> getting a snapshot between the two commits and reading from there?

Serializable Snapshot Isolation doesn't generally block anything
that REPEATABLE READ (which is straight Snapshot Isolation) doesn't
block -- unless you explicitly request READ ONLY DEFERRABLE.  What
it does is monitor for situations that can present anomalies and
rolls back transactions as necessary to prevent anomalies in
successfully committed transactions.  We tried very hard to avoid
rolling back a transaction that could fail a second time on
conflict with the same set of transactions, although there were some
corner cases where it could not be avoided when a transaction was
PREPARED and not yet committed.  Another possibly useful fact is
that we were able to guarantee that whenever there was a rollback,
some SERIALIZABLE transaction which overlaps the one being rolled
back has modified data and successfully committed -- ensuring that
there is some forward progress even in worst case situations.

>> The tricky part is that when there are two overlapping SERIALIZABLE
>> transactions and one of them has modified data and committed, and
>> there is an overlapping SERIALIZABLE transaction which is not READ
>> ONLY which has not yet reached completion (COMMIT or ROLLBACK) the
>> correct ordering remains in doubt -- there is no way to know which
>> might need to commit first, or whether it even matters.  I am
>> skeptical about whether in logical replication (including MMR), it
>> is going to be possible to manage this by finding "safe snapshots".
>> The only alternative I can see, though, is to suspend replication
>> while correct transaction ordering remains in doubt.  A big READ
>> ONLY transaction would not cause a replication stall, but a big
>> READ WRITE transaction could cause an indefinite stall.  Simon
>> seemed to be saying that this is unacceptable, but I tend to think
>> it is a viable approach for some workloads, especially if the READ
>> ONLY transaction property is used when possible.
>
> We already have huge replication stalls when big write xacts occur. We don't
> start sending any data for the xact to a peer until it commits, and once we
> start we don't send any other xact data until that xact is received (and
> probably applied) by the peer.
>
> I'd like to address that by introducing xact streaming / interleaved xacts,
> where we stream big xacts on the wire as they occur and buffer them on the
> peer, possibly speculatively applying them too. This requires that
> individual row changes be tagged with subxact IDs and that
> subxact-to-top-level-xact mapping info be sent, so the peer can accumulate
> the right xacts into the right buffers. Basically offloading reorder
> buffering to the peer.
>
> That same mechanism would let replication continue while logical
> serializable commit-order is in-doubt, blocking only the actual commit from
> proceeding, and only on those xacts. I think.

That makes sense to me.

> That said I'm still clearly more fuzzy about the details of what SSI does,
> what it guarantees and how it works than I thought I was, so I may just be
> handwaving pointlessly at this point. I'd better read some code...

You might want to also review the paper presented at the VLDB
conference:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

Really I think the key is to consider it a monitor on top of the
Snapshot Isolation of REPEATABLE READ, which looks for patterns in
read-write dependencies and transaction boundaries (as the points
where snapshots are acquired and commits successfully complete)
that will cancel transactions as necessary to prevent anomalies.
The patterns that are used were recognized over the course of many
years of research into the topic by groups at MIT, Sydney, and
others.  Dan and I managed to extend the theory with respect to
READ ONLY transactions in a way that was reviewed by some of the
prior researchers and stood up to peer review at the VLDB
conference.  Getting your head around all the conditions involved
in making an anomaly possible is a bit of work, but it is all well
grounded in both theory and practical research.

I will admit that getting your head around the internal workings of
SSI is one or two orders of magnitude more work than getting your
head around S2PL.  The bright side is that end users don't need to
do that to be able to *use* it effectively.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: The plan for FDW-based sharding

От
Craig Ringer
Дата:
On 7 March 2016 at 23:02, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 4, 2016 at 11:17 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> If FDW-based sharding works, I'm happy enough, I have no horse in this race.
>> If it doesn't work I don't much care either. What I'm worried about is if it
>> works like partitioning using inheritance works - horribly badly, but just
>> well enough that it's served as an effective barrier to doing anything
>> better.
>>
>> That's what I want to prevent. Sharding that only-just-works and then stops
>> us getting anything better into core.

> That's a reasonable worry.  Thanks for articulating it so clearly.
> I've thought about that issue and I admit it's both real and serious,
> but I've sort of taken the attitude of saying, well, I don't know how
> to solve that problem, but there's so much other important work that
> needs to be done before we get to the point where that's the blocker
> that solving that problem doesn't seem like the most important thing
> right now.

[snip explanation]
 
> I think your concern is
> valid, and I share it.  But I just fundamentally believe that it's
> better to enhance what we have than to start inventing totally new
> abstractions.  The FDW API is *really* powerful, and getting more
> powerful, and I just have a very hard time believing that starting
> over will be better.  Somebody can do that if they like and I'm not
> gonna get in the way, but if it's got problems that could have been
> avoided by basing that same work on the FDW stuff we've already got, I
> do plan to point that out.

Yep. As has been noted, each of these improvements is useful in its own right, and I'm not sure anyone's against them, just concerned about whether the overall vision for sharding will work out.

Personally I think that once the FDW infrastructure is closer to being usable for sharding, when we're at the point where new patches are proposed that're really specifically for sharding rather than general-use FDW improvements, that's when it'd be well worth building a proof-of-concept sharding implementation. Find unexpected wrinkles and issues before starting to stream stuff into core that can't be easily removed again. That was certainly useful when building BDR, and even then we still found lots of things that required revision, often repeatedly.

Either that, or bless experimental features/API as an official concept. I'd quite like that myself - stuff that's in Pg, but documented as "might change or go away in the next release, experimental feature". As we're doing more stuff that spans multiple release cycles, where patches in a prior cycle might need revision based on what we learn in a later one, we might need more freedom to change things that're committed and user visible.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Oleg Bartunov
Дата:


On Tue, Mar 8, 2016 at 6:40 AM, Craig Ringer <craig@2ndquadrant.com> wrote:

 
> Either that, or bless experimental features/API as an official concept. I'd quite like that myself - stuff that's in Pg, but documented as "might change or go away in the next release, experimental feature". As we're doing more stuff that spans multiple release cycles, where patches in a prior cycle might need revision based on what we learn in a later one, we might need more freedom to change things that're committed and user visible.


+1
 
> --
>  Craig Ringer                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Bruce Momjian
Дата:
I have read the recent comments on this thread with great interest.  I
am glad people have expressed their concerns, rather than remain silent.
Now that the responses have decreased, I can reply.

I saw several concerns:

1.  My motivation for starting this thread was to decrease interest in
external sharding solutions.

2.  No prototype was produced.

3.  More work needs to be done to encourage others to be involved.

4.  An FDW-based sharding solution will only work for some workloads,
decreasing interest in a more general solution.

5.  I started this thread to take credit for the idea or feature.

Let me reply to each item as briefly as I can:

1.  I said good things about external sharding solutions in the email,
so it is hard to logically argue that the _intent_ was to reduce
interest in them.  I will admit that that might be the short-term
effect.

2.  We have not produced a prototype because we don't really need to
make any decision yet on viability.  We already need to improve FDW
pushdown, partitioning syntax, and perhaps a global transaction/snapshot
manager with or without sharding, so we might as well just make those
improvements, and then producing a prototype will be much easier and
more representative.

3.  I have tried to encourage others to get involved, with limited
success.  I do think the FDW is perhaps the only reasonable way to get
_built-in_ sharding.  The external sharding solutions are certainly
viable, but external.  It is possible we will make all the FDW
improvements, find out it doesn't work, but find out the improvements
allow us to go in another direction.

4.  Hard to argue with #4.  We got partitioning working with a complex
API that has not improved much over the years.  I think this will be
cleaned up with the FDW-sharding work, and it would be a shame to create
another partial solution (FDW sharding) out of that work.

5.  See below on why I talk about these things.

There seems to be serious interest in how this idea came about, so let
me say what I remember.  It is very possible others came to the same
conclusions independently, and earlier.  I think I first heard it from
Korry Douglas in an EDB-internal discussion.  I then heard it from Josh
Berkus or we discussed it at a conference.  That got me thinking, and
then an EDB customer talked about the need for multi-node write scaling,
and I realized that only sharding could do that.  (The data warehouse
use of sharding was already clear to me.)  I then understood the wisdom
of Postgres XC, which NTT worked on for perhaps a decade.  (I just left
their offices here in Tokyo.)  I discussed the FDW-sharding idea
internally inside EDB, and then mentioned it during a visit to NTT in
July, 2014.  I wrote and blogged about a new sharding presentation I
wrote in February, 2015
(http://momjian.us/main/blogs/pgblog/2015.html#February_1_2015).  I
presented the talk in three locations in 2015.

The reason I talk about these things (#5) is because I am trying to
encourage people to work on them, and I want to communicate to our users
that we realize sharding is important for certain workloads and that we
are attempting a built-in solution.  Frankly, I don't think many users
need sharding, but many users want to know it is available, so I think
it is important to talk about it.

As for why there is so much hostility, I think this is typical for any
ill-defined feature development.  There was simmering hostility to the
Windows port and pg_upgrade for many years because those projects were
not easy to define and risky, and had few active developers.  The
agreement was that work could continue as long as destabilization wasn't
introduced.  Ideally everything would have a well-defined plan, it is
sometimes hard to do.  Similar to our approach on parallelism (which is
also super-important and doesn't many active developers), sometimes you
just need to create infrastructure and see how well it solves problems.

The weird thing is that if you do implement an ill-defined feature,
there really isn't much positive feedback --- people just use the
feature, and the complaints stop.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Craig Ringer
Дата:
On 11 March 2016 at 16:09, Bruce Momjian <bruce@momjian.us> wrote:
 
 
> Ideally everything would have a well-defined plan, but that is
> sometimes hard to do.

BDR helped for logical decoding etc - having something concrete really helped shape and guide each part of it as it was (or is/will be, in some cases) migrated from BDR to core.

That said, it was necessary because for many of the things it needs there weren't really good, isolated improvements to make with obvious utility for other projects. Sure, commit timestamps are handy, replication origins will be handy, etc. They can be used by other projects and will be. Some are already. But unlike the FDW enhancements they're not things that will be used simply by being present without even requiring any special user action, so they had an understandably higher barrier to cross for acceptance.

Once you get to the point where you're not making FDW improvements that help a broad set of users and start doing things that'll really only aid some hypothetical sharding system that also requires other infrastructure changes, hooks, etc ... that's when I think it's going to be proof-of-concept prototype time.

> Similar to our approach on parallelism (which is
> also super-important and doesn't have many active developers), sometimes you
> just need to create infrastructure and see how well it solves problems.

Yep. Again, like BDR and logical decoding. We've had quite a lot of surprises as we find unexpected corner cases and challenges over time. Andres's original work on logical decoding went through a number of significant revisions as more was learned about the problem to solve. Sometimes you can only do that by actually building it. Logical decoding as it stands in core is only partway through that evolution as it is - I think we now have a good understanding of why logical decoding of prepared xacts, streaming of in-progress xacts etc will be needed down the track, but it would've been hard to come up with that at the start when we didn't have experience using what we've already got.
 
> The weird thing is that if you do implement an ill-defined feature,
> there really isn't much positive feedback --- people just use the
> feature, and the complaints stop.

... eventually.

Sometimes the bug reports start. Occasionally you get a "thanks, this looks interesting/handy". But usually just bug reports or complaints that whatever you built isn't good enough to meet some random person's particular use case. Ah well. 

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: The plan for FDW-based sharding

От
Bruce Momjian
Дата:
On Fri, Mar 11, 2016 at 04:30:13PM +0800, Craig Ringer wrote:
> ... eventually.
> 
> Sometimes the bug reports start. Occasionally you get a "thanks, this looks
> interesting/handy". But usually just bug reports or complaints that whatever
> you built isn't good enough to meet some random person's particular use case.
> Ah well. 

As they say, if this was easy, everyone would be doing it.  ;-)

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Oleg Bartunov
Дата:


On Fri, Mar 11, 2016 at 9:09 AM, Bruce Momjian <bruce@momjian.us> wrote:



> 3.  I have tried to encourage others to get involved, with limited
> success.  I do think the FDW is perhaps the only reasonable way to get
> _built-in_ sharding.  The external sharding solutions are certainly
> viable, but external.  It is possible we will make all the FDW
> improvements, find out it doesn't work, but find out the improvements
> allow us to go in another direction.

I remember last summer's emails, and we really wanted to participate in the development, but it turned out that all the slots were already occupied by EDB and NTT people. We wanted to work on distributed transactions and proposed our XTM. Our feeling from the discussion at that time was that we were invited, but all the doors were closed. It was a very bad experience. Hopefully, we now understand that it was a misunderstanding.
 

> There seems to be serious interest in how this idea came about, so let
> me say what I remember.

I think the idea was obvious enough, so let's not discuss this.
 

> As for why there is so much hostility, I think this is typical for any
> ill-defined feature development.  There was simmering hostility to the
> Windows port and pg_upgrade for many years because those projects were
> not easy to define and risky, and had few active developers.  The
> agreement was that work could continue as long as destabilization wasn't
> introduced.  Ideally everything would have a well-defined plan, but that is
> sometimes hard to do.  Similar to our approach on parallelism (which is
> also super-important and doesn't have many active developers), sometimes you
> just need to create infrastructure and see how well it solves problems.



Our XTM is yet another example of the infrastructure we need in order to work on clustering. Should we wait until some other smart guy starts thinking about distributed transactions?  We described our API at https://wiki.postgresql.org/wiki/DTM; it is just a wrapper over existing functions, but it will allow us and, fortunately, others to play with their ideas.  We did several prototypes, including an FDW-based one, to demonstrate the viability of the API, and we plan to continue our work on built-in high availability and multi-master.  Of course, there will be a lot to learn, but it will be much easier if XTM exists in core rather than as a separate patch, which is really quite small.
 

> --
>   Bruce Momjian  <bruce@momjian.us>        http://momjian.us
>   EnterpriseDB                             http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription                             +



Re: The plan for FDW-based sharding

От
Bruce Momjian
Дата:
On Fri, Mar 11, 2016 at 10:19:16AM +0100, Oleg Bartunov wrote:
> Our XTM is yet another example of the infrastructure we need in order to work
> on clustering. Should we wait until some other smart guy starts thinking about
> distributed transactions?  We described our API at
> https://wiki.postgresql.org/wiki/DTM; it is just a wrapper over existing
> functions, but it will allow us and, fortunately, others to play with their
> ideas.  We did several prototypes, including an FDW-based one, to demonstrate
> the viability of the API, and we plan to continue our work on built-in high
> availability and multi-master.  Of course, there will be a lot to learn, but it
> will be much easier if XTM exists in core rather than as a separate patch,
> which is really quite small.

I think everyone agrees we want a global transaction manager of some
type.  I think choosing the one we want is the problem as there are
several possible directions.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription                             +