Thread: BDR Selective Replication

BDR Selective Replication

From
swaxolez
Date:
It's not clear to me but is selective replication working in BDR?  Does
anyone have any examples if so?

Thanks





--
View this message in context: http://postgresql.nabble.com/BDR-Selective-Replication-tp5846864.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: BDR Selective Replication

From
Craig Ringer
Date:


On 26 April 2015 at 10:05, swaxolez <willem@pcfish.ca> wrote:
> It's not clear to me but is selective replication working in BDR?  Does
> anyone have any examples if so?

Yes, selective replication (using replication sets) is supported in the current 0.9 stable series.

The documentation on replication sets is very sparse at the moment; the next iteration will improve that.


There are also some improvements needed to the user interface - in particular, providing a function interface for changing replication set memberships for connections so there's no need to manually restart the apply backends after a change, and providing default replication sets for a node. Current development priorities mean that these aren't expected in the next few releases.

Note that selective replication affects *only* replication of rows. DDL is still replicated on tables that are not members of any active replication set. Also, changing replication set memberships won't synchronise the added table's rows from other nodes, it'll just start replicating new changes from its current state. You generally want to set up replication sets before starting to add data to tables.
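
As a rough illustration of the behaviour described above, the following Python sketch models the filtering decision. It is not BDR code: `Change`, `Peer`, `should_send`, and the table and set names are all hypothetical, and the only thing it tries to capture is that row changes are filtered by replication set while DDL is not.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    kind: str   # "row" or "ddl" (hypothetical labels)
    table: str


@dataclass
class Peer:
    name: str
    active_sets: set = field(default_factory=set)


def should_send(change, table_sets, peer):
    """Return True if the change would be forwarded to the peer."""
    if change.kind == "ddl":
        # DDL is replicated regardless of replication set membership.
        return True
    # Row changes are sent only when the table shares a set with the peer.
    return bool(table_sets.get(change.table, set()) & peer.active_sets)


# Hypothetical example data.
table_sets = {"orders": {"sales"}, "audit_log": {"internal"}}
peer = Peer("node_b", {"sales"})

row_in_set = should_send(Change("row", "orders"), table_sets, peer)
row_not_in_set = should_send(Change("row", "audit_log"), table_sets, peer)
ddl_not_in_set = should_send(Change("ddl", "audit_log"), table_sets, peer)
```

In this model `audit_log` rows never reach `node_b`, but a DDL change on `audit_log` still does, which is the asymmetry to keep in mind when planning sets.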

All this applies to 0.9.0 and is, of course, subject to change in future releases, time and resources permitting.

 -- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BDR Selective Replication

From
swaxolez
Date:
I get the feeling I might want to wait for the next point release before
deploying on anything other than a test platform. In the meantime, I'll play
around and see how it works.  These are fantastic additions to a fantastic
database.  Thanks for the good work!





Re: BDR Selective Replication

From
Craig Ringer
Date:


On 26 April 2015 at 23:52, swaxolez <willem@pcfish.ca> wrote:
> I get the feeling I might want to wait for the next point release before
> deploying on anything other than a test platform. In the meantime, I'll play
> around and see how it works.

In the meantime, take a look at the rest of the documentation for the coming version ( http://bdr-project.org/docs/next/ ). It's worth thinking carefully about whether multi-master is right for you and understanding the trade-offs involved with multi-master in general, and BDR in particular.

BDR's development is driven mostly by customer priorities. Currently we're focused on improvements to dump and restore, DDL replication, and node removal, plus some backporting of 9.5 versions of underlying features.

There's no current work planned on things like skipping DDL replication for tables that are not in a replication set, table sync when replication sets are changed, etc.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BDR Selective Replication

From
Jim Nasby
Date:
On 4/26/15 7:49 AM, Craig Ringer wrote:
> There are also some improvements needed to the user interface - in
> particular, providing a function interface for changing replication set
> memberships for connections so there's no need to manually restart the
> apply backends after a change, and providing default replication sets
> for a node.

If 'default replication set' is the idea of "here's what tables *should*
be getting replicated regardless of whether that's happening or not",
it'd be great if that was done so it could be split out on its own at
some point. It's a problem that affects all replication systems.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: BDR Selective Replication

From
Craig Ringer
Date:

On 28 April 2015 at 05:38, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 4/26/15 7:49 AM, Craig Ringer wrote:
> > There are also some improvements needed to the user interface - in
> > particular, providing a function interface for changing replication set
> > memberships for connections so there's no need to manually restart the
> > apply backends after a change, and providing default replication sets
> > for a node.
>
> If 'default replication set' is the idea of "here's what tables *should* be getting replicated regardless of whether that's happening or not", it'd be great if that was done so it could be split out on its own at some point. It's a problem that affects all replication systems.

It wasn't, but that's an interesting idea.

You need a way to identify peer nodes in an abstract way before you can really define sets of which nodes should get which tables. So I think replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are a pre-requisite for that though, and one that's proving difficult to get in.

I think any sort of replication sets is likely to have similar problems, especially the "no in-core user" problem. There's nothing fundamentally impossible about filtering WAL sent to physical downstreams over streaming replication to include only replicated tables and the catalogs, though, so perhaps there could be an in-core user for it.

In BDR we're currently (ab)using security labels to tag tables with their replication sets, but I'd love to have a proper way to do that. As I recall the prior approach, of allowing custom relation options, was rejected on -hackers.

How would you want to go about storing and tracking the information? A new catalog? The other issue for in-core replication sets would probably be making it foreign-key aware, so replication of a table transitively requires replication of its references.
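
To make the foreign-key point concrete, here is a minimal Python sketch of what "transitively requires replication of its references" could mean in practice. Nothing here comes from BDR or PostgreSQL: `tables_to_replicate`, the `fk_refs` mapping, and the table names are hypothetical, and a real implementation would read the references from the catalogs.

```python
def tables_to_replicate(selected, fk_refs):
    """Expand `selected` with every table reachable through foreign keys.

    fk_refs maps a table name to the set of tables it references.
    """
    result = set()
    stack = list(selected)
    while stack:
        table = stack.pop()
        if table in result:
            # Already handled; this also guards against FK cycles.
            continue
        result.add(table)
        stack.extend(fk_refs.get(table, ()))
    return result


# Hypothetical schema: order_items -> orders -> customers, order_items -> products.
fk_refs = {
    "order_items": {"orders", "products"},
    "orders": {"customers"},
}
closure = tables_to_replicate({"order_items"}, fk_refs)
```

Asking for `order_items` alone pulls in `orders`, `products`, and `customers` as well, so a single membership change can grow a set considerably.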


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BDR Selective Replication

From
Jim Nasby
Date:
On 4/27/15 7:54 PM, Craig Ringer wrote:
>     If 'default replication set' is the idea of "here's what tables
>     *should* be getting replicated regardless of whether that's
>     happening or not", it'd be great if that was done so it could be
>     split out on its own at some point. It's a problem that affects all
>     replication systems.
>
>
> It wasn't, but that's an interesting idea.
>
> You need a way to identify peer nodes in an abstract way before you can
> really define sets of which nodes should get which tables. So I think
> replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are
> a pre-requisite for that though, and one that's proving difficult to get
> in.

Perhaps... different replication systems probably use different methods
to identify, so presumably there'd need to be some way to map a generic
identifier into an appropriate identifier for whatever replication
system you're using.

> I think any sort of replication sets is likely to have similar problems,
> especially the "no in-core user" problem. There's nothing fundamentally
> impossible about filtering WAL sent to physical downstreams over
> streaming replication to include only replicated tables and the
> catalogs, though, so perhaps there could be an in-core user for it.

Oh, I wasn't thinking this needed to be in-core. I think it'd be a lot
easier to develop it as an extension to start with... certainly a lot
less headache ;) If it becomes popular then it'll be a lot easier to get
it added.

> In BDR we're currently (ab)using security labels to tag tables with
> their replication sets, but I'd love to have a proper way to do that. As
> I recall the prior approach, of allowing custom relation options, was
> rejected on -hackers.
>
> How would you want to go about storing and tracking the information? A
> new catalog? The other issue for in-core replication sets would probably
> be making it foreign-key aware, so replication of a table transitively
> requires replication of its references.

As you said, we'd need a way to identify replication nodes. We might
also need/want a way to specify topology. I don't think topology would
be too hard (presumably it's either a single 'parent' node, or a list of
peers). What might be more interesting is dealing with different systems'
methods of identifying nodes.

You'd want a way to define different sets and associate them with nodes.
A node could be a provider, subscriber, or both. I think some
replication systems support 'pass through' as well, where the node
passes data downstream but doesn't apply it itself. Or it could be
multi-master and possibly a provider to read-only subscribers.

Finally you'd need to associate tables and sequences with a set. I agree
you'd want to look at FKs. I'd also like to be able to define rules for
a set, like "include everything in this schema, unless the first
character is _".
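
A rule like the one above could be sketched as follows. This is only an illustration in Python: `rule_members`, the `excluded_prefix` parameter, and the schema and table names are hypothetical and not part of any existing replication system.

```python
def rule_members(tables, schema, excluded_prefix="_"):
    """Select (schema, name) pairs in `schema` whose name lacks the prefix."""
    return sorted(
        (s, name)
        for s, name in tables
        if s == schema and not name.startswith(excluded_prefix)
    )


# Hypothetical catalog listing of (schema, table name) pairs.
tables = [("app", "users"), ("app", "_staging"), ("ops", "jobs")]
members = rule_members(tables, "app")
```

Re-evaluating the rule whenever a table is created would add new tables to the set without any manual step.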
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


Re: BDR Selective Replication

From
Craig Ringer
Date:


On 29 April 2015 at 09:14, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 4/27/15 7:54 PM, Craig Ringer wrote:
> >     If 'default replication set' is the idea of "here's what tables
> >     *should* be getting replicated regardless of whether that's
> >     happening or not", it'd be great if that was done so it could be
> >     split out on its own at some point. It's a problem that affects all
> >     replication systems.
> >
> > It wasn't, but that's an interesting idea.
> >
> > You need a way to identify peer nodes in an abstract way before you can
> > really define sets of which nodes should get which tables. So I think
> > replication identifiers ( https://commitfest.postgresql.org/4/161/ ) are
> > a pre-requisite for that though, and one that's proving difficult to get
> > in.
>
> Perhaps... different replication systems probably use different methods to identify, so presumably there'd need to be some way to map a generic identifier into an appropriate identifier for whatever replication system you're using.

Replication identifiers do just that: provide a way to map identifiers from some external system into a local unique identifier for a peer node, along with tracking of the replay position from the peer so replay can be restarted at a consistent point. The replay position is an LSN, so they're not going to work for any arbitrary system, though.
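
The idea can be pictured with a toy registry. This Python sketch is not the proposed PostgreSQL feature or its API: the `ReplicationIdentifiers` class, its methods, and the node name are hypothetical, and LSNs are simplified to plain integers.

```python
class ReplicationIdentifiers:
    """Toy registry: external node name -> local id, plus replay position."""

    def __init__(self):
        self._ids = {}        # external name -> small local integer id
        self._positions = {}  # local id -> last replayed LSN (as an int)

    def local_id(self, external_name):
        # Allocate a local id the first time a peer is seen.
        if external_name not in self._ids:
            self._ids[external_name] = len(self._ids) + 1
        return self._ids[external_name]

    def advance(self, external_name, lsn):
        # Record progress; never move the replay position backwards.
        node = self.local_id(external_name)
        self._positions[node] = max(self._positions.get(node, 0), lsn)

    def restart_lsn(self, external_name):
        # Position from which replay from this peer can resume.
        return self._positions.get(self.local_id(external_name), 0)


registry = ReplicationIdentifiers()
registry.advance("node_a", 0x16B3748)
registry.advance("node_a", 0x16B3000)  # stale report, ignored
resume_from = registry.restart_lsn("node_a")
```

Because the tracked position is an LSN, the same structure would not fit a system that measures progress in some other unit, which is the limitation noted above.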

> > How would you want to go about storing and tracking the information? A
> > new catalog? The other issue for in-core replication sets would probably
> > be making it foreign-key aware, so replication of a table transitively
> > requires replication of its references.
>
> As you said, we'd need a way to identify replication nodes. We might also need/want a way to specify topology.

Topology? Why?

All a node needs to know is "send data from <these tables> to <these peers>". It's just a set. If a replication system is doing something fancy it'd be able to manage the replication sets on the nodes.
 
> I don't think topology would be too hard (presumably it's either a single 'parent' node, or a list of peers). What might be more interesting is dealing with different systems' methods of identifying nodes.

Yeah, topology is hard. Rings, mesh with dangling follower nodes, etc.

I don't think it's really the same thing as replication sets.

> You'd want a way to define different sets and associate them with nodes. A node could be a provider, subscriber, or both. I think some replication systems support 'pass through' as well, where the node passes data downstream but doesn't apply it itself. Or it could be multi-master and possibly a provider to read-only subscribers.

Yeah, you're talking about some kind of abstract modelling of a replication topology. I'm not sure that's at all necessary to keep track of which tables should be replicated to which nodes.


--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: BDR Selective Replication

From
Jim Nasby
Date:
On 4/29/15 1:38 AM, Craig Ringer wrote:
>     Perhaps... different replication systems probably use different
>     methods to identify, so presumably there'd need to be some way to
>     map a generic identifier into an appropriate identifier for whatever
>     replication system you're using.
>
>
> Replication identifiers do just that: provide a way to map identifiers
> from some external system into a local unique identifier for a peer
> node, along with tracking of the replay position from the peer so replay
> can be restarted at a consistent point. The replay position is an LSN,
> so they're not going to work for any arbitrary system, though.

Which may not work for something meant to work with different
replication systems...

>     You'd want a way to define different sets and associate them with
>     nodes. A node could be a provider, subscriber, or both. I think some
>     replication systems support 'pass through' as well, where the node
>     passes data downstream but doesn't apply it itself. Or it could be
>     multi-master and possibly a provider to read-only subscribers.
>
>
> Yeah, you're talking about some kind of abstract modelling of a
> replication topology. I'm not sure that's at all necessary to keep track
> of which tables should be replicated to which nodes.

I'd think that you'd still need to know if a table is a provider or
subscriber regardless of topology; how else will you know how to add it?

As for the topology part, yes, perhaps that's more than the baseline
case. It might be enough of a win to just deal with tables and sets to
not worry about it.

I originally had this idea when dealing with a number of londiste
clusters and wishing I had something better than "Run this SELECT and
paste the output to the command line" to deal with adding newly created
tables. It seemed likely that a more generic system should also be
pretty easy to allow plugging into different replication systems;
there'd just need to be a different layer that translated definition
into actual replication commands. Then the only thing missing would be
defining what sets lived where; that would allow the generic system to at
least define almost every aspect of a replication environment. Maybe
that's too ambitious; the first step would be to try just what tables
are in which set and see how that goes.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com