Discussion: Postgres Replication
We have been researching replication for several months now, and I have
some opinions to share with the community for feedback, discussion,
and/or participation. Our goal is to get a replication solution for
PostgreSQL that will meet most needs of users and applications alike
(mission impossible theme here :).

My research work, along with that of other contributors, has been
collected and presented here:
http://www.greatbridge.org/genpage?replication_top
If there is something missing, especially PostgreSQL-related work, I
would like to know about it, and my apologies to anyone who got left off
the list. This work is ongoing and doesn't draw a conclusion, which IMHO
should be left up to the user, but I'm offering my opinions to spur
discussion and/or feedback from this list, and will try not to offend
anyone.

Here's my opinion: of the approaches we've surveyed, the most promising
one is the Postgres-R project from the Information and Communication
Systems Group, ETH in Zurich, Switzerland, originally produced by
Bettina Kemme, Gustavo Alonso, and others. Although Postgres-R is a
synchronous approach, I believe it is the closest to the goal mentioned
above. Here is a summary of the advantages.

1) Postgres-R is built on the PostgreSQL-6.4.2 code base. The
replication functionality is an optional parameter, so there will be
insignificant overhead for non-replication situations. The replication
and communication managers are the two new modules added to the
PostgreSQL code base.

2) The replication manager's main function is controlling the
replication protocol via a message-handling process. It receives
messages from the local and remote backends and forwards write sets and
decision messages via the communication manager to the other servers.
The replication manager controls all the transactions running on the
local server by keeping track of their states, including which protocol
phase (read, send, lock, or write) the transaction is in. The
replication manager maintains a two-way channel, implemented as buffered
sockets, to each backend.

3) The main task of the communication manager is to provide a simple
socket-based interface between the replication manager and the group
communication system (currently Ensemble). The communication system is a
cluster of servers connected via the communication manager. The
replication manager also maintains three one-way channels to the
communication system: a broadcast channel to send messages, a
total-order channel to receive totally ordered write sets, and a
no-order channel to listen for decision messages from the communication
system. Decision messages can be received at any time, whereas the
reception of totally ordered write sets can be blocked in certain
phases. (A rough sketch of this message flow follows at the end of this
post.)

4) Based on a two-phase-locking approach, all deadlock situations are
local and detectable by the Postgres-R code base, and aborted.

5) The write set messages, used to send database changes to other
servers, can contain either the SQL statements or the actual tuples
changed. This is a parameter based on the number of tuples changed by a
transaction. While sending the tuple changes reduces overhead in query
parsing, planning, and execution, there is a negative effect in sending
a large write set across the network.

6) Postgres-R uses a synchronous approach that keeps the data on all
sites consistent and provides serializability. The user does not have to
bother with conflict resolution, and receives the same correctness and
consistency as a centralized system.
7) Postgres-R could be part of a good fault-resilient and
load-distribution solution. It is peer-to-peer based and incurs low
overhead propagating updates to the other cluster members. All
replicated databases process queries locally.

8) Compared to other synchronous replication strategies (e.g., standard
distributed 2-phase-locking + 2-phase-commit), Postgres-R, while still
using 2-phase-locking, has much better performance.

There are some issues that are not currently addressed by Postgres-R,
but some enhancements made to PostgreSQL since the 6.4.2 tree are very
favorable to addressing these shortcomings.

1) The addition of WAL in 7.1 provides the information needed for
recovering failed/off-line servers; currently, all the servers would
have to be stopped, and a copy used to get all the servers synchronized
before starting again.

2) Being synchronous, Postgres-R would not be a good solution for
off-line/WAN scenarios where asynchronous replication is required. There
are some theories on this issue which involve servers connecting to and
disconnecting from the cluster.

3) As in any serialized synchronous approach, there is a change in the
flow of execution of a transaction; while most of these changes can be
handled by calling newly developed functions at certain points,
synchronous replica control is tightly coupled with the concurrency
control. Hence, especially in PostgreSQL 7.2, some parts of the
concurrency control (MVCC) might have to be adjusted. This can lead to
slightly more complicated maintenance than a system that does not change
the backend.

4) Partial replication is not addressed.

Any feedback on this post will be appreciated.

Thanks,
Darren
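P.S. Here is the rough sketch of the write-set flow promised above,
purely for illustration: a transaction executes locally, its write set
is broadcast over a total-order channel, and every server applies write
sets in that agreed order. All names (Server, WriteSet,
total_order_broadcast) are invented for this sketch; the real Postgres-R
code is C inside the 6.4.2 backend and uses Ensemble for group
communication.

# Illustrative sketch only -- not Postgres-R code.

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WriteSet:
    """Changes made by one transaction: SQL text or the changed tuples."""
    txn_id: int
    origin: str                        # server where the transaction executed
    changes: List[Tuple[str, object]]  # simplified key/value changes


class Server:
    def __init__(self, name: str):
        self.name = name
        self.data = {}

    def execute_locally(self, txn_id: int, changes) -> WriteSet:
        # Read/send phase: the transaction runs against local data and its
        # write set is handed to the replication manager for broadcast.
        return WriteSet(txn_id=txn_id, origin=self.name, changes=list(changes))

    def apply(self, ws: WriteSet) -> None:
        # Lock/write phase: write sets arrive in the same total order on
        # every server, so applying them in delivery order keeps all
        # replicas consistent without user-level conflict resolution.
        for key, value in ws.changes:
            self.data[key] = value


def total_order_broadcast(cluster: List[Server], write_sets: List[WriteSet]) -> None:
    """Stand-in for the group communication system's total-order channel."""
    for ws in write_sets:          # one agreed-upon global order
        for server in cluster:     # delivered to every cluster member
            server.apply(ws)


if __name__ == "__main__":
    cluster = [Server("a"), Server("b"), Server("c")]
    ws1 = cluster[0].execute_locally(1, [("x", 1)])
    ws2 = cluster[1].execute_locally(2, [("x", 2)])
    total_order_broadcast(cluster, [ws1, ws2])
    # All replicas converge because they applied the same order.
    assert cluster[0].data == cluster[1].data == cluster[2].data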
On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:

>2) The replication manager's main function is controlling the
>replication protocol via a message handling process. It receives
>messages from the local and remote backends and forwards write
>sets and decision messages via the communication manager to the
>other servers. The replication manager controls all the transactions
>running on the local server by keeping track of the states, including
>which protocol phase (read, send, lock, or write) the transaction is
>in. The replication manager maintains a two way channel
>implemented as buffered sockets to each backend.

what does "manager controls all the transactions" mean? I hope it does
*not* mean that a bug in the manager would cause transactions not to
commit...

>4) Based on a two phase locking approach, all dead lock situations
>are local and detectable by Postgres-R code base, and aborted.

Does this imply locking over different servers? That would mean a
grinding halt when a network outage occurs...

Coming from a Sybase background I have some experience with
replication. The way it works in Sybase Replication server is as
follows:

- for each replicated database, there is a "log reader" process that
reads the WAL and captures only *committed transactions* to the
replication server. (it does not make much sense to replicate other
things IMHO :-)

- the replication server stores incoming data in a queue ("stable
device"), until it is sure it has reached its final destination

- a replication server can send data to another replication server in
a compact (read: WAN friendly) way. A chain of replication servers can
be made, depending on network architecture.

- the final replication server makes an almost standard client
connection to the target database and translates the compact
transactions back to SQL statements. By using masks, extra
functionality can be built in.

This kind of architecture has several advantages:
- only committed transactions are replicated which saves overhead
- it does not have very much impact on performance of the source
server (apart from reading the WAL)
- since every replication server has a stable device, data is stored
when the network is down and nothing gets lost (nor stops performing)
- because only the log reader and the connection from the final
replication server are RDBMS specific, it is possible to replicate
from MS to Oracle using a Sybase replication server (or different
versions etc).

I do not know how much of this is patented or copyrighted, but the
architecture seems elegant and robust to me. I have done
implementations of bi-directional replication too. It *is* possible
but does require some funky setup and maintenance. (but it is better
than letting offices on different continents work on the same
database :-)

just my 2 EURO cts :-)
--
__________________________________________________
"Nothing is as subjective as reality"
Reinoud van Leeuwen reinoud@xs4all.nl
http://www.xs4all.nl/~reinoud
__________________________________________________
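For concreteness, here is a minimal Python sketch of the log-reader /
stable-queue pipeline described above. It is not anything Sybase ships;
the class names, record shapes, and in-memory queue are all assumptions
made just to show the flow (a real replication server persists its
"stable device" to disk).

# Illustrative sketch of a log-reader + stable-queue replication pipeline.

import collections


class LogReader:
    """Scans the source database's WAL and emits only committed transactions."""

    def __init__(self, wal_records):
        # wal_records: list of (txn_id, committed?, sql_statement)
        self.wal_records = wal_records

    def committed_transactions(self):
        for txn_id, committed, statement in self.wal_records:
            if committed:                  # uncommitted work is never replicated
                yield (txn_id, statement)


class ReplicationServer:
    """Queues transactions until the next hop acknowledges them."""

    def __init__(self):
        self.stable_queue = collections.deque()

    def enqueue(self, txn):
        self.stable_queue.append(txn)      # kept while the network is down

    def drain_to(self, apply_fn):
        while self.stable_queue:
            txn_id, statement = self.stable_queue[0]
            apply_fn(statement)            # replay over a normal client connection
            self.stable_queue.popleft()    # drop only after a successful apply


if __name__ == "__main__":
    wal = [("t1", True,  "UPDATE accounts SET bal = bal - 10 WHERE id = 1"),
           ("t2", False, "DELETE FROM accounts")]
    rep = ReplicationServer()
    for txn in LogReader(wal).committed_transactions():
        rep.enqueue(txn)
    rep.drain_to(lambda sql: print("applying:", sql))   # only t1 is replayed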
On Mon, 11 Jun 2001, Reinoud van Leeuwen wrote:
> On Mon, 11 Jun 2001 19:46:44 GMT, you wrote:
> what does "manager controls all the transactions" mean? I hope it does
> *not* mean that a bug in the manager would cause transactions not to
> commit...
Well yeah it does. Bugs are a fact of life. :)
> >4) Based on a two phase locking approach, all dead lock situations
> >are local and detectable by Postgres-R code base, and aborted.
>
> Does this imply locking over different servers? That would mean a
> grinding halt when a network outage occurs...
Don't know, but see below.
> Coming from a Sybase background I have some experience with
> replication. The way it works in Sybase Replication server is as
> follows:
> - for each replicated database, there is a "log reader" process that
> reads the WAL and captures only *committed transactions* to the
> replication server. (it does not make much sense to replicate other
> things IMHO :-).
> - the replication server stores incoming data in a que ("stable
> device"), until it is sure it has reached its final destination
>
> - a replication server can send data to another replication server in
> a compact (read: WAN friendly) way. A chain of replication servers can
> be made, depending on network architecture)
>
> - the final replication server makes a almost standard client
> connection to the target database and translates the compact
> transactions back to SQL statements. By using masks, extra
> functionality can be built in.
>
> This kind of architecture has several advantages:
> - only committed transactions are replicated which saves overhead
> - it does not have very much impact on performance of the source
> server (apart from reading the WAL)
> - since every replication server has a stable device, data is stored
> when the network is down and nothing gets lost (nor stops performing)
> - because only the log reader and the connection from the final
> replication server are RDBMS specific, it is possible to replicate
> from MS to Oracle using a Sybase replication server (or different
> versions etc).
>
> I do not know how much of this is patented or copyrighted, but the
> architecture seems elegant and robust to me. I have done
> implementations of bi-directional replication too. It *is* possible
> but does require some funky setup and maintenance. (but it is better
> that letting offices on different continents working on the same
> database :-)
Yes, the above architecture is what almost every vendor of replication
software uses. And I'm sure if you worked much with Sybase, you hate the
garbage that their repserver is :).
The architectures of postgres-r and repserver are fundamentally different
for a good reason: repserver only wants to replicate committed
transactions, while postgres-r is more of a 'clustering' solution (albeit
they don't use that word), and is capable of doing much more than a
simple rep server.
I.e., you can safely put half of your clients on a second server in a
replicated postgres-r cluster without worrying that a conflict (or a
weird locking situation) may occur.
Try that with Sybase: it is fundamentally designed for one-way
replication, and the fact that you can do one-way replication in both
directions doesn't mean it's safe to do that!
I'm not sure how postgres-r handles network problems. To be useful, a good
replication solution must have an option of "no network->no updates" as
well as "no network->queue updates and send them later". However, it is
far easier to add queuing to a correct 'eager locking' database than it is
to add proper locking to a queue-based replicator.
-alex
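Purely as an illustration of those two policies, something like the thin
layer below could sit in front of an eager replicator's broadcast call.
All names here are invented for the sketch; nothing like this exists in
postgres-r today.

# Illustrative only: "no network -> no updates" vs. "no network -> queue
# updates and send them later", layered over an eager replicator.

class OutboundChannel:
    def __init__(self, send_fn, policy="refuse"):   # policy: "refuse" or "queue"
        self.send_fn = send_fn
        self.policy = policy
        self.connected = True
        self.pending = []

    def send(self, write_set):
        if self.connected:
            return self.send_fn(write_set)          # normal eager path
        if self.policy == "refuse":
            raise RuntimeError("no network -> no updates")
        self.pending.append(write_set)              # queue and send later

    def reconnect(self):
        self.connected = True
        for ws in self.pending:                     # flush the backlog in order
            self.send_fn(ws)
        self.pending.clear()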
Thanks for the feedback. I'll try to address both your issues here.
>> what does "manager controls all the transactions" mean?
>
The replication manager controls the transactions by serializing the
write set messages. This ensures all transactions are committed in the
same order on each server, so bugs here are not allowed ;-)
>> I hope it does
>> *not* mean that a bug in the manager would cause transactions not to
>> commit...
>
> Well yeah it does. Bugs are a fact of life. :)
>
>>> 4) Based on a two phase locking approach, all dead lock situations
>>> are local and detectable by Postgres-R code base, and aborted.
>>
>> Does this imply locking over different servers? That would mean a
>> grinding halt when a network outage occurs...
>
> Don't know, but see below.
There is a branch of the Postgres-R code that has some failure detection
implemented, so we will have to merge this functionality with the
version of Postgres-R we have and test this issue. I'll let you know the
results.
>>
>> - the replication server stores incoming data in a que ("stable
>> device"), until it is sure it has reached its final destination
>
I like this idea for recovering servers that have been down a short
period of time, using WAL to recover transactions missed during the
outage.
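Just to sketch what I mean (purely hypothetical, not an existing
PostgreSQL or Postgres-R interface): a rejoining server replays, from a
peer's WAL or stable queue, only the committed records past the point
where it went down. The function name, LSN bookkeeping, and record shape
below are all assumptions.

# Illustrative only.

def catch_up(last_applied_lsn, peer_wal, apply_fn):
    """Replay committed WAL records the rejoining server missed.

    peer_wal: iterable of (lsn, committed?, change) from a surviving peer.
    apply_fn: applies one change on the recovering server.
    """
    for lsn, committed, change in peer_wal:
        if lsn <= last_applied_lsn or not committed:
            continue                      # already applied, or never committed
        apply_fn(change)
        last_applied_lsn = lsn
    return last_applied_lsn               # new recovery point for this server


# Example: the server went down after LSN 100; the cluster is now at 103.
peer_wal = [(101, True, "INSERT ..."), (102, False, "UPDATE ..."),
            (103, True, "DELETE ...")]
print(catch_up(100, peer_wal, lambda change: print("replaying:", change)))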
>>
>> This kind of architecture has several advantages:
>> - only committed transactions are replicated which saves overhead
>> - it does not have very much impact on performance of the source
>> server (apart from reading the WAL)
>> - since every replication server has a stable device, data is stored
>> when the network is down and nothing gets lost (nor stops performing)
>> - because only the log reader and the connection from the final
>> replication server are RDBMS specific, it is possible to replicate
>> from MS to Oracle using a Sybase replication server (or different
>> versions etc).
>
There are some issues with the "log reader" approach:
1) The databases are not synchronized until the log reader completes its
processing.
2) I'm not sure about Sybase, but the log reader sends SQL statements to
the other servers, which are then parsed, planned, and executed. This
overhead could be avoided if only the tuple changes are replicated (see
the sketch after this list).
3) It works fine for read-only situations, but peer-to-peer applications
using this approach must be designed with a conflict resolution scheme.
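Here is the sketch I mentioned for issue 2: shipping SQL text makes
every replica repeat parse/plan/execute, while shipping the changed
tuples lets replicas write rows directly at the cost of a larger message
on the wire. The threshold and names below are made up for illustration,
not actual Postgres-R parameters.

# Illustrative only.

TUPLE_LIMIT = 1000   # hypothetical threshold, not a Postgres-R setting


def build_write_set(statement, changed_tuples):
    if len(changed_tuples) > TUPLE_LIMIT:
        # Compact message, but every replica repeats parse/plan/execute.
        return {"kind": "sql", "payload": statement}
    # Bigger message, but replicas skip the query machinery entirely.
    return {"kind": "tuples", "payload": changed_tuples}


def apply_write_set(ws, execute_sql, write_tuple):
    if ws["kind"] == "sql":
        execute_sql(ws["payload"])
    else:
        for row in ws["payload"]:
            write_tuple(row)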
Don't get me wrong, I believe we can learn from the replication
techniques used by commercial databases like Sybase and try to implement
the good ones in PostgreSQL. Postgres-R is a synchronous approach which
outperforms the traditional approaches to synchronous replication. Being
based on PostgreSQL-6.4.2, getting this approach into the 7.2 tree might
be better than reinventing the wheel.
Thanks again,
Darren