Re: Replication Docs
From | Bruce Momjian |
---|---|
Subject | Re: Replication Docs |
Date | |
Msg-id | 200611221736.kAMHasi00788@momjian.us |
In response to | Replication Docs (Markus Schiltknecht <markus@bluegap.ch>) |
Responses | Re: Replication Docs (Markus Schiltknecht <markus@bluegap.ch>) |
List | pgsql-docs |
Markus Schiltknecht wrote:
> Hello Bruce,
>
> I was trying to put together all comments to specific sections, thus the
> new thread. Hope that helps.
>
> *** Synchronous Multi-Master Replication ***
>
> Bruce Momjian wrote:
> > OK, new title is "Synchronous Multi-Master Replication", and the next
> > heading is "Asynchronous Multi-Master Replication".
>
> Good, I really like that one. :-)

Great (until we change it again) ;-)

> >> Why not simply call it "Multi Master Replication"? That implies
> >> clustering, doesn't it?
> >
> > Well, not really because of the async multi-master that is the next
> > item.
>
> Yes, it's fine that way. I was just unsure if you want to have sync and
> async in one paragraph or not. The proposal "Multi Master Replication"
> would only fit if we'd describe both in one paragraph. I like to
> describe both in more detail, as you did now.

OK, it is two separate entries now:

http://momjian.us/main/writings/pgsql/sgml/high-availability.html

> >> BTW, I'm slowly beginning to accept that you don't want to mix
> >> "Statement-Based Replication Middleware" with "Multi Master
> >> Replication". ;-)
> >
> > OK, are they mixed now?
>
> No, they're not. They're split, which I think is what you want. What
> I've been uncomfortable with was that split into "Statement-Based
> Replication Middleware" and "Synchronous Multi-Master Replication". I've
> been arguing that the first describes one possible implementation of the
> second, while other implementations are not described (2PC, SHMEM,
> Postgres-R, etc...)
>
> I was trying to say that I'm beginning to accept that split, because
> especially pgpool really seems to put a lot of those burdens on the
> user. I've been trying to use some humor, but that mainly seems to
> confuse people. My English might not be good enough for humor, yet.
>
> However, where do you now fit Sequoia in? It uses "statement-based
> replication", but AFAIK it is much more clever than pgpool and handles
> non-deterministic functions.
> And the Sequoia people probably won't get
> excited about not calling them "Multi-Master Replication".

Uh, good point. The title is now "Statement-Based Replication
Middleware". That doesn't say multi-master, but it doesn't say
master/slave either. The Sequoia PDF you sent me is very detailed:

http://www.continuent.org/uploads/sequoia/Resources/2006-08-15Cecchet_ApacheConAsia2006.pdf

I think we are back to the issue of classification. We have traditional
master/slave, such as Slony, and multi-master, perhaps pgcluster, and
lots in between. I am thinking pgpool and Sequoia fit in there.

I have added Sequoia to the Statement-Based Replication Middleware
section.

> Bruce Momjian wrote:
> > I just saw it [the slides about PGCluster-II]. It does seem more like
> > Oracle RAC than any other method.
>
> Yes. I think it's not production ready, yet, so there's no point in
> mentioning it in the documentation.

OK.

> Bruce Momjian wrote:
> > I figured that shared-disk/memory only really makes sense for
> > multi-master clustering, so I mentioned it in that paragraph:
> >
> > ...<snipped the new paragraph>
> >
> > Is that enough?
>
> I'd say so, yes. We are not going into more details for other aspects so
> that's fine.

OK.

> You might not even mention shared-memory. I don't know of any
> implementation in the database world. Except perhaps using OpenMosix and
> running PostgreSQL on top of it. Maybe just leave it in there, it won't
> hurt.

OK, I will only mention shared disk now.

> Bruce Momjian wrote:
> > One problem I have is that we have shared disk failover, but no
> > other shared case with a PostgreSQL implementation, and people don't
> > want to mention Oracle RAC, so why do we mention it if we have no
> > implementations even in the works.
>
> Most probably you're already aware that with PGCluster-II we have such
> an implementation in the works.

I do now.
:-) I think we are OK with the additional sentence about shared disk
in the Synchronous Multi-Master Replication section, right?

> *** Asynchronous Multi-Master Replication ***
>
> >> Again, IMHO, "Parallel Query Execution" says everything. The word
> >> 'Clustering' does not help, because it's not defined nor commonly
> >> used in any helpful way (probably besides marketing).
> >
> > OK, new title is Multi-Server Parallel Query Execution. If I have
> > just "Parallel Query Execution", it could be multi-process parallel
> > query execution.
>
> Yes, the new title is good.
>
> In the text below, you are mainly describing what I call 'disconnected
> operation' (somebody have a better, more common term for that?). But the
> main advantage of async replication is having no delay before commit.
> Thus giving better performance for writing transactions.
>
> In case of async, multi master replication, conflicts can arise, which
> have to be resolved. I think your example does not make it clear that
> this applies to async, multi master replication in general. And that
> those can sometimes be resolved automatically.

OK, good point, section updated:

<term>Asynchronous Multi-Master Replication</term>
<listitem>
<para>
For servers that are not regularly connected, like laptops or
remote servers, keeping data consistent among servers is a
challenge. Using asynchronous multi-master replication, each
server works independently, and periodically communicates with
the other servers to identify conflicting transactions. The
conflicts can be resolved by users or conflict resolution rules.

> *** Multi-Master Parallel Query Execution ***
>
> Bruce Momjian wrote:
> > Uh, multi-master replication allows for load balancing, but it doesn't
> > help a single query to run any faster. Think of having only one query
> > running on the cluster. Parallel execution allows a single query to
> > use more than one computer, right?
>
> Right.
>
> > Uh, this confuses me. What is missing?
> > You split tables across
> > multiple servers.
>
> In "Multi-Master Parallel Query Execution" you write: "One possible way
> this could work is for the data to be split among servers". So the
> example you give involves Data Partitioning.

OK.

> I wanted to point out that another way to do Parallel Query Execution is
> using Multi-Master Replication to have equal replicas and then query
> them in parallel. I don't think there is any solution for that, yet.
> Except, perhaps PGPool-II can do it?

Uh, if the data isn't partitioned, what value is there to hitting
multiple servers for a single query? I am confused.

> *** Introduction Text on the top ***
>
> Bruce Momjian wrote:
> > OK, updated to add "little" delay, and removed "small" from async
> > case:
> >
> > load-balanced servers will return consistent results with little
> > propagation delay. Asynchronous updating has a delay between the
>
> Hm, that does not address my concerns. But after thinking about it, I
> can accept the term 'consistent results' - it's clear enough what it
> means. I'm probably thinking into too many details...

OK.

> But now, the "little delays" certainly is in the wrong place. Such
> delays occur before commit, not before returning results.

Uh, I don't think the "little" appears to talk about the results but
only the propagation.

> Maybe revert it back to "..no propagation delay". Or completely leave
> away the "no propagation delay".

OK, how is this new text?

This guarantees that a failover will not lose any data and that all
load-balanced servers will return consistent results no matter which
server is queried.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +