Discussion: replication hooks
On 5/29/08, Andrew Sullivan <ajs@commandprompt.com> wrote:
> On Thu, May 29, 2008 at 12:05:18PM -0700, Robert Hodges wrote:
> > people are starting to get religion on this issue I would strongly
> > advocate a parallel effort to put in a change-set extraction API
> > that would allow construction of comprehensive master/slave
> > replication.
>
> You know, I gave a talk in Ottawa just last week about how the last
> effort to develop a comprehensive API for replication failed. I had
> some ideas about why, the main one of which is something like this:
> "Big features with a roadmap have not historically worked, so unless
> we're willing to change the way we work, we won't get that."
>
> I don't think an API is what's needed. It's clear proposals for
> particular features that can be delivered in small pieces. That's what
> the current proposal offers. I think any kind of row-based approach
> such as what you're proposing would need that kind of proposal too.
>
> That isn't to say that I think an API is impossible or undesirable.
> It is to say that the last few times we tried, it went nowhere; and
> that I don't think the circumstances have changed.

I think the issue is simpler - an API for synchronous replication is
undesirable - it would be too complex and hinder future development
(as I explained above).

And the API for asynchronous replication is already there - triggers,
txid functions for queueing.

There is this tiny matter of replicating schema changes asynchronously,
but I suspect nobody actually cares. A few random points about that:

- The task cannot even be clearly defined (on technical level - how
  the events should be represented).
- Any schema changes need to be carefully prepared anyway. Whether
  to apply them to one or more servers does not make much difference.
- Major plus of async replica is ability to actually have different
  schema on slaves.
- People _do_ care about exact schema on single place - failover servers.
- But for failover servers we also want synchronous replication.

So if we have synchronous WAL-based replication for failover servers,
the interest in hooks to log schema changes will decrease even more.

--
marko
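Marko's claim that the asynchronous API "is already there" rests on the txid functions PostgreSQL gained in 8.3 (`txid_current()`, `txid_current_snapshot()`, `txid_visible_in_snapshot()`): a trigger logs each row change together with `txid_current()`, and a consumer cuts the log into batches between two snapshots. The sketch below is a Python model of that visibility rule, not SQL and not the code of any particular queue such as PgQ; `Snapshot`, `batch_events` and the sample txids are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """Toy model of a txid snapshot value (xmin:xmax:xip_list)."""
    xmin: int  # earliest txid still running; everything below it has finished
    xmax: int  # first txid not yet assigned when the snapshot was taken
    xip: FrozenSet[int] = field(default_factory=frozenset)  # txids in progress

    def visible(self, txid: int) -> bool:
        """Same rule as txid_visible_in_snapshot(): finished before the snapshot.

        Aborted transactions never leave rows in the event table, so
        "finished" is enough to mean "committed" for logged events.
        """
        if txid < self.xmin:
            return True
        if txid >= self.xmax:
            return False
        return txid not in self.xip


Event = Tuple[int, str]  # (txid of the writing transaction, payload)


def batch_events(events: Iterable[Event], prev: Snapshot, cur: Snapshot) -> List[Event]:
    """Events committed after `prev` was taken but before `cur` was taken."""
    return [ev for ev in events if cur.visible(ev[0]) and not prev.visible(ev[0])]


# Hypothetical event log and three successive snapshots taken by a consumer.
events = [(100, "INSERT a"), (101, "UPDATE b"), (102, "DELETE c"), (105, "INSERT d")]
s1 = Snapshot(xmin=101, xmax=103, xip=frozenset({101}))
s2 = Snapshot(xmin=105, xmax=107, xip=frozenset({105}))
s3 = Snapshot(xmin=107, xmax=107)

first_batch = batch_events(events, s1, s2)   # txid 101 finished between s1 and s2
second_batch = batch_events(events, s2, s3)  # txid 105 finished between s2 and s3
```

Because each event lands in exactly one batch regardless of how long its transaction ran, the consumer can replay batches in order without any extra coordination from the server.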
On Thu, May 29, 2008 at 11:05:09PM +0300, Marko Kreen wrote:
> There is this tiny matter of replicating schema changes asynchronously,
> but I suspect nobody actually cares.

I know that Slony's users call this their number one irritant, so I
have my doubts nobody cares. But maybe nobody cares enough.

> - The task cannot even be clearly defined (on technical level - how
>   the events should be represented).

Really? I've been in discussions where different people had clear
(but, alas, different) ideas of how to represent them.

> - Any schema changes need to be carefully prepared anyway. Whether
>   to apply them to one or more servers does not make much difference.

One problem that designers of replication systems have is that they're
already thinking in the Serious Database Application world. But I
have recently had the pleasure of being reminded how many users of
database systems neither know nor care to know any of the details of
the underlying system. They already know how to make schema changes:
log into database, and start typing "ALTER TABLE. . ." You or I
agreeing that more careful preparation than that is important will not
change their mind. This is part of the reason MySQL looks so good:
you can "just do" these things. If it doesn't work out later, well,
you don't know that when your ALTER TABLE "just works".

> - Major plus of async replica is ability to actually have different
>   schema on slaves.

I agree.

> - People _do_ care about exact schema on single place - failover servers.

Yeah, but not only there. One of the things I was hoping to have
nailed down in the "hooks" discussion was, in fact, the use cases.
Half the time, people have such a clear idea of what _they_ want from
their replication that they come to believe "replication" means that.

Another thing I like about the current proposal is that it is very
clear about what it is (and isn't) aiming for.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/
On 5/29/08, Andrew Sullivan <ajs@commandprompt.com> wrote:
> On Thu, May 29, 2008 at 11:05:09PM +0300, Marko Kreen wrote:
> > There is this tiny matter of replicating schema changes asynchronously,
> > but I suspect nobody actually cares.
>
> I know that Slony's users call this their number one irritant, so I
> have my doubts nobody cares. But maybe nobody cares enough.

Oh, users of course like their lives to be as easy as possible and all
tools to be "do as I wish"-complete. I meant that no developer is
interested after looking at the task complexity and resulting payoff.

> > - The task cannot even be clearly defined (on technical level - how
> >   the events should be represented).
>
> Really? I've been in discussions where different people had clear
> (but, alas, different) ideas of how to represent them.

Yeah. The main problem is that unless you do WAL based replication,
you cannot achieve transparency. So you need to pick few use cases
and tailor your solution for them, which gets uninteresting very fast -
user _will_ stumble upon spacial cases, and if they expect everything
to "just work", the resulting conversation won't be funny.

> > - Any schema changes need to be carefully prepared anyway. Whether
> >   to apply them to one or more servers does not make much difference.
>
> One problem that designers of replication systems have is that they're
> already thinking in the Serious Database Application world. But I
> have recently had the pleasure of being reminded how many users of
> database systems neither know nor care to know any of the details of
> the underlying system. They already know how to make schema changes:
> log into database, and start typing "ALTER TABLE. . ." You or I
> agreeing that more careful preparation than that is important will not
> change their mind. This is part of the reason MySQL looks so good:
> you can "just do" these things. If it doesn't work out later, well,
> you don't know that when your ALTER TABLE "just works".
Simple - use WAL-based replication. Although - not so simple, as
currently we don't provide it. The existing PITR hooks expect users
to write their own replication, which is not a user-friendly
approach... Hopefully this will be fixed in 8.4.

> > - People _do_ care about exact schema on single place - failover servers.
>
> Yeah, but not only there. One of the things I was hoping to have
> nailed down in the "hooks" discussion was, in fact, the use cases.
> Half the time, people have such a clear idea of what _they_ want from
> their replication that they come to believe "replication" means that.

The main problem with the replica-hooks-discuss list was lack of focus.
There are various replication methods - single-master, multi-master,
asynchronous, synchronous, WAL-based, trigger-based, changeset-based.
Each combination wants different hooks; putting them all together
makes people stop caring.

E.g. setting the topic to schema change logging for async trigger-based
replication would be better, but even there, there are various usage
scenarios that may not be compatible, so if people don't see a chance
of common hooks, they don't bother.

Actually I suspect this task is solvable; the main problem is that it's
pretty low on anyone's priority list.

> Another thing I like about the current proposal is that it is very
> clear about what it is (and isn't) aiming for.

Yes. And we can skip the "common hooks" discussion. ;)

--
marko
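The "PITR hooks" here are the `archive_command` setting (with `restore_command` on the receiving side): the server runs a user-supplied shell command for each completed WAL segment, substituting `%p` (path of the file to archive) and `%f` (its file name), and treats exit status 0 as success. The PostgreSQL documentation also advises that the command refuse to overwrite an existing archive file. Below is a minimal sketch of such a user-written archiver, assuming the archive directory is a local or mounted path; `archive_wal` is a hypothetical name and this is only the sending half of a log-shipping setup, not a complete replication solution.

```python
import os
import shutil


def archive_wal(src_path: str, file_name: str, archive_dir: str) -> int:
    """Copy one finished WAL segment into archive_dir.

    Returns a shell-style exit status: 0 only when the segment is safely
    stored, non-zero otherwise, so the server keeps the segment and retries.
    """
    dest = os.path.join(archive_dir, file_name)
    # Refuse to overwrite an existing archive file; a clash usually means
    # two servers are archiving into the same directory.
    if os.path.exists(dest):
        return 1
    tmp = dest + ".partial"
    try:
        shutil.copyfile(src_path, tmp)
        # Rename only after the copy is complete, so a reader polling the
        # archive (e.g. a standby's restore_command) never sees half a file.
        os.replace(tmp, dest)
    except OSError:
        # Clean up a half-written copy before reporting failure.
        if os.path.exists(tmp):
            os.remove(tmp)
        return 1
    return 0
```

To wire it up, a thin wrapper script would call `archive_wal(sys.argv[1], sys.argv[2], "/mnt/server/archivedir")` and pass the result to `sys.exit()`, with something like `archive_command = 'python3 archive_wal.py %p %f'` in `postgresql.conf`; the script name and directory are placeholders. The standby side, polling the archive and replaying segments, is the other half users had to build themselves.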
> Yeah. The main problem is that unless you do WAL based replication,
> you cannot achieve transparency. So you need to pick few use cases
> and tailor your solution for them, which gets uninteresting very fast
> - user _will_ stumble upon spacial cases,

Isn't that what PostGIS is for?

g,d,&r

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200805291840
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
Hi Marko,

Replication requirements vary widely of course, but DDL support is shared by such a wide range of use cases that it is very difficult to see how any real solution would fail to include it. This extends to change extraction APIs, however defined. The question of what DDL to replicate is also quite clear: all of it, with as few exceptions as possible.

For instance, it is almost impossible to set up and manage replicated systems easily if you cannot propagate schema changes in serialized order along with other updates from applications. The inconvenience of using alternative mechanisms like the SLONY ‘execute script’ is considerable and breaks most commonly used database management tools.

That said, SLONY at least serializes the changes. Non-serialized approaches lead to serious outages and can get you into distributed consensus problems, such as deciding when it is ‘safe’ to change schema across different instances. These are very hard to solve practically and tend to run into known impossibility results like Brewer’s Conjecture, which holds that it is impossible to keep distributed databases consistent while also remaining open for updates and handling network partitions.

I’ll post back later on the question of the API. The key is to do something simple that avoids the problems discussed by Andrew and ties it accurately to use cases. However, this requires a more prepared response than my hastily written post from last night.

Cheers, Robert

--
Robert Hodges, CTO, Continuent, Inc.
Email: robert.hodges@continuent.com
Mobile: +1-510-501-3728 Skype: hodgesrm
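Robert's argument is that DDL has to travel in the same serialized stream as ordinary row changes. The toy model below illustrates why, under loudly simplified assumptions: a hypothetical `Change` record carries a position in the master's commit order, only `ADD COLUMN` is supported, and the "replica" is a dictionary. It is not how Slony or any other system actually encodes its events; it only shows that a row using a new column can be applied only if the schema change arrived ahead of it.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Change:
    seq: int      # position in the master's commit order (hypothetical)
    kind: str     # "ddl" or "dml"
    table: str
    payload: Any  # ddl: ("add_column", column, default); dml: {column: value}


def apply_changes(changes: List[Change], replica: Dict[str, dict]) -> Dict[str, dict]:
    """Apply schema and data changes to a toy replica in master commit order.

    replica maps a table name to {"columns": [...], "rows": [...]}.
    """
    for ch in sorted(changes, key=lambda c: c.seq):
        tbl = replica[ch.table]
        if ch.kind == "ddl":
            op, column, default = ch.payload
            if op != "add_column":
                raise ValueError(f"unsupported DDL in this sketch: {op}")
            tbl["columns"].append(column)
            for row in tbl["rows"]:
                row[column] = default  # existing rows get the default value
        elif ch.kind == "dml":
            missing = set(ch.payload) - set(tbl["columns"])
            if missing:
                # The replica's schema is behind the data: the failure that
                # propagating DDL outside the change stream can produce.
                raise KeyError(f"unknown columns on replica: {sorted(missing)}")
            tbl["rows"].append({c: ch.payload.get(c) for c in tbl["columns"]})
        else:
            raise ValueError(f"unknown change kind: {ch.kind}")
    return replica
```

Applied in commit order (insert, `ADD COLUMN email`, insert using `email`), the replica ends up matching the master; if the schema change is delivered after the row that needs it, the apply step fails, which is the ordering hazard an out-of-band script mechanism has to manage by hand.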
Marko Kreen wrote:
> There is this tiny matter of replicating schema changes asynchronously,
> but I suspect nobody actually cares. A few random points about that:

I'm not sure I follow you - the Sybase 'warm standby' replication of
everything is really useful for business continuity. The per-table rep
is more effective for publishing reference data, but is painful to
maintain.

Not having something that automagically reps a complete copy including
DDL (except for temp tables) is a major weakness IMO.

James