Re: replication/redundancy
From | Jonathan Gardner |
---|---|
Subject | Re: replication/redundancy |
Date | |
Msg-id | 200307010822.39288.jgardner@jonathangardner.net |
In reply to | Re: replication/redundancy (weigelt@metux.de) |
List | pgsql-admin |
On Monday 30 June 2003 09:17, weigelt@metux.de wrote:
> On Mon, Jun 30, 2003 at 08:31:09AM -0700, Jonathan Gardner wrote:
>
> * currently only an explicit sync-out is supported - from time to time
>   every table has to be scanned for new records

So you are using "lazy" rather than "eager" replication. I am sure you know the limitations of lazy replication. Let me enumerate them here for those of you who aren't familiar with them:

1) The data is not consistent. This means that if you run the same select query at the same time on the two databases, you may get different results. For some situations, that is okay (like Usenet). For others, it is not (like registrations -- you'll sign up on one database, but you won't appear on the other).

2) The "other" process that does the synchronization is serial in nature. The processes that change the database are parallel in nature. It is very possible for changes to arrive at the database faster than you can replicate them. This was a real problem at a web company I recently worked for that used lazy replication. Their backup database fell weeks behind the live database. It almost got to the point where recreating the entire database would have been faster than waiting for the replication process to catch up.

3) The two factors above make using the second database as a hot-swappable backup risky at best. You will lose some data when you switch to the backup, unless changes to the database are so rare that the backup is usually up to date. If that is the case, you probably don't need the backup in the first place, because databases that don't do much tend not to be very important.

> * currently no real conflict handling

What he is talking about here is what happens when two separate processes are working on the same rows. PostgreSQL uses transactions and locking right now, so two processes on the same system cannot conflict in this way. However, his system cannot handle this at all when the two processes are on separate machines.

The most obvious problem comes from incrementing a column. If both processes try to increment the same column, they will end up with the column incremented by one or the other, but not both. This would be bad for something like PayPal, where your balance would increase by only one of two transfers, rather than both, if the two occurred at the same time.

> perhaps we can improve this a little bit.

I would hope you spend some time researching what others have done. Relational databases are an area in which a tremendous amount of solid research has already been done. Applying yourself to understand the research and projects that have gone before you will save you a lot of time that would otherwise go into replicating their work. In other words, "If I have seen farther, it is because I have stood on the shoulders of giants", to (mis?)quote Newton.

Again, to re-emphasize why pgreplication is so cool and why everyone should be excited about it:

1) Database theory says that scalable, eager replication is impossible. This is true in practice.

2) The Postgres-R team discovered a way to make scalable, eager replication work. The restriction is that locks, once granted, may be aborted or revoked.

3) This means you will one day be able to set up a Beowulf-type cluster of Postgres databases that will rival the most powerful databases on earth today.

--
Jonathan Gardner <jgardner@jonathangardner.net>
(was jgardn@alumni.washington.edu)
Live Free, Use Linux!
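Editor's illustration of the increment problem described in the message: a minimal Python sketch, assuming two replicas that each hold their own copy of one account row and a lazy sync that simply copies the row across with no conflict handling. The names `Replica`, `sync_last_writer_wins`, and the amounts are hypothetical and do not come from any replication tool discussed in the thread.

```python
# Hypothetical model: each replica holds its own copy of one account row.
class Replica:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        # Local read-modify-write; the other replica does not see it.
        self.balance = self.balance + amount


def sync_last_writer_wins(source, target):
    # Naive lazy sync with no conflict handling: copy the row value across.
    target.balance = source.balance


def simulate_lazy():
    a, b = Replica(100), Replica(100)
    a.deposit(10)                # transfer arriving at replica A
    b.deposit(20)                # concurrent transfer arriving at replica B
    sync_last_writer_wins(a, b)  # A's row overwrites B's; the 20 is lost
    return a.balance, b.balance


def simulate_single_node():
    # On one database, locking serializes the two updates, so both apply.
    node = Replica(100)
    node.deposit(10)
    node.deposit(20)
    return node.balance


if __name__ == "__main__":
    print("lazy replicas:", simulate_lazy())
    print("single node:", simulate_single_node())
```

In this sketch both replicas end at 110 after the sync, while a single node ends at 130: one of the two increments is silently dropped, which is the failure the message warns about.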