Re: [HACKERS] database replication
From | Ryan Kirkpatrick
Subject | Re: [HACKERS] database replication
Date |
Msg-id | Pine.LNX.4.10.9912251433310.1551-100000@excelsior.rkirkpat.net
In reply to | database replication (DWalker@black-oak.com)
List | pgsql-hackers
On Fri, 24 Dec 1999 DWalker@black-oak.com wrote:

> I've been toying with the idea of implementing database replication
> for the last few days.

I too have been thinking about this some over the last year or two, just trying to find a quick and easy way to do it. I am not so interested in replication as in synchronization, as between a desktop machine and a laptop, so I can keep the databases on each in sync with each other. For this sort of purpose, both the local and remote databases would be "idle" at the time of syncing.

> 2) The replication system will need to add at least one field to each
> table in each database that needs to be replicated. This field will be
> a date/time stamp which identifies the "last update" of the record.
> This field will be called PGR_TIME for lack of a better name. Because
> this field will be used from within programs and triggers it can be
> longer so as to not mistake it for a user field.

How about a single, separate table with the fields 'database', 'tablename', 'oid', and 'last_changed', which would store the same data as your PGR_TIME field? It would be separated from the actual data tables, and therefore would be totally transparent to any database interface applications. The 'oid' field would hold each row's OID, a nice, unique identification number for the row, while the other fields would tell which table and database the OID is in. Then this table can be compared with the same table on a remote machine to quickly find updates and changes, and each difference can be dealt with in turn.

> 3) For each table to be replicated the replication system will
> programmatically add one plpgsql function and trigger to modify the
> PGR_TIME field on both UPDATEs and INSERTs. The name of this function
> and trigger will be along the lines of
> <table_name>_replication_update_trigger and
> <table_name>_replication_update_function. The function is a simple
> two-line chunk of code to set the field PGR_TIME equal to NOW.
> The trigger is called before each insert/update. When looking at the
> Docs I see that times are stored in Zulu (GMT) time. Because of this I
> don't have to worry about time zones and the like. I need direction
> on this part (such as "hey dummy, look at page N of file X.").

I like this idea better than any I have come up with yet. Though, how are you going to handle DELETEs?

> 6) To handle replication I basically check the local "last replication
> time" and compare it against the remote PGR_TIME fields. If the
> remote PGR_TIME is greater than the last replication time then change
> the local copy of the database, otherwise, change the remote end of
> the database. At this point I don't have a way to know WHICH field
> changed between the two replicas so either I do ROW level replication
> or I check each field. I check PGR_TIME to determine which field is
> the most current. Some fine tuning of this process will have to occur
> no doubt.

Yea, this is indeed the sticky part, and would indeed require some fine-tuning. Basically, the way I see it, if the two timestamps for a single row do not match (or even if the row, and therefore its timestamp, is missing on one side or the other altogether):

  local ts > remote ts                        => local row is exported to remote.
  remote ts > local ts                        => remote row is exported to local.
  local ts > last sync time && no remote ts   => local row is inserted on remote.
  local ts < last sync time && no remote ts   => local row is deleted.
  remote ts > last sync time && no local ts   => remote row is inserted on local.
  remote ts < last sync time && no local ts   => remote row is deleted.

where the synchronization process is running on the local machine. By "exported," I mean the local values are sent to the remote machine, and the row on that remote machine is updated to the local values. How does this sound?
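The six rules above amount to a small decision function. Here is a minimal sketch in Python (suggested elsewhere in this thread as the implementation language); the function name and action strings are invented for illustration, and None stands in for a row that is missing on one side:

```python
def decide_action(local_ts, remote_ts, last_sync):
    """Decide what to do with one row during a sync run.

    local_ts / remote_ts are comparable timestamps (e.g. floats or
    datetimes), or None if the row is absent on that side. The sync
    process is assumed to run on the local machine. A hypothetical
    sketch of the rules in the message above, not an actual tool.
    """
    if local_ts is not None and remote_ts is not None:
        if local_ts > remote_ts:
            return "export local row to remote"
        if remote_ts > local_ts:
            return "export remote row to local"
        return "in sync"
    if local_ts is not None:          # row missing on remote
        if local_ts > last_sync:      # created locally since last sync
            return "insert local row on remote"
        return "delete local row"     # deleted remotely since last sync
    if remote_ts is not None:         # row missing on local
        if remote_ts > last_sync:     # created remotely since last sync
            return "insert remote row on local"
        return "delete remote row"    # deleted locally since last sync
    return "in sync"                  # row absent on both sides
```

Note how the last-sync time is what disambiguates "row was inserted on one side" from "row was deleted on the other" — without it, a missing row is undecidable.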
> 7) The command-line utility, fired off by something like cron, could
> run several times during the day -- command line parameters can be
> implemented to say PUSH ALL CHANGES TO SERVER A, or PULL ALL CHANGES
> FROM SERVER B.

Or run manually, for my purposes. Also, maybe follow it with a vacuum run on both sides for all databases, as this is going to potentially cause lots of table changes that could stand a cleanup.

> 1) How far do I go with this? Do I start manhandling the system
> catalogs (pg_* tables)?

Initially, I would just stick to user table data... If you have changes in triggers and other meta-data/executable code, you are going to want to sync that stuff manually anyway. At least I would want to.

> 2) As to #2 and #3 above, I really don't like tools automagically
> changing my tables but at this point I don't see a way around it. I
> guess this is where the testing comes into play.

Hence the reason for the separate table with just a row's identification and last update time. The only modification to the synced database is the update trigger, which should be pretty harmless.

> 3) Security: the replication app will have to have pretty good rights
> to the database so it can add the necessary functions and triggers,
> modify table schemas, etc.

Just run the sync program as the postgres superuser, and there are no problems. :)

> So, any "you're insane and should run home to momma" comments?

No, not at all. Though it probably should be renamed from replication to synchronization. The former is usually associated with a continuous stream of updates between the local and remote databases, so they are almost always in sync, with a queuing ability for when their connection is lost for a span of time as well. Very complex and difficult to implement, and it would require hacking server code.
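The command-line interface sketched in item 7 could look something like the following, again in Python. This is purely a hypothetical sketch: the flag names (--mode, --remote, --vacuum) are invented here for illustration and are not from any actual tool.

```python
import argparse

def build_parser():
    """Build the CLI for a hypothetical sync utility (item 7).

    Supports a two-way sync (the default), plus one-way "push all
    changes to server X" and "pull all changes from server X" modes,
    and an optional post-sync VACUUM as suggested above.
    """
    parser = argparse.ArgumentParser(
        description="Synchronize a local PostgreSQL database with a remote one")
    parser.add_argument("--mode", choices=["sync", "push", "pull"],
                        default="sync",
                        help="two-way sync, push all local changes, "
                             "or pull all remote changes")
    parser.add_argument("--remote", required=True,
                        help="hostname of the remote server")
    parser.add_argument("--vacuum", action="store_true",
                        help="run VACUUM on both sides after syncing")
    return parser

# e.g. invoked from cron as: syncdb --mode push --remote serverA --vacuum
args = build_parser().parse_args(["--mode", "push", "--remote", "serverA"])
```

A cron entry would then just call this with fixed arguments several times a day, as the original message suggests.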
:( Something only Sybase and Oracle have (as far as I know), and from what I have seen of Sybase's replication server support (dated by 5 yrs), it was a pain to set up and get running correctly.

The latter, synchronization, is much more manageable, and can still be useful, especially when you have a large database you want in two places, mainly for read-only purposes at one end or the other, but don't want to waste the time/bandwidth to move and load the entire database each time it changes on one end or the other. Same idea as mirroring software for FTP sites: just transfer the changes, and nothing more.

I also like the idea of using Python. I have been using it recently for some database interfaces (to PostgreSQL of course :), and it is a very nice language to work with. Some worries about performance of the program though, as Python is only an interpreted language, and I have yet to be really impressed with the speed of execution of my database interfaces.

Anyway, it sounds like a good project, and finally one where I actually have a clue of what is going on, and the skills to help. So, if you are interested in pursuing this project, I would be more than glad to help. TTYL.

---------------------------------------------------------------------------
|   "For to me to live is Christ, and to die is gain."                    |
|                                        --- Philippians 1:21 (KJV)       |
---------------------------------------------------------------------------
|  Ryan Kirkpatrick  |  Boulder, Colorado  |  http://www.rkirkpat.net/    |
---------------------------------------------------------------------------