Hi Guys,
Now for one of my REALLY rare posts.
Having done a little bit of distributed data systems, I figured I'd
pitch in a couple cents worth.
> 2) The replication system will need to add at least one field to each
> table in each database that needs to be re plicated. This
> field will be a date/time stamp which identifies the " last
> update" of the record. This field will be called PGR_TIME
> for la ck of a better name. Because this field will be used
> from within programs and triggers it can be longer so as to not
> mistake it for a user field.
I just started reading this thread, but I figured I'd throw in a couple
suggestions for distributed data control (a few idioms I've had to
deal with b4):- Never use time (not reliable from system to system). Use a version number of some sort that can stay
consistentacross all replicas
This way, if a system's time is or goes out of wack, it doesn't cause your database to disintegrate, and it's easier
totrack conflicts (see below. If using time, the algorithm gets nightmarish)
- On an insert, set to version 1
- On an update, version++
- On a delete, mark deleted, and add a delete stub somewhere for the replicator process to deal with in sync'ing the
databases.
- If two records have the same version but different data, there's a conflict. A few choices: 1. Pick one as
thecorrect one (yuck!! invisible data loss) 2. Store both copies, pick one as current, and alert database
ownerof the conflict, so they can deal with it "manually." 3. If possible, some conflicts can be merged. If
adisjoint set of fields were changed in each instance, these changes may both be applied and the record
merged. (Problem: takes a lot more space. Requires a version number for every field, or persistent
storageof some old records. However, this might help the "which fields changed" issue you were talking
aboutin #6)
- A unique id across all systems should exist (or something that effectively simulates a unique id. Maybe a
compositionof the originating oid (from the insert) and the originating database (oid of the database's record?)
mightdo it. Store this as an extra field in every record. (Two extra fieldss so far: 'unique id' and 'version')
I do like your approach: triggers and a separate process. (Maintainable!! :)
Anyway, just figured I'd throw in a few suggestions,
Duane