Discussion: get a list of table modifications in a day?
I need to generate a diff (or something similar) of a table, day by day. What is the best way to track insert/update/delete operations? I have two ideas, and I'd like to hear your opinion:

1) pg_dump each day and run diff
2) modify some triggers we use and store the information in another table

I am not aware of any functionality offered by PostgreSQL for this. Does it exist? If not, which solution would you prefer?
On Thu, Sep 13, 2007 at 09:59:30AM +0200, Ottavio Campana wrote:
> 1) pg_dump each day and run diff

It will become increasingly painful as the table size increases.

> 2) modify some triggers we use and store the information in another table

This is the best choice. You can use the table_log extension to avoid writing your own triggers:

http://ads.wars-nicht.de/blog/archives/100-Log-Table-Changes-in-PostgreSQL-with-tablelog.html

depesz

--
quicksil1er: "postgres is excellent, but like any DB it requires a highly paid DBA. here's my CV!" :)
http://www.depesz.com/ - a blog for you (and my CV)
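For reference, a hand-rolled version of option 2 is only a few lines of PL/pgSQL. A minimal sketch, assuming a table `mytable(id integer, val text)`; the log table, function, and trigger names here are made up for illustration and are not table_log's actual schema:

```sql
-- Hypothetical log table: one row per modification, recording the
-- operation, timestamp, and user alongside a copy of the row's columns.
CREATE TABLE mytable_log (
    op       char(1)     NOT NULL,              -- 'I', 'U' or 'D'
    stamp    timestamptz NOT NULL DEFAULT now(),
    username text        NOT NULL DEFAULT current_user,
    id       integer,                           -- columns mirrored from mytable
    val      text
);

CREATE OR REPLACE FUNCTION mytable_audit() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO mytable_log (op, id, val) VALUES ('D', OLD.id, OLD.val);
        RETURN OLD;
    ELSIF TG_OP = 'UPDATE' THEN
        INSERT INTO mytable_log (op, id, val) VALUES ('U', NEW.id, NEW.val);
        RETURN NEW;
    ELSE
        INSERT INTO mytable_log (op, id, val) VALUES ('I', NEW.id, NEW.val);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_audit_trg
    AFTER INSERT OR UPDATE OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_audit();
```

The day-by-day diff is then just a range query on `stamp`, e.g. `SELECT * FROM mytable_log WHERE stamp >= date '2007-09-12' AND stamp < date '2007-09-13';`.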
Hi
PgQ can be used for this purpose. The idea is to have triggers on the table that push events into a queue; on that queue you can then do whatever suits you best. As we don't want to keep these logs online, PgQ is most convenient, since it efficiently removes events as soon as they are handled.
PgQ - table_dispatcher.py

Takes URL-encoded events as its data source and writes them into a table in the target database.

Used to partition data. For example, change logs that need to be kept online only briefly can be written to daily tables and then dropped as they become irrelevant.

Also allows selecting which columns are written to the target database.

Creates target tables as needed, according to the configuration file.
PgQ - cube_dispatcher.py

Takes URL-encoded events as its data source and writes them into partitioned tables in the target database. Logutriga is used to create the events.

Used to provide batches of data for business intelligence and data cubes.

Only one instance of each record is stored. For example, if a record is created and then updated twice, only the latest version of the record stays in that day's table.

Does not support deletes (not that it would be hard to support; we simply have no need for it).
PgQ - queue_archiver.py

Writes queue contents into a file. Used for backing up queue contents for safety.
regards,
Asko
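Mechanically, hooking a table up to PgQ looks roughly like this. Queue and table names here are invented, and the exact schema and function names for logutriga vary between Skytools versions, so treat this as a sketch and check the Skytools documentation:

```sql
-- Create a queue, then attach logutriga so every insert/update/delete
-- on mytable is pushed into it as a URL-encoded event.
SELECT pgq.create_queue('mytable_changes');

CREATE TRIGGER mytable_changes_trg
    AFTER INSERT OR UPDATE OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE pgq.logutriga('mytable_changes');
```

A dispatcher script such as table_dispatcher.py then registers as a consumer of `mytable_changes`, and PgQ removes each batch of events once every registered consumer has processed it.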
On 9/13/07, Ottavio Campana <ottavio@campana.vi.it> wrote:
> I need to generate a diff (or something similar) of a table, day by day.
> What is the best way to track insert/update/delete operations? I have two
> ideas, and I'd like to hear your opinion:
>
> 1) pg_dump each day and run diff
> 2) modify some triggers we use and store the information in another table
>
> I am not aware of any functionality offered by postgresql. Does it exist?
> If not, which solution would you prefer?
"Ottavio Campana" <ottavio@campana.vi.it> writes:
> I need to generate a diff (or something similar) of a table, day by day.
> What is the best way to track insert/update/delete operations? I have two
> ideas, and I'd like to hear your opinion:
>
> 1) pg_dump each day and run diff

You can't use pg_dump directly, as the rows are unordered. An update will remove the old row in one place and put the new row possibly in a completely different place. Some operations like CLUSTER or VACUUM FULL can move rows around, which doesn't matter to SQL but would show up in a diff. You would have to COPY to a file the result of a query that includes an ORDER BY.

> 2) modify some triggers we use and store the information in another table
>
> I am not aware of any functionality offered by postgresql. Does it exist?

Well, alternatively you could do the same as 1) but do it in SQL. Something like:

    CREATE TABLE copy_table AS (SELECT * FROM original_table);
    -- ... wait a day
    SELECT * FROM copy_table EXCEPT SELECT * FROM original_table;

It's not going to be fast, though. The triggers are probably the best option, really. They give you more information than a diff, in that they tell you when the change occurred, which user made the change, and, if multiple changes to the same record occurred, a record of each.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
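One caveat to the EXCEPT idea above: each direction of EXCEPT only shows rows present on one side, so a full diff needs the query in both orders:

```sql
-- Rows that disappeared since the snapshot
-- (deletes, plus the old versions of updated rows):
SELECT * FROM copy_table
EXCEPT
SELECT * FROM original_table;

-- Rows that appeared since the snapshot
-- (inserts, plus the new versions of updated rows):
SELECT * FROM original_table
EXCEPT
SELECT * FROM copy_table;
```

Like the diff-of-dumps idea, this still can't distinguish an update from a paired delete and insert, which is another argument for the trigger approach.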
hubert depesz lubaczewski wrote:
> On Thu, Sep 13, 2007 at 09:59:30AM +0200, Ottavio Campana wrote:
>> 1) pg_dump each day and run diff
>
> it will become increasingly painful as the table size increases.
>
>> 2) modify some triggers we use and store the information in another table
>
> this is the best choice. you can use table_log extension to avoid
> writing your own triggers.
>
> http://ads.wars-nicht.de/blog/archives/100-Log-Table-Changes-in-PostgreSQL-with-tablelog.html

Since I already use triggers on that table, can I use table_log? I mean, can I have two triggers for the same event on the same table?
On Thu, 13.09.2007 at 10:44:41 +0200, Ottavio Campana wrote:
> > http://ads.wars-nicht.de/blog/archives/100-Log-Table-Changes-in-PostgreSQL-with-tablelog.html
>
> since I already use triggers on that table, can I use table_log?
>
> I mean, can I have two triggers for the same event on the same table?

Yes.

Andreas

--
Andreas Kretschmer
Contact: Heynitz: 035242/47150, D1: 0160/7141639 (more: -> header)
GnuPG-ID: 0x3FFF606C, private 0x7F4584DA http://wwwkeys.de.pgp.net
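Worth knowing when combining an existing trigger with table_log's: when several triggers are defined for the same event on the same table, PostgreSQL fires them in alphabetical order of trigger name. A sketch with made-up names:

```sql
-- Both triggers fire on the same events; "a_" sorts before "z_", so
-- the business-logic trigger runs before the logging trigger.
CREATE TRIGGER a_business_trg
    AFTER INSERT OR UPDATE OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE my_business_fn();

CREATE TRIGGER z_log_trg
    AFTER INSERT OR UPDATE OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE my_log_fn();
```

Renaming a trigger is therefore enough to control which one sees the row first.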