Обсуждение: Minimizing dead tuples caused by update triggers
I've been fighting with the common workarounds for inadequate response times on select count(*) and min(),max() on tables with tens of millions of rows for quite a while now and understand the reasons for the table scans. I have applications that regularly poll a table ( ideally, the more frequent, the better ) to learn the most recent data inside it as well as the # of rows in it ( among a few other things ). As the databases have grown in size, these summarizations could no longer be done on the fly, so I wrote a database wrapper API that tracks those values internally. This wrapper has grown very complex and is difficult to manage across different systems. What I'd like to do instead is implement triggers for insert, updates, and deletes to check and/or replace a value in a "table_stats", representing table count, min/max dates, and a few other costly operations.. that can then be queried in short order. I know this is a fairly common thing to do. The thing that concerns me is dead tuples on the table_stats table. I believe that every insert of new data in one of the monitored tables will result in an UPDATE of the table_stats table. When thousands ( or millions ) of rows are inserted, the select performance ( even trying with an index ) on table_stats slows down in a hurry. If I wrap the inserts into large transactions, will it only call the update on table_states when I commit? Obviously I want to vacuum this table regularly to recover this. The problem I'm running into is contention between VACUUM ( not full ) and pg_dump ( version 8.0.12 ). My system backups takes 6 hours to run pg_dump on a 400GB cluster directory. If the vacuum command fires during the dump, it forces an exclusive lock and any queries will hang until pg_dump finishes. If I have to wait until pg_dump is finished before issuing the VACUUM command, everything slows down significantly as the dead tuples in table_stats pile up. What strategy could I employ to either: 1. resolve the contention between pg_dump and vacuum, or 2. reduce the dead tuple pile up between vacuums Thanks for reading -Dan
Dan Harris <fbsd@drivefaster.net> writes: > The thing that concerns me is dead tuples on the table_stats table. I > believe that every insert of new data in one of the monitored tables > will result in an UPDATE of the table_stats table. When thousands > ( or millions ) of rows are inserted, the select performance ( even > trying with an index ) on table_stats slows down in a hurry. Yup. FWIW, 8.3's "HOT" tuple updates might help this quite a lot. Not a lot to be done about it in 8.0.x though :-( regards, tom lane
On Dec 19, 2007, at 6:39 PM, Tom Lane wrote: >> The thing that concerns me is dead tuples on the table_stats >> table. I >> believe that every insert of new data in one of the monitored tables >> will result in an UPDATE of the table_stats table. When thousands >> ( or millions ) of rows are inserted, the select performance ( even >> trying with an index ) on table_stats slows down in a hurry. > > Yup. FWIW, 8.3's "HOT" tuple updates might help this quite a lot. > Not a lot to be done about it in 8.0.x though :-( A work-around others have used is to have the trigger just insert into a 'staging' table and then periodically take the records from that table and summarize them somewhere else. You still have a vacuum concern on the staging table, but the advantage is that you trigger path is a simple insert instead of an update, which is effectively a delete and an insert. This is a case where a cron'd vacuum that runs once a minute is probably a wise idea. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
Вложения
Le jeudi 20 décembre 2007, Decibel! a écrit : > A work-around others have used is to have the trigger just insert > into a 'staging' table and then periodically take the records from > that table and summarize them somewhere else. And you can even use the PgQ skytools implementation to easily have this kind of 'staging'-table with a producer and one or many subscribers. See those references if you're interrested: http://kaiv.wordpress.com/2007/10/19/skytools-database-scripting-framework-pgq/ http://skytools.projects.postgresql.org/doc/pgq-sql.html http://skytools.projects.postgresql.org/doc/pgq-admin.html http://skytools.projects.postgresql.org/doc/pgq-nodupes.html Hope this helps, regards, -- dim