Discussion: Is VACUUM/ANALYZE destroying my performance?
I have always been frustrated by the wildly erratic performance of our PostgreSQL 8 server. We run a program that does heavy data importing via a heuristics-based import program. Sometimes the records being imported would just fly by; sometimes they would crawl. The import program reads records from a flat table and uses heuristics to normalise and dedupe them. This is done via a sequence of updates and inserts bracketed by a start/end transaction.

At a certain checkpoint, representing about 1,000,000 rows read and imported, I ran a VACUUM/ANALYZE on all of the tables in the target schema. To my horror, performance dropped to less than TEN percent of what it was before the VACUUM/ANALYZE. I thought that turning autovacuum off and doing my own vacuuming would improve performance, but it seems to be killing it. I have since turned autovacuum back on and am tearing my hair out watching the imported records crawl by. I have tried vacuuming the entire DB as well as rebuilding indexes. Nothing.

Any ideas what could have happened? What is the right thing to do?

Carlo
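For reference, the maintenance pass described above can also be run per table rather than schema- or database-wide; a minimal sketch (the table name staging.target_table is a placeholder, not from the original post):

```sql
-- Reclaim dead-row space and refresh planner statistics for one table.
-- VACUUM cannot run inside the import's open transaction, so issue it
-- from a separate session between import batches.
VACUUM ANALYZE staging.target_table;

-- Refresh planner statistics only (much cheaper than a full VACUUM):
ANALYZE staging.target_table;

-- On 8.3 and later, dead-tuple counts can be inspected first to see
-- which tables actually need vacuuming:
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
```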
Update on this issue. I "solved" my problem by doing the following:

1) Stopped the import and did a checkpoint backup of my import target schema
2) Dropped the import target schema
3) Restored a backup from a previous checkpoint, when the tables were much smaller
4) Performed a VACUUM/ANALYZE on all of the tables in the import target schema in that smaller state
5) Dropped the import target schema again
6) Restored the checkpoint backup of the larger data set referred to in step 1
7) Restarted the import from where it left off

The result: the import is flying again, with 10-20 times the performance. The import runs as 4 different TCL scripts in parallel, importing different segments of the table. The problem I have when the import runs at this speed is that I have to constantly watch for lock-ups. Previously I had reported that when these multiple processes are running at high speed, PostgreSQL occasionally freezes one or more of the processes by never returning from a COMMIT. I look at the target tables, and it seems that the commit has gone through.

This used to be a disaster because I thought I had to restart every frozen process by killing the script and restarting at the last imported row. Now I have found a way to un-freeze the program: I find the frozen process via PgAdmin III and send a CANCEL. To my surprise, the import continues as if nothing happened. Still incredibly inconvenient and laborious, but at least it's a little less tedious.

Could these two problems - the weird slowdowns after a VACUUM/ANALYZE and the frequent lockups when the import process is running quickly - be related?

Carlo
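The manual CANCEL step via PgAdmin III could presumably also be done from psql; a hedged sketch, assuming an 8.x server (where the pg_stat_activity pid column is named procpid and the query column is current_query):

```sql
-- Find backends that appear stuck in COMMIT (8.x column names).
SELECT procpid, usename, current_query
FROM pg_stat_activity
WHERE current_query LIKE 'COMMIT%';

-- Send the equivalent of PgAdmin's CANCEL to one of them
-- (12345 is a placeholder pid taken from the query above):
SELECT pg_cancel_backend(12345);
```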
Just a wild guess, but the performance problem sounds like maybe, as your data changes, the planner eventually moves some query from an index scan to a sequential scan. Do you have any details on what queries are taking so long when things are running slow? You can turn on the GUC var "log_min_duration_statement" to see which queries are slow, and then manually check them with an EXPLAIN ANALYZE; that might help.

Matt

Carlo Stonebanks wrote:
> Update on this issue. I "solved" my problem by doing the following:
> [snip - checkpoint backup, restore of a smaller state, VACUUM/ANALYZE,
> restore of the larger data set, restart of the import]
>
> The result: the import is flying again, with 10-20 times the performance.
> [snip]
>
> Could these two problems - the weird slowdowns after a VACUUM/ANALYZE and
> the frequent lockups when the import process is running quickly - be
> related?
>
> Carlo
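Matt's suggestion translates to something like the following sketch (the 250 ms threshold and the sample query are illustrative placeholders, not from the thread):

```sql
-- Log every statement that runs longer than 250 ms (value in milliseconds).
-- Can be set per session as below, or in postgresql.conf followed by a
-- configuration reload; -1 disables it, 0 logs everything.
SET log_min_duration_statement = 250;

-- Then take a slow statement from the log and compare the planner's row
-- estimates against the actual row counts and timings:
EXPLAIN ANALYZE
SELECT * FROM staging.target_table WHERE import_key = 42;
```

If the plan shows a sequential scan where an index scan was expected, with estimated row counts far off the actuals, that points at stale or misleading statistics rather than at the hardware.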
""Matthew O'Connor"" <matthew@zeut.net> wrote in message news:45743240.7050302@zeut.net... > Just a wild guess, but the performance problem sounds like maybe as your > data changes, eventually the planner moves some query from an index scan > to a sequential scan, do you have any details on what queries are taking > so long when things are running slow? You can turn on the GUC var > "log_min_duration_statement" and see what queries are slow and then > manually check them with an explain analyze, that might help. > > Matt This is pretty well what I think is happening - I expect all queries to eventually move from seq scans to index scans. I actually have a SQL logging opion built into the import app. I just can't figure out how the planner can be so wrong. We are running a 4 CPU server (two dual core 3.2 GHz Xeons) with 4GB RAM and Windows Server 2003 x64 and a PERC RAID subsystem (I don't know the RAID type). I know that the metrics for the planner can be changed - is the default config for postgesql not suitable for our setup? For this server, we would like to be optimised for high speed over a few connections, rather than the classic balanced speed over many connections.
Carlo Stonebanks wrote:
>> Just a wild guess, but the performance problem sounds like maybe, as your
>> data changes, the planner eventually moves some query from an index scan
>> to a sequential scan. [snip]
>
> This is pretty well what I think is happening - I expect all queries to
> eventually move from seq scans to index scans. I actually have a SQL
> logging option built into the import app.
>
> I just can't figure out how the planner can be so wrong. We are running a
> 4 CPU server (two dual-core 3.2 GHz Xeons) with 4 GB RAM, Windows Server
> 2003 x64 and a PERC RAID subsystem (I don't know the RAID type). I know
> that the metrics for the planner can be changed - is the default config
> for PostgreSQL not suitable for our setup? For this server, we would like
> to be optimised for high speed over a few connections, rather than the
> classic balanced speed over many connections.

If it is the planner choosing a very bad plan, then I don't think your hardware has anything to do with it. And we can't diagnose why the planner is doing what it's doing without a lot more detail. I suggest you do something to figure out which queries are taking so long, then send us an EXPLAIN ANALYZE; that might shine some light on the subject.