Re: performance problem
From: Mike Mascari
Subject: Re: performance problem
Date:
Msg-id: 3FBD2E6F.20405@mascari.com
In reply to: Re: performance problem ("Rick Gigger" <rick@alpinenetworking.com>)
List: pgsql-general
Rick Gigger wrote:
> Ok, adding the index back in worked the first time, but then I tried
> generating the database from scratch again, this time adding the index right
> from the start. It added the 45,000 rows in about a minute but then was
> going dog slow on the updates again. So I did an EXPLAIN and sure enough it
> was not using the index. After some investigation I determined that it was
> not using the index because when the transaction started there were only 4
> rows in the table, so at that point it didn't want to use it. It apparently
> doesn't gather analysis data fast enough to handle this kind of transaction.

The statistics are collected as a result of an ANALYZE command:

http://www.postgresql.org/docs/current/static/sql-analyze.html

This does not happen automatically. EXPLAIN output will show a default assumption of 1000 rows, IIRC.

> I worked around this by starting the transaction and inserting the 45,000
> rows and then killing it. Then I removed the index and re-added it, which
> apparently gathered some stats, and since all of the dead tuples from the
> failed transaction were still in there, it now decided that it should use
> the index. I reran the script and this time it took 5 minutes again instead
> of 1 1/2 hours.

If you examine the behavior of pg_dump output, you'll notice that it doesn't build indexes until after the COPY command has completed the data import. It's a waste of CPU cycles and disk bandwidth to update indexes upon every insert. Your script should:

1) Drop all indexes on the relation
2) Use COPY if possible instead of INSERT to import the data
3) Recreate the indexes
4) Use UPDATE to update as necessary

> I am using 7.2.4. Has this improved in later versions? I'm not concerned,
> since this is a very rare thing to need to do and it's obviously possible
> to work around, but it would be nice if postgres could figure things like
> that out on its own.
> (It certainly would have saved me a lot of time and confusion last night
> at about 3 am.) Is there a way to force the use of a specific index on a
> query?

You can force the use of an index scan by turning sequential scans off:

SET ENABLE_SEQSCAN TO OFF;

But the problem is that the statistics didn't match the data. You could have:

1) Used INSERTs to insert the data into a relation with an index
2) Executed ANALYZE <foo> to update the statistics
3) Performed the UPDATE

After the UPDATE, you'll still have dead tuples (the original rows), which need to be marked as dead, so you should occasionally run VACUUM to mark them as such, VACUUM FULL to also reclaim the dead space, or VACUUM FULL ANALYZE to also update the relation's statistics. I would just execute the steps I outlined above with COPY and not worry about ANALYZEs and VACUUMs in a script. People often run VACUUM ANALYZE in a 1/day cron job and VACUUM FULL ANALYZE in a 1/week cron job. Then, of course, there's REINDEX...

Hope that helps,

Mike Mascari
mascarm@mascari.com
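P.S. To make the COPY-based recipe concrete, here is a rough sketch. The table name, index name, column names, and file path are all hypothetical; adapt them to your schema:

```sql
-- Hypothetical schema: accounts(name text, balance numeric)
BEGIN;

-- 1) Drop the index so the bulk load doesn't pay index maintenance per row
DROP INDEX accounts_name_idx;

-- 2) Bulk-load with COPY instead of row-by-row INSERTs
COPY accounts FROM '/tmp/accounts.dat';

-- 3) Recreate the index in a single pass over the loaded data
CREATE INDEX accounts_name_idx ON accounts (name);

COMMIT;

-- Refresh the planner statistics so the UPDATE will consider the index
ANALYZE accounts;

-- 4) Now run the updates
UPDATE accounts SET balance = 0 WHERE name = 'test';
```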