On Mon, 28 Jul 2008, Faludi Gábor wrote:
> EXPLAIN ANALYZE SELECT DISTINCT letoltes.cid, count(letoltes.cid) AS
> elofordulas FROM letoltes GROUP BY cid ORDER BY elofordulas DESC LIMIT 5;
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=9255.05..9255.09 rows=5 width=4) (actual time=604.734..604.743 > rows=5 loops=1)
> -> Unique (cost=9255.05..9257.26 rows=294 width=4) (actual time=604.732..604.737 rows=5 loops=1)
> -> Sort (cost=9255.05..9255.79 rows=294 width=4) (actual time=604.730..604.732 rows=5 loops=1)
> Sort Key: count(cid), cid
> -> HashAggregate (cost=9242.26..9243.00 rows=294 width=4) (actual time=604.109..604.417 rows=373
loops=1)
> -> Seq Scan on letoltes (cost=0.00..6920.51 rows=464351 width=4) (actual time=0.022..281.413
rows=464351loops=1)
> Total runtime: 604.811 ms
So this query is doing a sequential scan of the letoltes table for each
query. You may get some improvement by creating an index on cid and
clustering on that index, but probably not much.
Moving to Postgres 8.3 will probably help a lot, as it will allow multiple
queries to use the same sequential scan in parallel. That's assuming the
entire table isn't in cache.
Another solution would be to create an additional table that contains the
results of this query, and keep it up to date using triggers on the
original table. Then query that table instead.
However, probably the best solution is to examine the problem and work out
if you can alter the application to make it avoid doing such an expensive
query so often. Perhaps it could cache the results.
Matthew
--
Psychotics are consistently inconsistent. The essence of sanity is to
be inconsistently inconsistent.