Subject: Re: [PERFORM] Performance problems with DISTINCT ON
> Sgarbossa Domenico wrote:
>> I need to retrieve the most recent prices per products from a price list
>> table:
>
>> select distinct on (articolo) articolo,data_ent,prezzo from
>> listini_anagrafici order by articolo, data_ent desc
>>
>> but it seems that this query runs slowly... about 5/6 seconds.
>> the table contains more or less 500K records, PostgreSQL version is
>> 8.1.11 and the server has 4gb of RAM entirely dedicate to the db.
>
>> 'Unique (cost=73893.89..76551.25 rows=88312 width=24) (actual
>> time=4022.578..5076.206 rows=193820 loops=1)'
>> ' -> Sort (cost=73893.89..75222.57 rows=531472 width=24) (actual
>> time=4022.574..4505.538 rows=531472 loops=1)'
>> ' Sort Key: articolo, data_ent'
>> ' -> Seq Scan on listini_anagrafici (cost=0.00..16603.72
>> rows=531472 width=24) (actual time=0.009..671.797 rows=531472 loops=1)'
>> 'Total runtime: 5217.452 ms'
>
> You've got 531472 rows in the table and the query is going to output
> 193820 of them. Scanning the whole table is almost certainly the way to
> go.
>
> If the table doesn't change much, you could try running a CLUSTER on the
> index you've created. That will lock the table while it re-orders the
> physical layout of the rows based on your index though, so it's no good
> if the table is updated much.
>
> Failing that, you could try issuing "set work_mem = ..." before the
> query with increasing sizes for work_mem. That might make the sort
> faster too.
>
Thank you for the answer,
I've tried as you suggest but the only things that seems make some
differences is the work_mem parameter
This helps to reduce the amount of time about for the half (3 seconds) but
unfortunately this ain't enough.
If there are a lot of concurrent request I think it could made the data
swap to the disk.
Should I try a different approach to solve this issue?