Re: Order by (for 15 rows) adds 30 seconds to query time

Поиск
Список
Период
Сортировка
От Richard Neill
Тема Re: Order by (for 15 rows) adds 30 seconds to query time
Дата
Msg-id 4B15FE63.10800@cam.ac.uk
обсуждение исходный текст
Ответ на Re: Order by (for 15 rows) adds 30 seconds to query time  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance
Dear Kevin,

Thanks for a very helpful reply.

Kevin Grittner wrote:
> Richard Neill <rn214@cam.ac.uk> wrote:
>
>> Am I wrong in thinking that ORDER BY is always applied after the
>> main query is run?
>
> Yes, you are wrong to think that.  It compares the costs of various
> plans, and when it has an index with the high order portion matching
> your ORDER BY clause, it may think that it can scan that index to
> generate the correct sequence.  If the sort is estimated to be
> expensive enough compared to the index scan, it will use the index
> scan and skip the sort.  Sorting hundreds of billions of rows can be
> expensive.
>

That makes sense now.



>> Even if I run it this way:
>>
>> select * from (select * from h.inventory where demand_id
>> =289276563) as sqry order by id;
>>
>> which should(?) surely force it to run the first select, then
>> sort,
>
> I wouldn't necessarily assume that.  You can EXPLAIN that form of
> the query and find out easily enough.  Does it say:
>
>                             ->  Index Scan using inventory_demand_id on
> inventory  (cost=0.00..22.36 rows=28 width=56) (actual time=0.025..0.053
> rows=15 loops=1)
>                                   Index Cond: (demand_id = 289276563)
>
> or:
>
>                             ->  Index Scan using inventory_pkey on
> inventory  (cost=0.00..879728.20 rows=28 width=56) (actual
> time=31738.956..32860.738 rows=15 loops=1)
>                                   Filter: (demand_id = 289276563)
>
>> it's quick if I do order by id+1
>
> You don't have an index on id+1.
>

Your explanation is validated by the explain - it only does the sort
last iff I use "order by id+1", where there is no index for that.

[Aside: using "id+0" also forces a sort.]


>
> The real problem to solve here is that it's estimating the rows
> count for the result so badly.  If you need a short-term
> work-around, you've already discovered that you can keep it from
> using the index on id for ordering by creating an expression using
> id which causes it not to consider the index a match.  That's kind
> of ugly to keep long term, though.
>

We seem to have a general case of very bad query plans, where in other
cases, explain analyze shows that the query-planner's guesses are miles
adrift.

Others have said that this is symptomatic of a lack of doing analyze,
however we are doing quite a lot of analyzing (both through autovacuum,
and a manual "vacuum verbose analyze" every night). Our underlying
statistical distribution isn't that changeable.

Thanks,

Richard









В списке pgsql-performance по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: RAID card recommendation
Следующее
От: Matthew Wakeling
Дата:
Сообщение: Re: Order by (for 15 rows) adds 30 seconds to query time