Re: Optimize ORDER BY ... LIMIT

Поиск
Список
Период
Сортировка
От Gregory Stark
Тема Re: Optimize ORDER BY ... LIMIT
Дата
Msg-id 87ac50q5xl.fsf@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Optimize ORDER BY ... LIMIT  (Martijn van Oosterhout <kleptog@svana.org>)
Список pgsql-hackers
Martijn van Oosterhout <kleptog@svana.org> writes:

> On Fri, Sep 15, 2006 at 05:30:27PM +0100, Gregory Stark wrote:
>> Also, because heap sort is slower than qsort (on average anyways) it makes
>> sense to not bother with the heap until the number of tuples grows well beyond
>> the limit or until it would otherwise spill to disk.
>
> The thought that comes to mind is to leave the sorting as is, but
> change the code that writes to the tapes to stop writing once it hits
> the limit. So each tape will never have more than N tuples, where N is
> the limit. This would be fairly unobtrusive because none of the other
> code actually needs to care.

I'm sorry, I forgot to mention that part of my analysis. Once you reach disk
any chance of optimising the limit case is pretty much out the window. You
could trim some tuples from each tape but unless you're sorting truly
stupendous amounts of data I doubt it would really help much.

I think it only makes sense to look at the in-memory case. Instead of qsorting
thousands of records or, worse, spilling millions of records to disk and doing
an external sort only to use only the top 10 and throw the rest away, we throw
tuples away before they accumulate in memory in the first place.

>> Alternatively we could have Limit(Sort()), Unique(Sort()), and
>> Limit(Unique(Sort())) be handled by new types of Sort nodes entirely and not
>> introduce the Limit and Unique nodes at all. 
>
> I don't think it's easy to merge Unique and Sort, mainly because the
> fields you're doing the Unique on are probably not the fields you're
> sorting on, so you're probably not saving much.

On the contrary I think the vast majority of the time you have a Unique(Sort)
it will be the same key because it will be caused by a SELECT DISTINCT. Now
that I actually test it I see there are more nodes that could do implement
this (namely GroupAgg) so I'm thinking more and more about having an abstract
way to pass information down through the nodes rather than handle just
Limit/Sort specifically.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: Release notes
Следующее
От: Bruce Momjian
Дата:
Сообщение: Release notes