Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
Дата
Msg-id CA+TgmobKFcb6ajVEN8eSnBa78sB3FSwuEWTHXd2x9JC5DOh5OA@mail.gmail.com
обсуждение исходный текст
Ответ на Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Peter Geoghegan <peter@2ndquadrant.com>)
Ответы Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Greg Stark <stark@mit.edu>)
Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)  (Peter Geoghegan <peter@2ndquadrant.com>)
Список pgsql-hackers
On Wed, Apr 18, 2012 at 9:31 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Thoughts?

Interesting work.  I thought about trying to code up timsort at one
point, but I've been running short of round tuits.

I did some quick tests of quicksort using half a million random
strings.  On my MacBook Pro, if the strings are in random order,
quicksort takes about 12 seconds.  If they are presorted, it takes
about 800 ms.  If they are in sorted order with an empty string
appended onto the end, it takes about 25 seconds.

If I modify gen_qsort_tuple.pl to perform the "presorted" input check
only when n > 40, the for the
presorted-except-the-last-element-is-small test drops from 25 seconds
to about 21.5 seconds, without apparently harming either of the other
two cases.  If I remove the check altogether, the time further drops
to about 13.5 seconds, the time to sort completely-ordered data rises
to about 10-10.5 seconds, and the time to sort randomly ordered data
still doesn't seem to change much.

Based on that, I'm inclined to propose rejiggering things so that the
presorted-input check runs only at the top level, and not during any
recursive steps.  The theory is that it won't cost much because if the
data is randomly ordered we won't do many comparisons before falling
out, and that seems to be true.  But the only point of doing it in the
recursive steps is to win when the data is partially ordered, and we
actually seem to be losing big-time in that case, perhaps because when
the data is partially ordered, the presort check will frequently to
run through a significant part of the array - wasting cycles - but
fall out before it actually gets to the end.

Of course, even if we did that, it might not be as good as your
timsort numbers, but that doesn't mean we shouldn't do it...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Usage of planner_ctx
Следующее
От: Robert Haas
Дата:
Сообщение: Re: New sync commit mode remote_write