Re: Timsort performance, quicksort (was: Re: Memory usage during sorting)
| От | Robert Haas | 
|---|---|
| Тема | Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) | 
| Дата | |
| Msg-id | CA+TgmobKFcb6ajVEN8eSnBa78sB3FSwuEWTHXd2x9JC5DOh5OA@mail.gmail.com обсуждение исходный текст | 
| Ответ на | Timsort performance, quicksort (was: Re: Memory usage during sorting) (Peter Geoghegan <peter@2ndquadrant.com>) | 
| Ответы | Re: Timsort performance, quicksort (was: Re: Memory usage
 during sorting) Re: Timsort performance, quicksort (was: Re: Memory usage during sorting) | 
| Список | pgsql-hackers | 
On Wed, Apr 18, 2012 at 9:31 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > Thoughts? Interesting work. I thought about trying to code up timsort at one point, but I've been running short of round tuits. I did some quick tests of quicksort using half a million random strings. On my MacBook Pro, if the strings are in random order, quicksort takes about 12 seconds. If they are presorted, it takes about 800 ms. If they are in sorted order with an empty string appended onto the end, it takes about 25 seconds. If I modify gen_qsort_tuple.pl to perform the "presorted" input check only when n > 40, the for the presorted-except-the-last-element-is-small test drops from 25 seconds to about 21.5 seconds, without apparently harming either of the other two cases. If I remove the check altogether, the time further drops to about 13.5 seconds, the time to sort completely-ordered data rises to about 10-10.5 seconds, and the time to sort randomly ordered data still doesn't seem to change much. Based on that, I'm inclined to propose rejiggering things so that the presorted-input check runs only at the top level, and not during any recursive steps. The theory is that it won't cost much because if the data is randomly ordered we won't do many comparisons before falling out, and that seems to be true. But the only point of doing it in the recursive steps is to win when the data is partially ordered, and we actually seem to be losing big-time in that case, perhaps because when the data is partially ordered, the presort check will frequently to run through a significant part of the array - wasting cycles - but fall out before it actually gets to the end. Of course, even if we did that, it might not be as good as your timsort numbers, but that doesn't mean we shouldn't do it... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления: