Re: Memory usage during sorting

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Memory usage during sorting
Дата
Msg-id CAEYLb_Uhwz0BRQR42GJvmcUtSxDW1WYOzD5fOzE-3b1jPrCGpQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Memory usage during sorting  (Greg Stark <stark@mit.edu>)
Ответы Re: Memory usage during sorting  (Peter Geoghegan <peter@2ndquadrant.com>)
Список pgsql-hackers
On 14 April 2012 13:32, Greg Stark <stark@mit.edu> wrote:
> On Fri, Apr 13, 2012 at 7:01 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Well, timsort is specifically designed to take advantage of pre-sorted
>> data. It does appear to have a lot of traction, as wikipedia points
>> out:
>
> I hadn't heard of it. But reading up on it it does seem like a good
> fit for us. It trades some additional storage -- an array of pointers
> into the sort array where in our case the pointers would be much
> smaller than a whole SortTuple -- for fewer comparisons -- which in
> our case are often much slower than a simple integer comparison.

I wouldn't go so far as to suggest getting rid of quicksort, of
course. Quicksort is generally faster than other average case O(n log
n) algorithms in practice, for various reasons, principal among them
that it takes advantage of memory locality so well. I don't think that
it's a coincidence that timsort is popular in Python and Java land.
Even the Python C implementation has to sort integers through all that
PyObject reference indirection, I think. I'd now speculate that an
appropriate use of this algorithm might be to simply use it with types
that don't have a SortSupport function, that are largely passed by
reference, and have expensive comparisons. FWIW, I started playing
with adding timsort to Postgres last night:

https://github.com/Peter2ndQuadrant/postgres/tree/timsort

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: column name of pg_stat_replication.backend_start
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: Patch: add timing of buffer I/O requests