Re: has anyone looked at burstsort ?

Поиск
Список
Период
Сортировка
От Gregory Stark
Тема Re: has anyone looked at burstsort ?
Дата
Msg-id 87d4yw9y2r.fsf@oxford.xeocode.com
обсуждение исходный текст
Ответ на has anyone looked at burstsort ?  (Hannu Krosing <hannu@skype.net>)
Ответы Re: has anyone looked at burstsort ?
Re: has anyone looked at burstsort ?
Список pgsql-hackers
"Hannu Krosing" <hannu@skype.net> writes:

> has anyone looked at burstsort
> https://sourceforge.net/projects/burstsort
>
> they claim that "Copy-Burstsort is a sorting algorithm for strings that
> is cache-efficient. Burstsort and its variants are much faster than
> Quicksort and Radixsort especially on large datasets. Copy-Burstsort
> works best for sorting short strings such as genomes and words"
>
> if the speed claim is true, and there are no other bad effects, like for
> example very bad memory use,  we could try to talk the author into
> allowing us to include it under BSD licens (currently it is GPL)

The actual implementation isn't very interesting. It's C++ code written for
Visual C++ and it's pretty primitive, has no comments, and only supports
sorting ASCII strings containing only 26 letters...

In any case we can't make use of it for sorting strings unless we want to
special-case text/varchar in the tuplesort code. That might be something to
consider at some point but right now there's still a lot to do with the
generic tuplesort that requires only a comparison operator.

On the other hand it does seem like it might be possible to adapt this
algorithm to sorting multi-column keys. 

The key to the algorithm is that it uses a trie to bin rows with common
leading prefixes together. This avoids performing redundant comparisons
between those columns later.

If you have long keys with each column in the key being relatively low
cardinality you could get some mileage out of doing something like this
algorithm does on a column-by-column basis rather than on a
character-by-character basis.

But can you see any way to adapt this to a disk-sort?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: has anyone looked at burstsort ?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: has anyone looked at burstsort ?