Re: Sorting Improvements for 8.4

Поиск
Список
Период
Сортировка
От Brian Hurt
Тема Re: Sorting Improvements for 8.4
Дата
Msg-id 476A7F41.1030606@janestcapital.com
обсуждение исходный текст
Ответ на Re: Sorting Improvements for 8.4  ("Dann Corbit" <DCorbit@connx.com>)
Ответы Re: Sorting Improvements for 8.4  ("Dann Corbit" <DCorbit@connx.com>)
Re: Sorting Improvements for 8.4  (Brian Hurt <bhurt@janestcapital.com>)
Список pgsql-hackers
While we're blue skying things, I've had an idea for a sorting algorithm 
kicking around for a couple of years that might be interesting.  It's a 
variation on heapsort to make it significantly more block-friendly.  I 
have no idea if the idea would work, or how well it'd work, but it might 
be worthwhile kicking around.

Now, the core idea of heapsort is that the array is put into heap order- 
basically, that a[i] >= a[2i+1] and a[i] >= a[2i+2] (doing the 0-based 
array version here).  The problem is that, assuming that the length of a 
is larger than memory, then a[2i+1] is likely going to be on a different 
page or block than a[i].  That means every time you have to bubble down 
a new element, you end up reading O(log N) blocks- this is *per element*.

The variation is to instead work with blocks, so you have a block of 
entries b[i], and you change the definition of heap order, so that 
min(b[i]) >= max(b[2i+1]) and min(b[i]) >= max(b[2i+2]).  Also, during 
bubble down, you need to be carefull to only change the minimum value of 
one of the two child blocks b[2i+1] and b[2i+2].  Other than that, the 
algorithm works as normal.  The advantage of doing it this way is that 
while each bubble down still takes O(log N) blocks being touched, you 
get a entire block worth of results for your effort.  Make your blocks 
large enough (say, 1/4 the size of workmem) and you greatly reduce N, 
the number of blocks you have to deal with, and get much better I/O 
(when you're reading, you're reading megabytes at a shot).

Now, there are boatloads of complexities I'm glossing over here.  This 
is more of a sketch of the idea.  But it's something to consider.

Brian



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: Sorting Improvements for 8.4
Следующее
От: "Koichi Suzuki"
Дата:
Сообщение: Re: Benchmark for GiST index?