Re: Using quicksort for every external sort run

Поиск
Список
Период
Сортировка
От Ants Aasma
Тема Re: Using quicksort for every external sort run
Дата
Msg-id CA+CSw_t0nss16bwp-DzRyUy9e847hQCZVV+j0o+MNcrpLJnVDw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Using quicksort for every external sort run  (Greg Stark <stark@mit.edu>)
Ответы Re: Using quicksort for every external sort run  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers
On Sat, Dec 12, 2015 at 12:41 AM, Greg Stark <stark@mit.edu> wrote:
> On Wed, Dec 9, 2015 at 2:44 AM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Tue, Dec 8, 2015 at 6:39 PM, Greg Stark <stark@mit.edu> wrote:
>>
>> I guess you mean insertion sort. What's the theoretical justification
>> for the change?
>
> Well my thinking was that hard coding a series of comparisons would be
> faster than a loop doing a O(n^2) algorithm even for small constants.
> And sort networks are perfect for hard coded sorts because they do the
> same comparisons regardless of the results of previous comparisons so
> there are no branches. And even better the comparisons are as much as
> possible independent of each other -- sort networks are typically
> measured by the depth which assumes any comparisons between disjoint
> pairs can be done in parallel. Even if it's implemented in serial the
> processor is probably parallelizing some of the work.
>
> So I implemented a quick benchmark outside Postgres based on sorting
> actual SortTuples with datum1 defined to be random 64-bit integers (no
> nulls). Indeed the sort networks perform faster on average despite
> doing more comparisons. That makes me think the cpu is indeed doing
> some of the work in parallel.

The open coded version you shared bloats the code by 37kB, I'm not
sure it is pulling it's weight, especially given relatively heavy
comparators. A quick index creation test on int4's profiled with perf
shows about 3% of CPU being spent in the code being replaced. Any
improvement on that is going to be too small to easily quantify.

As the open coding doesn't help with eliminating control flow
dependencies, so my idea is to encode the sort network comparison
order in an array and use that to drive a simple loop. The code size
would be pretty similar to insertion sort and the loop overhead should
mostly be hidden by the CPU OoO machinery. Probably won't help much,
but would be interesting and simple enough to try out. Can you share
you code for the benchmark so I can try it out?

Regards,
Ants Aasma



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jaime Casanova
Дата:
Сообщение: Re: Disabling an index temporarily
Следующее
От: Greg Stark
Дата:
Сообщение: Re: Using quicksort for every external sort run