Re: [HACKERS] The case for removing replacement selection sort

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: [HACKERS] The case for removing replacement selection sort
Дата	11 сентября 2017 г. 03:07:36
Msg-id	131a54e5-c8ea-89e7-6f95-445d07a24972@2ndquadrant.com обсуждение исходный текст
Ответ на	Re: [HACKERS] The case for removing replacement selection sort (Peter Geoghegan <pg@bowt.ie>)
Ответы	Re: [HACKERS] The case for removing replacement selection sort
Список	pgsql-hackers

Дерево обсуждения

On 09/11/2017 01:03 AM, Peter Geoghegan wrote:
> On Sat, Sep 9, 2017 at 8:28 AM, Greg Stark <stark@mit.edu> wrote:
>> This may be a bit "how long is a piece of string" but how do those two
>> compare with string sorting in an interesting encoding/locale -- say
>> /usr/share/dict/polish in pl_PL for example. It's certainly true that
>> people do sort text as well as numbers.
> 
> I haven't looked at text, because I presume that it's very rare for 
> tuples within a table to be more or less ordered by a text
> attribute. While the traditional big advantage of replacement
> selection has always been its ability to double initial run length on
> average, where text performance is quite interesting because
> localized clustering still happens, that doesn't really seem relevant
> here. The remaining use of replacement selection is expressly only
> about the "single run, no merge" best case, and in particular about
> avoiding regressions when upgrading from versions prior to 9.6 (9.6
> is the version where we began to generally prefer quicksort).
> 
>> Also, people often sort on keys of more than one column....
> 
> Very true. I should test this.
> 

I'm currently re-running the benchmarks we did in 2016 for 9.6, but
those are all sorts with a single column (see the attached script). But
it'd be good to add a few queries testing sorts with multiple keys. We
can either tweak some of the existing data sets + queries, or come up
with entirely new tests.

The current tests construct data sets with different features (unique or
low/high-cardinality, random/sorted/almost-sorted). How should that work
for multi-key sorts? Same features for all columns, or some mix?

For the existing queries, I should have some initial results tomorrow,
at least for the data sets with 100k and 1M rows. The tests with 10M
rows will take much more time (it takes 1-2hours for a single work_mem
value, and we're testing 6 of them).

But perhaps we can reduce the number of tests somehow, only testing the
largest/smallest work_mem values? So that we could get some numbers now,
possibly running the complete test suite later.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

sort-bench.sh

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] The case for removing replacement selection sort

Вложения