Re: [HACKERS] [PATCH] Incremental sort

Поиск

Список

Период

Сортировка

От	Alexander Korotkov
Тема	Re: [HACKERS] [PATCH] Incremental sort
Дата	2 октября 2017 г. 22:37:03
Msg-id	CAPpHfduS7SBC_J0frWX1tToZkKyyAjLVDQH1wcS7Rbvjg6nV-g@mail.gmail.com обсуждение исходный текст
Ответ на	Re: [HACKERS] [PATCH] Incremental sort (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Ответы	Re: [HACKERS] [PATCH] Incremental sort (Robert Haas <robertmhaas@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Sat, Sep 30, 2017 at 11:20 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:

On Sat, Sep 16, 2017 at 2:46 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
Patch rebased to current master is attached. I'm going to improve my testing script and post new results.

New benchmarking script and results are attached. There new dataset parameter is introduced: skew factor. Skew factor defines skew in distribution of groups sizes.
My idea of generating is just usage of power function where power is from 0 to 1. Following formula is used to get group number for particular item number i.
[((i / number_of_indexes) ^ power) * number_of_groups]
For example, power = 1/6 gives following distribution of groups sizes:
group number group size
0 2
1     63
2     665
3     3367
4     11529
5     31031
6     70993
7     144495
8     269297
9     468558

For convenience, instead of power itself, I use skew factor where power = 1.0 / (1.0 + skew). Therefore, with skew = 0.0, distribution of groups sizes is uniform. Larger skew gives more skewed distribution (and that seems to be quite intuitive). For, negative skew, group sizes are mirrored as for corresponding positive skew. For example, skew factor = -5.0 gives following groups sizes distribution:
group number group size
0     468558
1     269297
2     144495
3     70993
4     31031
5     11529
6     3367
7     665
8     63
9     2

Results shows that between 2172 test cases, in 2113 incremental sort gives speedup while in 59 it causes slowdown. The following 4 test cases show most significant slowdown (>10% of time).

Table GroupedCols GroupCount Skew PreorderedFrac FullSortMedian IncSortMedian TimeChangePercent
int4|int4|numeric 1 100 -10 0 1.5688240528 2.0607631207 31.36
text|int8|text|int4 1 1 0 0 1.7785198689 2.1816160679 22.66
int8|int8|int4 1 10 -10 0 1.136412859 1.3166360855 15.86
numeric|text|int4|int8 2 10 -10 1 0.4403841496 0.5070910454 15.15

As you can see, 3 of this 4 test cases have skewed distribution while one of them is related to costly location-aware comparison of text. I've no particular idea of how to cope these slowdowns. Probably, it's OK to have slowdown in some cases while have speedup in majority of cases (assuming there is an option to turn off new behavior). Probably, we should teach optimizer more about skewed distributions of groups, but that doesn't seem feasible for me.

Any thoughts?

BTW, replacement selection sort was removed by 8b304b8b. I think it worth to rerun benchmarks after that, because results might be changed. Will do.

I've applied patch on top of c12d570f and rerun the same benchmarks.

CSV-file with results is attached. There is no dramatical changes. There is still minority of performance regression cases while majority of cases has improvement.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com

The Russian Postgres Company

Вложения

results2.csv

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Robert Haas
Дата: 02 октября 2017 г., 22:36:02
Сообщение: Re: [HACKERS] issue: record or row variable cannot be part ofmultiple-item INTO list

Следующее

От: Peter Eisentraut
Дата: 02 октября 2017 г., 22:42:36
Сообщение: Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: [HACKERS] [PATCH] Incremental sort

Вложения

Предыдущее

Следующее