Re: PoC: Partial sort

From: Alexander Korotkov
Subject: Re: PoC: Partial sort
Date:
Msg-id: CAPpHfdvvPozkiZPHoC6=AJO9bXS4mK6837GbBXibnWh=LkspRw@mail.gmail.com
In reply to: Re: PoC: Partial sort (Peter Geoghegan <pg@heroku.com>)
Responses: Re: PoC: Partial sort (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Sun, Jul 13, 2014 at 6:45 AM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Feb 10, 2014 at 10:59 AM, Alexander Korotkov
> <aekorotkov@gmail.com> wrote:
> > Done. Patch is splitted.

> I took a quick look at this.
>
> Have you thought about making your new cmpSortSkipCols() function not
> use real comparisons? Since in the circumstances in which this
> optimization is expected to be effective (e.g. your original example)
> we can also expect a relatively low cardinality for the first n
> indexed attributes (again, as in your original example), in general
> when cmpSortSkipCols() is called there is a high chance that it will
> return true. If any pair of tuples (logically adjacent tuples fed in
> to cmpSortSkipCols() by an index scan in logical order) are not fully
> equal (i.e. their leading, indexed attributes are not equal) then we
> don't care about the details -- we just know that a new sort grouping
> is required.

Actually, higher cardinality of the skip columns is better. Sorting many small groups is faster than sorting a few large groups of the same total size. Also, with smaller groups the limit is satisfied more precisely (on average), i.e. a smaller total number of rows gets sorted.
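
To put rough numbers on this (an illustrative back-of-the-envelope estimate, not a measurement from the patch): sorting one group of N rows takes on the order of N*log2(N) comparisons, while sorting k groups of N/k rows each takes about

    k * (N/k) * log2(N/k) = N * log2(N/k)

comparisons. For N = 1,000,000 rows and k = 1,000 distinct skip-column values that is roughly 20 million vs. 10 million comparisons, before even counting the rows a LIMIT lets us avoid sorting at all.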
 
> The idea here is that you can get away with simple binary equality
> comparisons, as we do when considering HOT-safety. Of course, you
> might find that two bitwise unequal values are equal according to
> their ordinary B-Tree support function 1 comparator (e.g. two numerics
> that differ only in their display scale). AFAICT this should be okay,
> since that just means that you have smaller sort groupings than
> strictly necessary. I'm not sure if that's worth it to more or less
> duplicate heap_tuple_attr_equals() to save a "mere" n expensive
> comparisons, but it's something to think about (actually, there are
> probably less than even n comparisons in practice because there'll be
> a limit).

Not correct. Smaller groups are not OK here. Imagine that two different representations of the same skip-column value exist. The index may return them in any order, even alternating between them row by row. In that case the sort on the remaining columns never takes place, although it should. But some optimizations are still possible (see the sketch after this list):
  1. Use a bitwise comparison first, then recheck with the real comparator. However, there is no guarantee that this yields any speedup.
  2. Use an equality check instead of the btree comparison. For the "text" datatype that should be noticeably faster, because no locale-aware comparison is needed.
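
A minimal sketch of option 1, assuming a hypothetical sameSortGroup() helper with made-up state fields (skipCols, sortKeys, typByVal, typLen); the actual patch is organized differently, but datumIsEqual() and ApplySortComparator() are the existing backend primitives such a check would rely on:

    #include "postgres.h"
    #include "executor/tuptable.h"
    #include "utils/datum.h"
    #include "utils/sortsupport.h"

    /*
     * Illustrative sketch only, not the patch code: decide whether two
     * adjacent index-order tuples fall into the same sort group over the
     * leading (skip) columns.  The struct and its fields are invented
     * for the example.
     */
    typedef struct PartialSortSketch
    {
        int          skipCols;  /* number of leading, already-sorted columns */
        SortSupport  sortKeys;  /* per-column comparator state */
        bool        *typByVal;  /* pass-by-value flag per skip column */
        int16       *typLen;    /* type length per skip column */
    } PartialSortSketch;

    static bool
    sameSortGroup(TupleTableSlot *a, TupleTableSlot *b, PartialSortSketch *st)
    {
        int     i;

        for (i = 0; i < st->skipCols; i++)
        {
            SortSupport sk = &st->sortKeys[i];
            bool        null1, null2;
            Datum       d1 = slot_getattr(a, sk->ssup_attno, &null1);
            Datum       d2 = slot_getattr(b, sk->ssup_attno, &null2);

            if (null1 || null2)
            {
                if (null1 != null2)
                    return false;   /* NULL vs non-NULL: new group starts */
                continue;           /* both NULL: same group */
            }

            /* Cheap bitwise check first: bitwise equal => same group. */
            if (datumIsEqual(d1, d2, st->typByVal[i], st->typLen[i]))
                continue;

            /*
             * Bitwise inequality does NOT imply inequality under the
             * opclass (e.g. numerics differing only in display scale),
             * so recheck with the real comparator before deciding that
             * a new group starts.
             */
            if (ApplySortComparator(d1, false, d2, false, sk) != 0)
                return false;
        }
        return true;
    }

For option 2, the recheck could call the type's equality operator instead of the full comparator: only equality matters here, and for "text" the equality operator can stop at a length check plus memcmp(), skipping the locale-aware comparison entirely.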

------
With best regards,
Alexander Korotkov.    
