Re: PoC: Partial sort

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: PoC: Partial sort
Дата
Msg-id CAM3SWZQm9FODmYaVOg3-dFvSB4pbn75shNSrUwNpUrpEhh9PqA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: PoC: Partial sort  (Alexander Korotkov <aekorotkov@gmail.com>)
Ответы Re: PoC: Partial sort  (Alexander Korotkov <aekorotkov@gmail.com>)
Список pgsql-hackers
On Mon, Feb 10, 2014 at 10:59 AM, Alexander Korotkov
<aekorotkov@gmail.com> wrote:
> Done. Patch is splitted.

I took a quick look at this.

Have you thought about making your new cmpSortSkipCols() function not
use real comparisons? Since in the circumstances in which this
optimization is expected to be effective (e.g. your original example)
we can also expect a relatively low cardinality for the first n
indexed attributes (again, as in your original example), in general
when cmpSortSkipCols() is called there is a high chance that it will
return true. If any pair of tuples (logically adjacent tuples fed in
to cmpSortSkipCols() by an index scan in logical order) are not fully
equal (i.e. their leading, indexed attributes are not equal) then we
don't care about the details -- we just know that a new sort grouping
is required.

The idea here is that you can get away with simple binary equality
comparisons, as we do when considering HOT-safety. Of course, you
might find that two bitwise unequal values are equal according to
their ordinary B-Tree support function 1 comparator (e.g. two numerics
that differ only in their display scale). AFAICT this should be okay,
since that just means that you have smaller sort groupings than
strictly necessary. I'm not sure if that's worth it to more or less
duplicate heap_tuple_attr_equals() to save a "mere" n expensive
comparisons, but it's something to think about (actually, there are
probably less than even n comparisons in practice because there'll be
a limit).

A similar idea appears in my SortSupport for text ("Poor man's
normalized key"/strxfrm()) patch. A poor man's key comparison didn't
work out, and there may be further differences that aren't captured in
the special simple key representation, so we need to do a "proper
comparison" to figure it out for sure. However, within the sortsupport
routine comparator, we know that we're being called in this context,
as a tie-breaker for a poor man's normalized key comparison that
returned 0, and so are optimistic about the two datums being fully
equal. An optimistic memcmp() is attempted before a strcoll() here if
the lengths also match.

I have not actually added special hints so that we're optimistic about
keys being equal in other places (places that have nothing to do with
the general idea of poor man's normalized keys), but that might not be
a bad idea. Actually, it might not be a bad idea to just always have
varstr_cmp() attempt a memcmp() first when two texts have equal
length, no matter how it's called.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: Extending MSVC scripts to support --with-extra-version
Следующее
От: Mitsumasa KONDO
Дата:
Сообщение: Re: gaussian distribution pgbench