Re: B-Tree support function number 3 (strxfrm() optimization)

From Peter Geoghegan
Subject Re: B-Tree support function number 3 (strxfrm() optimization)
Msg-id CAM3SWZSAtQOVbPZmVMcaj-_fQEvKcZ5MKeaAUyH6Wz40hmvL3A@mail.gmail.com
In reply to Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Responses Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Re: B-Tree support function number 3 (strxfrm() optimization)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Sep 3, 2014 at 2:44 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I guess it should still be a configure option, then. Or maybe there
> should just be a USE_ABBREV_KEYS macro within pg_config_manual.h.

The attached additional patches are intended to be applied on top of
most of the patches posted on September 2nd [1]. Note that you should
not apply patch 0001-* from that set to master, since it has already
been committed to master [2]. However, while rebasing I revised
patch/commit 0005-* so that abbreviation is used on all platforms,
including 32-bit platforms (the prior 0005-* patch just re-enabled the
optimization on Darwin/Apple), so you should discard the earlier
0005-* patch. In a later commit I also properly formalize the idea
that we always do opportunistic "memcmp() == 0" checks, no matter what
context a sortsupport-accelerated text comparison occurs in. That
seems like a good idea, but it's broken out in a separate commit in
case you are not in agreement.
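
To make the opportunistic "memcmp() == 0" idea concrete, here is a minimal sketch in plain C. It is not the code from the patch: sketch_text_cmp() is a hypothetical name, the inputs are assumed to be NUL-terminated buffers whose lengths are already known, and the strcmp() tie-break after strcoll() is an assumption made so that the sketch yields a total order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL source): compare two strings in the
 * current locale, trying a cheap bytewise equality check first.
 *
 * Assumption: a and b are NUL-terminated and len_a/len_b are their lengths.
 */
static int
sketch_text_cmp(const char *a, size_t len_a, const char *b, size_t len_b)
{
    int result;

    assert(a != NULL && b != NULL);
    assert(a[len_a] == '\0' && b[len_b] == '\0');

    /*
     * Bytewise-identical strings are equal under any collation, so the
     * comparatively expensive strcoll() call can be skipped entirely.
     */
    if (len_a == len_b && memcmp(a, b, len_a) == 0)
        return 0;

    /* Not identical: fall back to the locale-aware comparison. */
    result = strcoll(a, b);

    /*
     * Assumption for this sketch: if the locale reports a tie for strings
     * that are not bytewise identical, break the tie with strcmp().
     */
    if (result == 0)
        result = strcmp(a, b);

    return result;
}
```

The fast path only ever short-circuits the "equal" answer; any inequality still goes through strcoll(), so the ordering itself is unchanged.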

While I gave serious consideration to your idea of having a dedicated
abbreviation comparator, and not duplicating sortsupport state when
abbreviated keys are used (going so far as to almost fully implement
the idea), I ultimately decided that my vote is that we don't do that.
It seemed to me that there was negligible benefit for the increased
complexity. In particular, I didn't want to burden tuplesort with
having to worry about whether or not abbreviation was aborted during
tuple copying, or was not used by the opclass in the first place -
implementing your scheme makes that distinction relevant. It's very
convenient to have comparetup_heap() "compare the leading sort key"
(that specifically looks at SortTuple.datum1 pairs) using the same
comparator for the "abbreviated" and "not abbreviated" cases
indifferently. comparetup_heap() does not seem like a great place to
burden with caring about each combination any more than strictly
necessary.

I like that I don't have to care about every combination, and can
treat abbreviation abortion as the special case with the extra step,
in line with how I think of the optimization conceptually. Does that
make sense? Otherwise, there'd have to be an ApplySortComparator()
*and* "ApplySortComparatorAbbreviated()" call with SortTuple.datum1
pairs passed, as appropriate for each opclass (and abortion state), as
well as a heap_getattr() tie-breaker call for the latter case alone
(when we got an inconclusive answer, OR when abbreviation was
aborted). Finally, just as things are now, there'd have to be a loop
where the second or subsequent attributes are dealt with by
ApplySortComparator()'ing. So AFAICT under your scheme there are 4
ApplySortComparator* call sites required, rather than 3 as under mine.
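
As a rough illustration of the three-call-site shape described above, here is a self-contained toy in plain C. It is one possible reading of the description, not code from the patch: every Sketch* and sketch_* name is hypothetical, strcmp() stands in for the opclass's authoritative comparator, the attrs[] array stands in for heap_getattr(), and the abbreviated key is simply the first sizeof(uintptr_t) bytes of the string packed into an integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NKEYS 2

typedef int (*SketchDatumCmp) (uintptr_t x, uintptr_t y);

typedef struct SketchSortState
{
    bool           abbreviated;   /* does datum1 hold abbreviated keys? */
    SketchDatumCmp leading_cmp;   /* comparator applied to datum1 pairs */
} SketchSortState;

typedef struct SketchTuple
{
    const char *attrs[SKETCH_NKEYS]; /* full keys (toy heap_getattr()) */
    uintptr_t   datum1;              /* cached leading key */
} SketchTuple;

/* Authoritative comparator for full keys (toy stand-in: strcmp). */
static int
sketch_full_cmp(const char *x, const char *y)
{
    return strcmp(x, y);
}

/* Leading-key comparator when datum1 is a pointer to the full key. */
static int
sketch_full_datum_cmp(uintptr_t x, uintptr_t y)
{
    return sketch_full_cmp((const char *) x, (const char *) y);
}

/* Leading-key comparator when datum1 is an abbreviated key. */
static int
sketch_abbrev_cmp(uintptr_t x, uintptr_t y)
{
    return (x > y) - (x < y);
}

/* Pack the leading bytes, most significant first, zero-padded. */
static uintptr_t
sketch_abbreviate(const char *s)
{
    uintptr_t key = 0;
    size_t    len = strlen(s);
    size_t    i;

    for (i = 0; i < sizeof(uintptr_t); i++)
        key = (key << 8) | (i < len ? (unsigned char) s[i] : 0);
    return key;
}

static SketchTuple
sketch_make_tuple(const SketchSortState *state, const char *k0, const char *k1)
{
    SketchTuple t;

    t.attrs[0] = k0;
    t.attrs[1] = k1;
    t.datum1 = state->abbreviated ? sketch_abbreviate(k0) : (uintptr_t) k0;
    return t;
}

static int
sketch_comparetup(const SketchSortState *state,
                  const SketchTuple *a, const SketchTuple *b)
{
    int cmp;
    int nkey;

    assert(state != NULL && state->leading_cmp != NULL);

    /* Call site 1: compare datum1 pairs, whatever they happen to hold. */
    cmp = state->leading_cmp(a->datum1, b->datum1);
    if (cmp != 0)
        return cmp;

    /* Call site 2: abbreviated keys tied, so consult the full leading key. */
    if (state->abbreviated)
    {
        cmp = sketch_full_cmp(a->attrs[0], b->attrs[0]);
        if (cmp != 0)
            return cmp;
    }

    /* Call site 3: second and subsequent attributes. */
    for (nkey = 1; nkey < SKETCH_NKEYS; nkey++)
    {
        cmp = sketch_full_cmp(a->attrs[nkey], b->attrs[nkey]);
        if (cmp != 0)
            return cmp;
    }
    return 0;
}
```

In this toy, the first call site never needs to know which kind of value datum1 holds; only the tie-breaker step depends on whether abbreviation is in effect.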

Along similar lines, I thought about starting from nkey = 0 within
comparetup_heap() when abortion occurs (so that there'd only be 2
ApplySortComparator() call sites - no increase from master), but that
turns out to be messy, plus I like those special tie-breaker
assertions.

I will be away for much of next week, and will have limited access to
e-mail. I will be around tomorrow, though. I hope that what I've
posted is suitable to commit without further input from me.

[1] http://www.postgresql.org/message-id/CAM3SWZTEtQcKc24LhWKDLasJf-b-cCNn4q0OYjhGBX+NcpNRpg@mail.gmail.com
[2] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=d8d4965dc29263462932be03d4206aa694e2cd7e
--
Peter Geoghegan
