Re: B-Tree support function number 3 (strxfrm() optimization)

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: B-Tree support function number 3 (strxfrm() optimization)
Дата
Msg-id CAM3SWZRNUr4Z5hijt=Xvdma-5uv2gg3+922drOZLwSoHNh5MkA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: B-Tree support function number 3 (strxfrm() optimization)  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: B-Tree support function number 3 (strxfrm() optimization)  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Thu, Sep 4, 2014 at 2:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Eh, maybe?  I'm not sure why the case where we're using abbreviated
> keys should be different than the case we're not.  In either case this
> is a straightforward trade-off: if we do a memcmp() before strcoll(),
> we win if it returns 0 and lose if returns non-zero and strcoll also
> returns non-zero.  (If memcmp() returns non-zero but strcoll() returns
> 0, it's a tie.)  I'm not immediately sure why it should affect the
> calculus one way or the other whether abbreviated keys are in use; the
> question of how much faster memcmp() is than strcoll() seems like the
> relevant consideration either way.


Not quite. Consider my earlier example of sorting ~300,000 cities by
country only. That's a pretty low cardinality attribute. We win big,
and we are almost certain that the abbreviated key cardinality is a
perfect proxy for the full key cardinality so we stick with
abbreviated keys while copying over tuples. Sure, most comparisons
will actually be resolved with a "memcmp() == 0" rather than an
abbreviated comparison, but under my ad-hoc cost model there is no
distinction, since they're both very much cheaper than a strcoll()
(particularly when we factor in the NUL termination copying that a
"memcmp() == 0" also avoids). To a lesser extent we're also justified
in that optimism because we've already established that roughly the
first 8 bytes of the string are bitwise equal.

So the difference is that in the abbreviated key case, we are at least
somewhat justified in our optimism. Whereas, where we're just eliding
fmgr overhead, say on the 2nd or subsequent attribute of a multi-key
sort, it's totally opportunistic to chance a "memcmp() == 0". The
latter optimization can only be justified by the fact that the
memcmp() is somewhere between dirt cheap and free. That seems like
soemthing that should significantly impact the calculus.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: PL/pgSQL 2
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: Pg_upgrade and toast tables bug discovered