Re: B-Tree support function number 3 (strxfrm() optimization)

From: Robert Haas
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Date:
Msg-id: CA+Tgmoa+OjyVHV1aXHgnWTMY5s6ZsQogS9UrUy9J4RpaO-6E_A@mail.gmail.com
In reply to: Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
Responses: Re: B-Tree support function number 3 (strxfrm() optimization)  (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
On Tue, Sep 2, 2014 at 4:41 PM, Peter Geoghegan <pg@heroku.com> wrote:
> HyperLogLog isn't sample-based - it's useful for streaming a set and
> accurately tracking its cardinality with fixed overhead.

OK.
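A minimal HyperLogLog sketch in C may help make the "fixed overhead" point concrete: the state is a fixed array of small registers no matter how many values stream through it, and adding a value a second time never changes it. This is a textbook raw estimator, not PostgreSQL's implementation; the precision `HLL_P`, the splitmix64-style `mix64()` hash, and every function name here are assumptions made for illustration. It also omits the small- and large-range corrections, so the estimate is only reasonable once the count is well above about 2.5 times the number of registers.

```c
#include <stdint.h>
#include <string.h>

#define HLL_P 10                     /* hypothetical precision: 2^10 registers */
#define HLL_M (1u << HLL_P)

typedef struct {
    uint8_t reg[HLL_M];              /* fixed-size state, independent of input size */
} hll_state;

/* splitmix64 finalizer, used here as a stand-in 64-bit hash */
static uint64_t mix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

static void hll_init(hll_state *h) { memset(h->reg, 0, sizeof h->reg); }

static void hll_add(hll_state *h, uint64_t value) {
    uint64_t hash = mix64(value);
    uint32_t idx = (uint32_t)(hash >> (64 - HLL_P)); /* top p bits choose a register */
    uint64_t rest = hash << HLL_P;                    /* remaining 64 - p bits */
    uint8_t rank = 1;
    /* rank = 1-based position of the first 1 bit in the remaining bits */
    while (rank <= 64 - HLL_P && !(rest & 0x8000000000000000ULL)) {
        rank++;
        rest <<= 1;
    }
    if (rank > h->reg[idx])          /* registers only keep the maximum rank seen */
        h->reg[idx] = rank;
}

/* Raw HyperLogLog estimate: alpha_m * m^2 / sum(2^-reg[j]) */
static double hll_estimate(const hll_state *h) {
    double m = (double)HLL_M;
    double alpha = 0.7213 / (1.0 + 1.079 / m);
    double sum = 0.0;
    for (unsigned j = 0; j < HLL_M; j++)
        sum += 1.0 / (double)(1ULL << h->reg[j]);
    return alpha * m * m / sum;
}
```

With 1,024 one-byte registers the state is 1 KiB whatever the input size, and the expected relative error of this estimator is roughly 1.04/√1024, about 3%.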

>> Is it the right decision to suppress the abbreviated-key optimization
>> unconditionally on 32-bit systems and on Darwin?  There's certainly
>> more danger, on those platforms, that the optimization could fail to
>> pay off.  But it could also win big, if in fact the first character or
>> two of the string is enough to distinguish most rows, or if Darwin
>> improves their implementation in the future.  If the other defenses
>> against pathological cases in the patch are adequate, I would think
>> it'd be OK to remove the hard-coded checks here and let those cases
>> use the optimization or not according to its merits in particular
>> cases.  We'd want to look at what the impact of that is, of course,
>> but if it's bad, maybe those other defenses aren't adequate anyway.
>
> I'm not sure. Perhaps the Darwin thing is a bad idea because no one is
> using Macs to run real database servers. Apple haven't had a server
> product in years, and typically people only use Postgres on their Macs
> for development. We might as well have coverage of the new code for
> the benefit of Postgres hackers who favor Apple machines. Or, to look
> at it another way, the optimization is so beneficial that it's
> probably worth the risk, even for more marginal cases.
>
> 8 primary weights (the leading 8 bytes, frequently isomorphic to the
> first 8 Latin characters, regardless of whether or not they have
> accents/diacritics, or punctuation/whitespace) is twice as many as 4.
> But every time you add a byte of space to the abbreviated
> representation that can resolve a comparison, the number of
> unresolvable-without-tiebreak comparisons (in general) is, I imagine,
> reduced considerably. Basically, 8 bytes is way better than twice as
> good as 4 bytes in terms of its effect on the proportion of
> comparisons that are resolved only with abbreviated keys. Even so,
> I suspect it's worth applying the optimization with only 4.
>
> You've seen plenty of suggestions on assessing the applicability of
> the optimization from me. Perhaps you have a few of your own.

My suggestion is to remove the special cases for Darwin and 32-bit
systems and see how it goes.
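To illustrate how the key width affects the share of comparisons an abbreviated key can settle on its own, the following sketch counts how many pairwise comparisons are decided by a zero-padded prefix of a given width (at most 8 bytes). It compares raw string bytes as a stand-in for strxfrm() output; the helper names and the sample keys are hypothetical, and it is a toy illustration rather than a benchmark of real data.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ABBREV 8   /* hypothetical upper bound: one 64-bit Datum */

/* Nonzero if the zero-padded first `width` bytes of a and b differ, i.e. the
   comparison is resolved by the abbreviated key without a tiebreak. */
static int resolved_by_prefix(const char *a, const char *b, size_t width) {
    unsigned char ka[MAX_ABBREV] = {0}, kb[MAX_ABBREV] = {0};
    size_t la = strlen(a), lb = strlen(b);
    if (width > MAX_ABBREV)
        width = MAX_ABBREV;
    memcpy(ka, a, la < width ? la : width);
    memcpy(kb, b, lb < width ? lb : width);
    return memcmp(ka, kb, width) != 0;
}

/* Count resolved comparisons among all n*(n-1)/2 distinct pairs. */
static size_t count_resolved(const char *const *keys, size_t n, size_t width) {
    size_t resolved = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            resolved += (size_t)resolved_by_prefix(keys[i], keys[j], width);
    return resolved;
}
```

For the keys `apple`, `apricot`, `banana`, `bandana`, `applesauce` and `applesaucy`, a 4-byte prefix leaves 3 of the 15 pairs unresolved (all among the `appl` keys), while an 8-byte prefix leaves only `applesauce`/`applesaucy` for the tiebreak.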

> That wouldn't be harmless - it would probably result in incorrect
> answers in practice, and would certainly be unspecified. However, I'm
> not reading uninitialized bytes. I call memset() so that, in the event
> of the final strxfrm() blob being less than 8 bytes (which can happen
> even on glibc with en_US.UTF-8), the trailing bytes of the Datum are
> zeroed rather than left uninitialized. It cannot be harmful to memcmp()
> every Datum byte if the remaining bytes are always initialized to NUL.

OK.
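The memset() argument can be sketched as follows: the key buffer is zeroed before the leading strxfrm() bytes are copied in, so a blob shorter than the key width leaves only NUL bytes in the tail, and memcmp() over the full width never reads indeterminate memory. `ABBREV_BYTES`, `abbrev_key`, `make_abbrev()` and `compare_with_tiebreak()` are hypothetical names; this is not the patch's code, it works on a plain byte array rather than a PostgreSQL Datum, and the width of 8 assumes a 64-bit build (4 would be the 32-bit analogue).

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ABBREV_BYTES 8   /* assumption: 64-bit Datum; 4 on a 32-bit build */

typedef struct {
    unsigned char bytes[ABBREV_BYTES];
} abbrev_key;

/* Build an abbreviated key from the strxfrm() blob of s.
   Returns 0 on success, -1 on allocation failure. */
static int make_abbrev(const char *s, abbrev_key *out) {
    size_t len = strxfrm(NULL, s, 0);      /* length of the full blob (n == 0 allows NULL) */
    char *blob = malloc(len + 1);
    if (blob == NULL)
        return -1;
    strxfrm(blob, s, len + 1);
    /* Zero the whole key first: a blob shorter than ABBREV_BYTES then leaves
       only NUL bytes in the tail, so memcmp() over every byte is well defined. */
    memset(out->bytes, 0, sizeof out->bytes);
    memcpy(out->bytes, blob, len < ABBREV_BYTES ? len : ABBREV_BYTES);
    free(blob);
    return 0;
}

/* Compare abbreviated keys first; only when they are equal fall back to a
   full strcoll() comparison of the original strings (the tiebreak). */
static int compare_with_tiebreak(const char *a, const abbrev_key *ka,
                                 const char *b, const abbrev_key *kb) {
    int c = memcmp(ka->bytes, kb->bytes, ABBREV_BYTES);
    if (c != 0)
        return c;
    return strcoll(a, b);
}
```

Because strxfrm() output contains no embedded NUL bytes, a shorter blob that is a prefix of a longer one compares lower under memcmp(), which matches how strcmp() orders the full blobs; only when the abbreviated keys are equal does the code pay for strcoll().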

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


