Re: ICU integration

From	Peter Geoghegan
Subject	Re: ICU integration
Date	September 24, 2016, 13:13:37
Msg-id	CAM3SWZQs1jG31zuPNkpB9XYDzWfy-i+OY28tjJaFpwn=V9F7Mg@mail.gmail.com
In reply to	Re: ICU integration (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: ICU integration (Thomas Munro <thomas.munro@enterprisedb.com>)
List	pgsql-hackers

On Fri, Sep 23, 2016 at 7:27 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> A couple of thoughts about abbreviated keys:
>
> #ifndef TRUST_STRXFRM
>     if (!collate_c)
>         abbreviate = false;
> #endif
>
> I think this macro should affect only strxfrm, and we should trust
> ucol_getSortKey or disable it independently.  ICU's manual says
> reassuring things like "Sort keys are most useful in databases" and
> "Sort keys are generally only useful in databases or other
> circumstances where function calls are extremely expensive".

+1. Abbreviated keys are essential to get competitive performance
while sorting text, and the fact that ICU makes them safe to
reintroduce is a major advantage of adopting ICU. Perhaps we should
consider wrapping strxfrm() instead, though, so that other existing
callers of strxfrm() (I'm thinking of convert_string_datum()) almost
automatically do the right thing. In other words, maybe there should
be a pg_strxfrm() or something, with TRUST_STRXFRM changed to be
something that can dynamically resolve whether or not it's a collation
managed by a trusted collation provider (this could only be resolved
at runtime, which I think is fine).
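
A minimal sketch of what such a wrapper might look like, assuming nothing
about the eventual patch: CollProvider, CollInfo, collation_xfrm_trusted(),
c_collation_xfrm() and this pg_strxfrm() signature are all hypothetical, and
no ICU call is made, so the example needs only libc. It shows the
compile-time TRUST_STRXFRM test replaced by a per-collation check made at
runtime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: which library implements a given collation. */
typedef enum
{
    COLLPROVIDER_LIBC,
    COLLPROVIDER_ICU
} CollProvider;

/* Transform callback with strxfrm()-like semantics: returns the length
 * needed (excluding the terminator); dest is valid only if that length
 * is smaller than destsize. */
typedef size_t (*xfrm_fn) (char *dest, const char *src, size_t destsize);

/* Hypothetical per-collation descriptor, looked up at runtime. */
typedef struct
{
    CollProvider provider;
    bool         collate_c;   /* true for the "C" collation */
    xfrm_fn      xfrm;        /* e.g. strxfrm for libc; an ICU provider
                               * would supply its own function here */
} CollInfo;

/* For the C collation the transformed key is just the string's bytes. */
static size_t
c_collation_xfrm(char *dest, const char *src, size_t destsize)
{
    size_t len = strlen(src);

    if (len < destsize)
        memcpy(dest, src, len + 1);
    return len;
}

/* Runtime stand-in for "#ifdef TRUST_STRXFRM": trust the C collation and
 * ICU-managed collations, but not libc's strxfrm(). */
static bool
collation_xfrm_trusted(const CollInfo *coll)
{
    return coll->collate_c || coll->provider == COLLPROVIDER_ICU;
}

/* Returns false, leaving dest and *needed untouched, when the collation's
 * transform is not trusted; the caller then falls back to plain comparisons. */
static bool
pg_strxfrm(const CollInfo *coll, char *dest, const char *src,
           size_t destsize, size_t *needed)
{
    assert(coll != NULL && coll->xfrm != NULL);

    if (!collation_xfrm_trusted(coll))
        return false;

    *needed = coll->xfrm(dest, src, destsize);
    return true;
}
```

With something of this shape, a caller such as convert_string_datum() would
only have to check the return value, rather than carry its own #ifdef.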

> It looks like varstr_abbrev_convert calls strxfrm unconditionally
> (assuming TRUST_STRXFRM is defined).  <captain-obvious>This needs to
> use ucol_getSortKey instead when appropriate.</>  It looks like it's a
> bit more helpful than strxfrm about telling you the output buffer size
> it wants, and it doesn't need nul termination, which is nice.
> Unfortunately it is like strxfrm in that the output buffer's contents
> is unspecified if it ran out of space.

One can use the ucol_nextSortKeyPart() interface to get just the first
4 or 8 bytes of the sort key for use as an abbreviated key, reducing the
overhead somewhat, so the output buffer size limitation is probably
irrelevant. The ICU
documentation says something about this being useful for Radix sort,
but I suspect it's more often used to generate abbreviated keys.
Abbreviated keys were not my original idea. They're really just a
standard technique.
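
To illustrate the technique itself, here is a minimal sketch that is
independent of ICU. It assumes the leading bytes of a binary sort key are
already in hand (with ICU they might come from ucol_nextSortKeyPart(), which
is not called here) and packs the first 8 of them into an integer whose
unsigned comparison agrees with memcmp() on those bytes. The names
abbrev_key() and abbrev_cmp() are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ABBREV_BYTES 8

/* Pack up to the first 8 sort key bytes into a uint64_t, most significant
 * byte first, zero-padding short keys. Big-endian packing makes unsigned
 * integer order match bytewise order on the packed prefix. */
static uint64_t
abbrev_key(const uint8_t *sortkey, size_t len)
{
    uint64_t key = 0;
    size_t   i;

    assert(sortkey != NULL || len == 0);

    for (i = 0; i < ABBREV_BYTES; i++)
    {
        uint8_t b = (i < len) ? sortkey[i] : 0;

        key = (key << 8) | b;
    }
    return key;
}

/* Returns <0, 0 or >0. A result of 0 is inconclusive: the keys share their
 * first 8 bytes, so the caller must fall back to a full comparison. */
static int
abbrev_cmp(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}
```

Most comparisons in a sort are then resolved by a single integer compare,
and only ties on the abbreviated key pay for the full comparison.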

-- 
Peter Geoghegan

