Re: [HACKERS] How can I find a specific collation in pg_collationwhen using ICU?

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: [HACKERS] How can I find a specific collation in pg_collationwhen using ICU?
Дата
Msg-id CAH2-Wz=dRsf_qMpHJ5A7pndmpo1sP1MtzrsUpuuNaCCsecCopg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] How can I find a specific collation in pg_collationwhen using ICU?  ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
Ответы Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, Aug 9, 2017 at 6:19 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: pgsql-hackers-owner@postgresql.org
>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Peter Eisentraut
>> There are no case-insensitive collations in PostgreSQL (yet).
>
> That's sad news, as I expected ICU to bring its various collation features to PostgreSQL.  I hope it will be easy to
addthem.
 

The reason it is not easy is that text equality is based on strict
binary equality. We would have to teach hash operator classes about
collations to fix this, and make text_eq() hash strxfrm() or
something. That requires special work. You can ask ICU for case
insensitivity with Postgres 10, but the strcmp() tie breaker within
varstr_cmp() will prevent it from being truly case insensitive.

>> The best explanation of the difference that I can understand is here, under
>> "Why do CJK strings sort incorrectly in Unicode?":
>>
>> https://dev.mysql.com/doc/refman/5.5/en/faqs-cjk.html
>
> Thanks a lot.  MysQL seems to have many collations, doesn't it?

Well, it depends on how you define collation, which actually gets
quite complicated with ICU. You can combine certain options together
with great flexibility, it seems (see my e-mail from earlier today --
"What users can do with custom ICU collations in Postgres 10"). Let's
assume that there are 10 distinct options for each locale that each
independently affect sort order for the base collation of the locale.
I believe that there are at least 10 such options that change things,
and maybe a lot more. Theoretically, this means that there are an
absolutely enormous number of possible collations with ICU.

Now, I wouldn't *actually* say that we have many thousands of
collations with ICU, because that doesn't seem like a sensible way to
explain ICU Postgres collations. The way that we think about this from
using libc doesn't work well for ICU. Instead, I would say that we
have hundreds of locales (not collations), each of which support some
variant options (e.g., traditional Spanish sort order, alternative
Japanese sort order, pictographic emoji sorting), with some further
generic options for varying how case it handled, how numbers are
handled, and other things like that.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: [HACKERS] Log LDAP "diagnostic messages"?
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] How can I find a specific collation in pg_collation when using ICU?