Re: Built-in case-insensitive collation pg_unicode_ci

From	Peter Eisentraut
Subject	Re: Built-in case-insensitive collation pg_unicode_ci
Date	16 October 16:46:30
Msg-id	76d9a422-2e15-4300-9c6d-47a7c3d00caa@eisentraut.org
In reply to	Built-in case-insensitive collation pg_unicode_ci (Jeff Davis <pgsql@j-davis.com>)
List	pgsql-hackers

On 20.09.25 02:21, Jeff Davis wrote:
> New builtin case-insensitive collation PG_UNICODE_CI, where the
> ordering semantics are just:
> 
>     strcmp(CASEFOLD(arg1), CASEFOLD(arg2))
> 
> and the character semantics are the same as PG_UNICODE_FAST.

If it's a variant of PG_UNICODE_FAST, then it ought to be called 
PG_UNICODE_FAST_CI or similar.  Otherwise, one would expect it to be a 
variant of PG_UNICODE (if that existed, but there is also UNICODE).

But that name is also dubious since you later write that it's not 
actually fast.
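
For readers following along, the quoted ordering rule can be sketched
outside the server. This is only an illustration, not the patch's
implementation: it assumes Python's str.casefold() is a reasonable
stand-in for CASEFOLD, and it compares UTF-8 bytes to mimic strcmp.
The names casefold_key and ci_compare are made up for this example.

```python
def casefold_key(s: str) -> bytes:
    # Stand-in for CASEFOLD(arg): Python's full Unicode case folding.
    # Assumption: the server-side folding may differ in details.
    # Encoding to UTF-8 makes the later comparison bytewise, like strcmp.
    return s.casefold().encode("utf-8")


def ci_compare(a: str, b: str) -> int:
    # Mirrors strcmp(CASEFOLD(arg1), CASEFOLD(arg2)):
    # negative, zero, or positive depending on the folded byte order.
    ka, kb = casefold_key(a), casefold_key(b)
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    print(ci_compare("Straße", "STRASSE"))   # equal after folding
    print(ci_compare("apple", "Banana"))     # ordered by folded bytes
    print(sorted(["b", "A", "a", "B"], key=casefold_key))
```

Note that strings differing only in case compare as equal here, so
the comparison by itself says nothing about how such ties would be
ordered or stored.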

> Non-deterministic collations cannot be used by SIMILAR TO, and may
> cause problems for ILIKE and regexes. The reason is that pattern
> matching often depends on the character-by-character semantics, but ICU
> collations aren't constrained enough for these semantics to work.

This reasoning is a bit narrow.  SIMILAR TO is kind of deprecated, and 
ILIKE is kind of stupid, and regexes have their own way to control 
case-sensitivity.

Nevertheless, I think there would be some value in providing CI (and maybe 
accent-insensitive?) collations that operate separately from the 
"nondeterministic" mechanism.  But then I would like to see a 
comprehensive approach that covers a variety of providers and locales. 
For example, I would expect there to be something like a "sv_SE_CI" 
locale, either available by default or easily created.
