Re: Built-in case-insensitive collation pg_unicode_ci
| From | Peter Eisentraut |
|---|---|
| Subject | Re: Built-in case-insensitive collation pg_unicode_ci |
| Date | |
| Msg-id | 76d9a422-2e15-4300-9c6d-47a7c3d00caa@eisentraut.org |
| In response to | Built-in case-insensitive collation pg_unicode_ci (Jeff Davis <pgsql@j-davis.com>) |
| List | pgsql-hackers |
On 20.09.25 02:21, Jeff Davis wrote:
> New builtin case-insensitive collation PG_UNICODE_CI, where the
> ordering semantics are just:
>
>     strcmp(CASEFOLD(arg1), CASEFOLD(arg2))
>
> and the character semantics are the same as PG_UNICODE_FAST.

If it's a variant of PG_UNICODE_FAST, then it ought to be called PG_UNICODE_FAST_CI or similar. Otherwise, one would expect it to be a variant of PG_UNICODE (if that existed, but there is also UNICODE). But that name is also dubious since you later write that it's not actually fast.

> Non-deterministic collations cannot be used by SIMILAR TO, and may
> cause problems for ILIKE and regexes. The reason is that pattern
> matching often depends on the character-by-character semantics, but ICU
> collations aren't constrained enough for these semantics to work.

This reasoning is a bit narrow. SIMILAR TO is kind of deprecated, and ILIKE is kind of stupid, and regexes have their own way to control case-sensitivity.

Nevertheless, I think there would be some value to provide CI (and maybe accent-insensitive?) collations that operate separately from the "nondeterministic" mechanism. But then I would like to see a comprehensive approach that covers a variety of providers and locales. For example, I would expect there to be something like a "sv_SE_CI" locale, either available by default or easily created.
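
For readers following along, the ordering rule quoted at the top of the message, strcmp(CASEFOLD(arg1), CASEFOLD(arg2)), can be illustrated with a small Python sketch. This is not the proposed PostgreSQL implementation: it assumes that Python's `str.casefold()` is a reasonable stand-in for CASEFOLD (both are based on Unicode full case folding, but the Unicode versions may differ), and that comparing UTF-8 bytes stands in for strcmp. The function name `ci_compare` is hypothetical.

```python
from functools import cmp_to_key


def ci_compare(a: str, b: str) -> int:
    """Illustrative stand-in for strcmp(CASEFOLD(a), CASEFOLD(b)).

    Assumption: str.casefold() approximates CASEFOLD; the real function
    in PostgreSQL may follow a different Unicode version.
    """
    # Fold both inputs, then compare the UTF-8 bytes, as strcmp would
    # on UTF-8 encoded C strings.
    ka = a.casefold().encode("utf-8")
    kb = b.casefold().encode("utf-8")
    # Reduce the byte comparison to -1, 0, or 1.
    return (ka > kb) - (ka < kb)


# Strings that differ only in case compare as equal.
print(ci_compare("Hello", "hELLO"))      # 0
# Full case folding maps "ß" to "ss", so these also compare as equal.
print(ci_compare("straße", "STRASSE"))   # 0
# Sorting ignores case.
print(sorted(["banana", "Apple", "cherry"], key=cmp_to_key(ci_compare)))
```

Because the comparison is defined entirely by the folded byte strings, two inputs are equal exactly when their folded forms are identical. That is simpler than the ICU-based nondeterministic collations discussed in the thread, and it also shows why the result is locale-independent, which is relevant to the request above for locale-specific variants such as "sv_SE_CI".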