Re: Unicode full case mapping: PG_UNICODE_FAST, and standard-compliant UCS_BASIC

From	Jeff Davis
Subject	Re: Unicode full case mapping: PG_UNICODE_FAST, and standard-compliant UCS_BASIC
Date	19 April 22:30:57
Msg-id	130bf15c8d18c20c0e3c76ef85cd057cacb565dc.camel@j-davis.com
In reply to	Re: Unicode full case mapping: PG_UNICODE_FAST, and standard-compliant UCS_BASIC (Noah Misch <noah@leadboat.com>)
Responses	Re: Unicode full case mapping: PG_UNICODE_FAST, and standard-compliant UCS_BASIC
List	pgsql-hackers

On Thu, 2025-04-17 at 06:58 -0700, Noah Misch wrote:
> Should initcap_wbnext() pass in a locale-dependent "bool posix"
> argument like the other calls the commit changed?

Yes, I believe you are correct. Patch and tests attached.
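
To illustrate why the flag matters for word boundaries, here is a minimal Python sketch. It is not the Postgres code: the helper names (is_digit, is_alnum, initcap) are hypothetical, str.isalpha() only approximates the Unicode Alphabetic property, and the assumption that the POSIX-compatible variant treats only 0-9 as digits follows the TR18 compatibility-properties table.

```python
import unicodedata


def is_digit(ch: str, posix: bool) -> bool:
    # Assumption: the POSIX-compatible variant accepts only ASCII 0-9;
    # the standard variant accepts any decimal number (category Nd).
    if posix:
        return "0" <= ch <= "9"
    return unicodedata.category(ch) == "Nd"


def is_alnum(ch: str, posix: bool) -> bool:
    # str.isalpha() is only an approximation of the Alphabetic property.
    return ch.isalpha() or is_digit(ch, posix)


def initcap(text: str, posix: bool) -> str:
    # Uppercase the first character of each alnum run, lowercase the rest.
    out = []
    in_word = False
    for ch in text:
        if is_alnum(ch, posix):
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)
            in_word = False
    return "".join(out)


sample = "a\u0663b"  # 'a', ARABIC-INDIC DIGIT THREE, 'b'
print(initcap(sample, posix=False))  # the digit is alnum: one word
print(initcap(sample, posix=True))   # the digit is a boundary: two words
```

With posix=False the non-ASCII digit keeps the run together, so only the first letter is capitalized. With posix=True the same digit splits the run, and the trailing letter is capitalized as well.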

> Long-term, pg_u_isword() should have a "bool posix" argument.
> Currently, only tests call that function.  If it got a non-test
> caller, https://www.unicode.org/reports/tr18/#word would have
> pg_u_isword() follow the choice of posix compatibility like
> pg_u_isalnum() does.

I based those functions on:

https://www.unicode.org/reports/tr18/#Compatibility_Properties

and the "word" class does not have a POSIX variant. But Postgres has
two documented definitions for "word" characters:

 * for regexes, alnum + "_"
 * for INITCAP(), just alnum

and the TR18 definition doesn't match up with either one, which is why
we don't use it.
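
As a rough sketch of how the two documented definitions diverge, the Python below applies each one to the same input. The function names are hypothetical, and str.isalnum() stands in for the real alnum test as an approximation only.

```python
def is_word_regex(ch: str) -> bool:
    # Regex definition from the list above: alnum plus underscore.
    return ch.isalnum() or ch == "_"


def is_word_initcap(ch: str) -> bool:
    # INITCAP() definition from the list above: alnum only.
    return ch.isalnum()


def titlecase_words(text: str, is_word) -> str:
    # Capitalize the first character of every run of "word" characters,
    # using whichever word-character predicate the caller passes in.
    out = []
    in_word = False
    for ch in text:
        if is_word(ch):
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)
            in_word = False
    return "".join(out)


print(titlecase_words("hello_world", is_word_initcap))  # Hello_World
print(titlecase_words("hello_world", is_word_regex))    # Hello_world
```

Under the alnum-only definition the underscore is a boundary, so both halves are capitalized. Under the regex definition the underscore is part of the word, so only the first letter is.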

With the ICU provider, INITCAP() uses ICU's definition of word
boundaries, so it doesn't match our documentation.

We could adjust our documentation to allow for provider-dependent
definitions of word characters, which might be a good idea. But that
still doesn't quite capture ICU's more complex definition of word
boundaries.

Or, we could remove those unused functions for now, and figure out if
there's a reason to add them back later. They are probably adding more
confusion than anything.

Regards,
    Jeff Davis

Attachments

v1-0001-Fix-INITCAP-word-boundaries-for-PG_UNICODE_FAST.patch
