Re: encoding affects ICU regex character classification

From	Tom Lane
Subject	Re: encoding affects ICU regex character classification
Date	29 November 2023, 23:56:04
Msg-id	360857.1701302164@sss.pgh.pa.us
In reply to	encoding affects ICU regex character classification (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: encoding affects ICU regex character classification
List	pgsql-hackers

Jeff Davis <pgsql@j-davis.com> writes:
> The problem seems to be confusion between pg_wchar and a unicode code
> point in pg_wc_isalpha() and related functions.

Yeah, that's an ancient sore spot: we don't really know what the
representation of wchar is.  We assume it's Unicode code points
for UTF8 locales, but libc isn't required to do that AFAIK.  See
comment block starting about line 20 in regc_pg_locale.c.
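
To make the assumption concrete, here is a minimal sketch in plain C. It is not PostgreSQL's code, and the helper names wchar_is_unicode() and codepoint_isalpha() are hypothetical. It relies on the standard macro __STDC_ISO_10646__, which a C implementation defines only when it promises that wchar_t values are ISO 10646 (Unicode) code points. Where the macro is absent, passing a code point to the <wctype.h> functions is a guess, so the sketch declines to answer.

```c
#include <stdbool.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper: true if the C implementation promises that
 * wchar_t values are ISO 10646 (Unicode) code points.
 */
static bool
wchar_is_unicode(void)
{
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

/*
 * Hypothetical helper: classify a Unicode code point with libc.
 * Returns 1 if alphabetic, 0 if not, and -1 if libc cannot be
 * trusted to interpret the value as a code point.
 */
static int
codepoint_isalpha(unsigned long cp)
{
    if (!wchar_is_unicode())
        return -1;              /* wchar_t representation is unknown */
    if (cp > (unsigned long) WCHAR_MAX)
        return -1;              /* value does not fit in wchar_t */
    return iswalpha((wint_t) cp) != 0;
}
```

Even when the macro is defined, the result of iswalpha() still depends on the current LC_CTYPE locale, so this only narrows the problem; it does not remove it.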

I doubt that ICU has much to do with this directly.

We'd have to find an alternate source of knowledge to replace the
<wctype.h> functions if we wanted to fix it fully ... can ICU do that?

            regards, tom lane
