Re: encoding affects ICU regex character classification

From: Tom Lane
Subject: Re: encoding affects ICU regex character classification
Msg-id: 360857.1701302164@sss.pgh.pa.us
In reply to: encoding affects ICU regex character classification (Jeff Davis <pgsql@j-davis.com>)
Responses: Re: encoding affects ICU regex character classification (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
Jeff Davis <pgsql@j-davis.com> writes:
> The problem seems to be confusion between pg_wchar and a unicode code
> point in pg_wc_isalpha() and related functions.

Yeah, that's an ancient sore spot: we don't really know what the
representation of wchar is.  We assume it's Unicode code points
for UTF8 locales, but libc isn't required to do that AFAIK.  See the
comment block starting at about line 20 of regc_pg_locale.c.

I doubt that ICU has much to do with this directly.

We'd have to find an alternate source of knowledge to replace the
<wctype.h> functions if we wanted to fix it fully ... can ICU do that?

            regards, tom lane


