Re: encoding affects ICU regex character classification

From Jeff Davis
Subject Re: encoding affects ICU regex character classification
Date
Msg-id 3b4657c1145393d5b18228fad173b3f61f2c1a57.camel@j-davis.com
In response to Re: encoding affects ICU regex character classification  (Jeremy Schneider <schneider@ardentperf.com>)
Responses Re: encoding affects ICU regex character classification
List pgsql-hackers
On Tue, 2023-12-12 at 14:35 -0800, Jeremy Schneider wrote:
> Is someone able to test out upper & lower functions on U+A7BA ...
> U+A7BF
> across a few libs/versions?

Those code points are unassigned in Unicode 11.0 and assigned in
Unicode 12.0.

In ICU 63-2 (based on Unicode 11.0), they just get mapped to
themselves. In ICU 64-2 (based on Unicode 12.1) they get mapped the
same way the builtin CTYPE maps them (based on Unicode 15.1).
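
For anyone who wants to see the same effect outside Postgres, here is a minimal sketch in Python. It is only an illustration, not a test of ICU, glibc, or the builtin provider: Python carries its own copy of the Unicode data, and the helper name case_mappings is made up for this example. A build whose Unicode data predates 12.0 should report these code points as unassigned and map them to themselves; a newer build should pair them up.

```python
import unicodedata


def case_mappings(first=0xA7BA, last=0xA7BF):
    """Return (code point, assigned?, lower, upper) for each code point in the range.

    Hypothetical helper for illustration; it reflects the Unicode data
    bundled with the running Python, not any Postgres collation provider.
    """
    rows = []
    for cp in range(first, last + 1):
        ch = chr(cp)
        # General category "Cn" means the code point is unassigned
        # in this build's Unicode data.
        assigned = unicodedata.category(ch) != "Cn"
        rows.append((cp, assigned, ch.lower(), ch.upper()))
    return rows


def fmt(s):
    """Render a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in s)


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp, assigned, lo, up in case_mappings():
        print(f"U+{cp:04X} assigned={assigned} lower={fmt(lo)} upper={fmt(up)}")
```

Running it under two Python versions with different Unicode data is a rough analogue of comparing ICU 63-2 with ICU 64-2.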

The concern over unassigned code points is misplaced. The application
may be aware of newly-assigned code points, and there's no way they
will be mapped correctly in Postgres if the provider is not aware of
those code points. The user can either proceed in using unassigned code
points and accept the risk of future changes, or wait for the provider
to be upgraded.

If the user doesn't have many expression indexes dependent on ctype
behavior, it doesn't matter much. If they do have such indexes, the
best we can offer is a controlled process, and the builtin provider
allows the most visibility and control.

(Aside: case mapping has very strong, though not perfect, compatibility
guarantees. For better compatibility guarantees, we should support case
folding.)
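
To make the distinction in the aside concrete, here is a small Python sketch of how case mapping and case folding differ. It uses Python's own str.lower() and str.casefold(), which are an assumption standing in for whatever Postgres would expose; the helper name compare_forms is made up for this example.

```python
def compare_forms(s):
    """Return the lowercase mapping and the case-folded form of s.

    Hypothetical helper for illustration only; it uses Python's
    built-in Unicode tables, not a Postgres collation provider.
    """
    return {"lower": s.lower(), "casefold": s.casefold()}


if __name__ == "__main__":
    for word in ("Straße", "STRASSE"):
        forms = compare_forms(word)
        print(word, "->", forms["lower"], "/", forms["casefold"])

    # Lowercasing leaves the two spellings different, while case folding
    # brings them to the same form for caseless comparison.
    print("Straße".lower() == "STRASSE".lower())
    print("Straße".casefold() == "STRASSE".casefold())
```

Case folding is intended for caseless matching rather than display, which is why it is the more natural fit for something like an expression index used for case-insensitive lookups.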

> And I have no idea if or when
> glibc might have picked up the new unicode characters.

That's a strong argument in favor of a builtin provider.

Regards,
    Jeff Davis



