Re: Built-in CTYPE provider

From Daniel Verite
Subject Re: Built-in CTYPE provider
Date
Msg-id d26df384-2fa7-4f50-b703-b0b6706dbeff@manitou-mail.org
In response to Built-in CTYPE provider  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Built-in CTYPE provider
Re: Built-in CTYPE provider
List pgsql-hackers
Jeff Davis wrote:

> While "full" case mapping sounds more complex, there are actually
> very few cases to consider and they are covered in another (small)
> data file. That data file covers ~100 code points that convert to
> multiple code points when the case changes (e.g. "ß" -> "SS"), 7
> code points that have context-sensitive mappings, and three locales
> which have special conversions ("lt", "tr", and "az") for a few code
> points.

But there are CLDR mappings on top of that.

According to the Unicode FAQ

   https://unicode.org/faq/casemap_charprop.html#5

   Q: Does the default case mapping work for every language? What
   about the default case folding?

   [...]

   To make case mapping language sensitive, the Unicode Standard
   specifically allows implementations to tailor the mappings for
   each language, but does not provide the necessary data. The file
   SpecialCasing.txt is included in the Standard as a guide to a few
   of the more important individual character mappings needed for
   specific languages, notably the Greek script and the Turkic
   languages. However, for most language-specific mappings and
   tailoring, users should refer to CLDR and other resources.

In particular, "el" (Modern Greek) has case mapping rules that
ICU seems to implement, but "el" is missing from the list
("lt", "tr", and "az") you identified.
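
To make the gap concrete, here is a minimal Python sketch. It relies on
Python's built-in str.upper(), which applies the untailored Unicode full
case mappings. The helper greek_upper_sketch is hypothetical and only
approximates one visible part of the "el" behavior (dropping the tonos
accent when uppercasing); the actual rules are more involved, for example
around the diaeresis, and this is not ICU's implementation.

```python
import unicodedata

# Combining acute accent: what the Greek tonos becomes after NFD decomposition.
TONOS = "\u0301"


def default_upper(s: str) -> str:
    """Untailored full case mapping, as performed by Python's str.upper()."""
    return s.upper()


def greek_upper_sketch(s: str) -> str:
    """Hypothetical approximation of "el" uppercasing: drop the tonos.

    Assumption for illustration only. It ignores the diaeresis rules and
    other exceptions that a real "el" tailoring has to handle.
    """
    decomposed = unicodedata.normalize("NFD", s.upper())
    stripped = "".join(ch for ch in decomposed if ch != TONOS)
    return unicodedata.normalize("NFC", stripped)


word = "άλφα"
print(default_upper(word))       # accent is kept by the default mapping
print(greek_upper_sketch(word))  # accent is dropped by the sketch
```

With the default mapping the accented capital is preserved, whereas a
Greek-tailored uppercasing would be expected to remove it, so a provider
that only knows about "lt", "tr", and "az" would not match ICU's output
for "el".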

The CLDR case mappings seem to be found in
https://github.com/unicode-org/cldr/tree/main/common/transforms
in *-Lower.xml and *-Upper.xml


Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite


