Re: unicode match normal forms

From: Daniel Verite
Subject: Re: unicode match normal forms
Msg-id: 48e7eaab-9403-4d65-8581-cd1e55231d28@manitou-mail.org
In response to: unicode match normal forms (hamann.w@t-online.de)
List: pgsql-general

Hamann W wrote:

> in unicode letter ä exists in two versions - linux and windows use a
> composite whereas macos prefers
> the decomposed form. Is there any way to make a semi-exact match that
> accepts both variants?

Aside from normalizing both strings to the same normal form
before comparing, non-deterministic ICU collations will treat them as
equal (they're "canonically equivalent" in Unicode terms).

For instance,

CREATE COLLATION nd (
   provider = 'icu',
   locale='',
   deterministic = false
);

SELECT
 nfc_form,
 nfd_form,
 nfc_form = nfd_form COLLATE nd AS equal1,
 nfc_form = nfd_form COLLATE "C" AS equal2 -- or any deterministic collation
FROM
  (VALUES
      (E'j\u00E4hrlich',
       E'j\u0061\u0308hrlich'))
  AS s(nfc_form, nfd_form);


 nfc_form | nfd_form | equal1 | equal2
----------+----------+--------+--------
 jährlich | jährlich | t      | f
(1 row)
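
To get that behaviour on every comparison without writing COLLATE in each query, the collation can also be attached to a column. The following is only a sketch: the table "words" and its column are hypothetical names chosen for illustration, and it assumes the "nd" collation created above already exists.

```sql
-- Hypothetical table; "nd" is the non-deterministic collation created above.
CREATE TABLE words (
  word text COLLATE nd
);

-- Stored in decomposed form (a followed by a combining diaeresis).
INSERT INTO words VALUES (E'j\u0061\u0308hrlich');

-- The search term is in composed form; with the column collated as "nd",
-- the equality test should still find the row.
SELECT count(*) FROM words WHERE word = E'j\u00E4hrlich';
```

One caveat: the Postgres 12 documentation notes that some operations, pattern matching among them, are not possible with non-deterministic collations, so this suits equality lookups rather than LIKE searches.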

Normalization is available through the built-in normalize() function and the
IS NORMALIZED predicate since Postgres 13, and non-deterministic collations
appeared in Postgres 12.
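
For the normalizing route, a minimal sketch using the same two test strings, assuming Postgres 13 or later and a UTF8 database (the column aliases are made up for the example):

```sql
SELECT
  -- Bring both sides to NFC, then compare with an ordinary
  -- (deterministic) collation.
  normalize(nfc_form, NFC) = normalize(nfd_form, NFC) AS equal_after_nfc,
  -- Check whether the decomposed string is already in NFC.
  nfd_form IS NFC NORMALIZED AS nfd_is_nfc
FROM
  (VALUES
      (E'j\u00E4hrlich',
       E'j\u0061\u0308hrlich'))
  AS s(nfc_form, nfd_form);
```

The first column should come out true and the second false. If the data is normalized once when it is stored, only the search term needs normalizing at query time.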


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite


