Re: [PATCH] Completed unaccent dictionary with many missing characters

From	Michael Paquier
Subject	Re: [PATCH] Completed unaccent dictionary with many missing characters
Date	14 July 2022, 08:41:31
Msg-id	Ys+siw2VEuyXdS4B@paquier.xyz
In reply to	Re: [PATCH] Completed unaccent dictionary with many missing characters (Przemysław Sztoch <przemyslaw@sztoch.pl>)
List	pgsql-hackers

On Tue, Jul 05, 2022 at 09:24:49PM +0200, Przemysław Sztoch wrote:
> I do not add more, because they probably concern older languages.
> An alternative might be to rely entirely on Unicode decomposition ...
> However, after the change, only one additional Ukrainian letter with an
> accent was added to the rule file.

Hmm.  I was wondering about the decomposition part, actually.  How
much simpler would things be if we treated the full range of
Cyrillic characters, i.e. U+0400 to U+04FF, scanning all of them
and building rules only where decompositions exist?  Is it worth
considering the Cyrillic Supplement as well, i.e. U+0500-U+052F?
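
To make the idea concrete, here is a minimal Python sketch of such a
scan, using only the standard unicodedata module.  It is an
illustration, not the script shipped with unaccent: the names
base_char and build_rules are hypothetical, and dropping every
combining mark from the canonical decomposition is an assumption
about what a rule should look like.  Whether each resulting mapping
is actually wanted in the rule file is a separate question.

```python
import unicodedata

# Ranges named in the message: Cyrillic and Cyrillic Supplement.
CYRILLIC = range(0x0400, 0x0500)             # U+0400..U+04FF
CYRILLIC_SUPPLEMENT = range(0x0500, 0x0530)  # U+0500..U+052F


def base_char(ch):
    """Return ch with combining marks removed, or None when ch has
    no canonical decomposition (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", ch)
    if decomposed == ch:
        return None
    # Keep only the non-combining code points of the decomposition.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped or None


def build_rules(codepoints):
    """Build (source, target) pairs only for code points that decompose."""
    rules = []
    for cp in codepoints:
        ch = chr(cp)
        base = base_char(ch)
        if base is not None and base != ch:
            rules.append((ch, base))
    return rules


if __name__ == "__main__":
    # Print candidate rules as tab-separated source/target pairs.
    for src, dst in build_rules(list(CYRILLIC) + list(CYRILLIC_SUPPLEMENT)):
        print(f"{src}\t{dst}")
```

The results depend on the Unicode data bundled with the Python
build, so the output would need reviewing rather than being taken
as-is.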

I was also thinking about the regression tests, and as the accented
characters are more spread out than for Latin and Greek, it could be
a good thing to have complete coverage.  We could, for example, use a
query like this one to check whether a character is treated properly:
SELECT chr(i.a) = unaccent(chr(i.a))
  FROM generate_series(1024, 1327) AS i(a); -- U+0400..U+052F, Cyrillic range.
--
Michael

Attachments

signature.asc
