Re: BUG #18362: unaccent rules and Old Greek text

From Michael Paquier
Subject Re: BUG #18362: unaccent rules and Old Greek text
Date
Msg-id ZdvMcEkMYoMqELiG@paquier.xyz
In response to Re: BUG #18362: unaccent rules and Old Greek text  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: BUG #18362: unaccent rules and Old Greek text  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Mon, Feb 26, 2024 at 12:15:57PM +1300, Thomas Munro wrote:
> The Python script is looking for combining sequences that add accents,
> but this one has just "03AC" in the combining sequence field, so it's
> a kind of "simple" redirection that points here:
>
> 03AC;GREEK SMALL LETTER ALPHA WITH TONOS;Ll;0;L;03B1 0301;;;;N;GREEK
> SMALL LETTER ALPHA TONOS;;0386;;0386
>
> That has a normal looking sequence that we can understand (α + an
> accent).  If I tell the script to follow such "simple" redirections, I
> get over a thousand new mappings, including those.  See attached.
> There is probably more correct terminology than I'm using here...

Ah, you've beaten me to it.  Yes, that's pretty much the impression I
was getting while looking at the set of characters in UnicodeData.txt.  I
am not entirely sure if what you are doing is the best way to do it,
but the set of characters generated in unaccent.rules makes sense
here.  I am surprised to see that many, TBH.
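
For illustration, the redirection-following described in the quoted text
can be sketched roughly as below.  This is not the script's actual code:
it assumes Python's built-in unicodedata module in place of parsing the
data file directly, base_letter() is a hypothetical helper name, and
U+1F71 is my own pick for a character whose decomposition field holds
just "03AC".

```python
import unicodedata


def is_combining_mark(ch):
    # General categories Mn, Mc and Me are combining marks.
    return unicodedata.category(ch).startswith("M")


def base_letter(ch):
    """Return the unaccented base letter for ch, or None if none is found.

    Hypothetical helper, for illustration only.
    """
    decomp = unicodedata.decomposition(ch)
    # No decomposition, or a compatibility one such as "<compat> ...".
    if not decomp or decomp.startswith("<"):
        return None
    parts = [chr(int(cp, 16)) for cp in decomp.split()]
    head, marks = parts[0], parts[1:]
    if not marks:
        # "Simple" redirection (a single code point): follow it.
        return base_letter(head)
    # Expect a non-combining base followed only by combining marks.
    if is_combining_mark(head) or not all(is_combining_mark(m) for m in marks):
        return None
    # The base may itself carry accents, so keep resolving it.
    return base_letter(head) or head


if __name__ == "__main__":
    for cp in (0x1F71, 0x03AC, 0x03B1):
        print(f"U+{cp:04X} -> {base_letter(chr(cp))!r}")
```

With this sketch, U+1F71 is resolved through U+03AC down to a plain
alpha, while a letter without any decomposition yields None.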

Perhaps you should add a few characters of these series to
unaccent.sql?
--
Michael
