Re: BUG #15548: Unaccent does not remove combining diacritical characters

From Hugh Ranalli
Subject Re: BUG #15548: Unaccent does not remove combining diacritical characters
Date
Msg-id CAAhbUMNyZ+PhNr_mQ=G161K0-hvbq13Tz2is9M3WK+yX9cQOCw@mail.gmail.com
In response to Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Hugh Ranalli <hugh@whtc.ca>)
Responses Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Michael Paquier <michael@paquier.xyz>)
Re: BUG #15548: Unaccent does not remove combining diacritical characters  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List pgsql-bugs
Okay, I've tried to separate everything cleanly. The patches are numbered in the order in which they should be applied. Each patch contains all the updates appropriate to that version (i.e., if the change would modify unaccent.rules, those changes are also in the patch):

01 - Updates generate_unaccent_rules.py to be Python 2 and 3 compatible. The approach I have taken is "native" Python 3 compatibility with adjustments for Python 2. There's a marked block at the beginning of the file that can be removed whenever Python 2 support is dropped. I haven't followed the recommended practice of importing the "past" or "future" modules, as the changes are minimal, and these are just additional dependencies that need to be installed separately, which didn't seem to make sense for a utility script. This patch also updates sql/unaccent.sql to UTF-8 format. 

02 - Updates generate_unaccent_rules.py to work with all versions (I tested r28 and r34) of the Latin-ASCII transliteration file. It also updates unaccent.rules to have the output of the r34 transliteration file. This patch should work without the 01 patch.

03 - Updates generate_unaccent_rules.py to remove combining diacritical marks. It also updates unaccent.rules with the revised output, and adds tests to sql/unaccent.sql. It will not work or apply if the 01 patch is not applied. It should work without the 02 patch.
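
To illustrate the approach described for the 01 patch (native Python 3 code plus a clearly marked Python 2 adjustment, with no dependency on the "past" or "future" modules), here is a rough sketch. It is not the contents of the patch: the marker comments and the name code_to_char are hypothetical, chosen only for this example.

```python
import sys

# BEGIN: Python 2 compatibility (hypothetical marker; the patch may word it
# differently). Once Python 2 support is dropped, this block can shrink to
# the single line "code_to_char = chr".
if sys.version_info[0] <= 2:
    # On Python 2, chr() only covers 0-255; unichr() returns a unicode string.
    code_to_char = unichr  # noqa: F821 (name exists only on Python 2)
else:
    code_to_char = chr
# END: Python 2 compatibility

# The rest of the script is written once and runs on both versions.
acute = code_to_char(0x0301)
print("U+%04X" % ord(acute))
```

Keeping the version check in one marked place means the rest of the script can be written as ordinary Python 3, and nothing extra has to be installed to run it.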

When you look at unaccent.rules generated by the 03 version, there may appear to be blank lines. I've checked and they're not blank. They are characters which are only visible with other characters in front of them, at least in my editor.
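
One way to confirm that such lines are not empty is to list the code points they contain. The sketch below assumes the affected lines hold a single combining mark, such as U+0301 from the Combining Diacritical Marks block; as far as I understand the unaccent documentation, a rules line containing only one character means that character is deleted, which would explain why nothing follows it on the line. The helper name describe_line is hypothetical and not part of generate_unaccent_rules.py.

```python
import unicodedata


def describe_line(line):
    """Return (code point, Unicode name, is_combining) for each character."""
    result = []
    for ch in line.rstrip("\n"):
        result.append((
            "U+%04X" % ord(ch),
            unicodedata.name(ch, "<unnamed>"),
            unicodedata.combining(ch) != 0,  # non-zero class: combining mark
        ))
    return result


# A line holding only U+0301 looks blank in many editors, but is not empty.
sample = u"\u0301\n"
for entry in describe_line(sample):
    print(entry)
```

A truly blank line yields an empty list, while a line with a lone combining mark yields one entry whose last field is True.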

I'll go update the CommitFest now. I hope I've covered everything; please let me know if there's anything I've missed.

Best wishes,
Hugh

