pgsql: unaccent: Add support for quoted translated characters

From Michael Paquier
Subject pgsql: unaccent: Add support for quoted translated characters
Date
Msg-id E1qinve-004smN-Fz@gemulon.postgresql.org
List pgsql-committers
unaccent: Add support for quoted translated characters

As reported in bug #18057, the unaccent extension strips from its rule
file whitespace characters that are intentionally emitted when building
unaccent.rules from UnicodeData.txt, which causes incorrect translations
for some characters such as numeric symbols.  The cause is that all
whitespace before and after the origin and target characters is
discarded (this limitation is documented).

This commit makes it possible to put quotes around target characters,
so that whitespace can be treated as part of the target.  Some target
characters themselves contain a double quote; each of these has to be
escaped with an extra double quote.
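
The quoting rule described above can be illustrated with a small
sketch.  This is not the parser in unaccent.c; it is a hypothetical
Python helper (parse_rule_line is an invented name) that assumes a rule
line holds an origin, whitespace, and an optional target that may be
wrapped in double quotes, with "" inside the quotes standing for one
literal double quote.

```python
def parse_rule_line(line):
    """Parse one rule line into (origin, target), or None for a blank line.

    Illustrative sketch only, not the logic of unaccent.c.
    Assumptions: an unquoted target is a single whitespace-delimited
    token; a quoted target keeps its whitespace; "" inside quotes is a
    literal double quote; a line with only an origin means "delete it".
    """
    stripped = line.rstrip("\r\n").lstrip()
    if not stripped:
        return None

    parts = stripped.split(None, 1)
    origin = parts[0]
    if len(parts) == 1:
        return origin, ""  # no target: the character is removed

    rest = parts[1]
    if not rest.startswith('"'):
        tokens = rest.split()
        if len(tokens) != 1:
            raise ValueError("unquoted target must be a single token")
        return origin, tokens[0]

    # Quoted target: scan up to the closing quote, unescaping "" as ".
    out = []
    i = 1
    while i < len(rest):
        ch = rest[i]
        if ch == '"':
            if i + 1 < len(rest) and rest[i + 1] == '"':
                out.append('"')
                i += 2
                continue
            if rest[i + 1:].strip():
                raise ValueError("unexpected text after closing quote")
            return origin, "".join(out)
        out.append(ch)
        i += 1
    raise ValueError("unterminated quoted target")
```

Under these assumptions, a line made of ¼, a tab, and " 1/4" in double
quotes yields a target with its leading space intact, and a target
written as four double quotes yields a single double quote.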

The documentation is updated to show how to use quoted areas,
generate_unaccent_rules.py is updated to generate unaccent.rules, and a
couple of tests are added for numeric symbols.  While working on this
patch, I implemented a fake rule file to test the parsing logic.  It is
not included here, as it would just consume extra cycles in the tests
and it requires manipulating an installation tree to work correctly.
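
On the generating side, the same convention could be applied when
writing a rule line.  The following is only a sketch and not the code
added to generate_unaccent_rules.py: format_rule is a hypothetical
helper, and the choice to quote only targets that contain whitespace or
a double quote is an assumption made for the example.

```python
def format_rule(origin, target):
    """Return one rule line, quoting the target only when needed.

    Hypothetical helper for illustration.  Assumption: a target is
    wrapped in double quotes when it contains whitespace or a double
    quote, and every embedded double quote is doubled.
    """
    if not target:
        return origin  # origin alone: the character is deleted
    needs_quotes = '"' in target or any(ch.isspace() for ch in target)
    if needs_quotes:
        target = '"' + target.replace('"', '""') + '"'
    return origin + "\t" + target
```

With this helper, a target of " 1/4" (with its leading space) is
written inside double quotes, a lone double quote becomes four double
quotes, and a plain target such as A is written unchanged.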

As this requires a change of format in unaccent.rules, this cannot be
backpatched, unfortunately.  The idea to use double quotes as escaped
characters comes from Tom Lane.

Reported-by: Martin Schlossarek
Author: Michael Paquier
Discussion: https://postgr.es/m/18057-62712cad01bd202c@postgresql.org

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/59f47fb98dab6f4a59bdfdb8825a7560ca8f1cba

Modified Files
--------------
contrib/unaccent/expected/unaccent.out      | 36 ++++++++++++
contrib/unaccent/generate_unaccent_rules.py |  4 ++
contrib/unaccent/sql/unaccent.sql           |  6 ++
contrib/unaccent/unaccent.c                 | 86 +++++++++++++++++++++++++----
contrib/unaccent/unaccent.rules             | 56 +++++++++----------
doc/src/sgml/unaccent.sgml                  | 16 ++++++
6 files changed, 166 insertions(+), 38 deletions(-)

