unaccent fails when datlocprovider=i and datctype=C

From

Jeff Davis

Subject

Date

March 8, 2023, 05:49:12

Msg-id

79e4354d9eccfdb00483146a6b9f6295202e7890.camel@j-davis.com

List

pgsql-bugs

Thread tree

unaccent fails when datlocprovider=i and datctype=C Jeff Davis <pgsql@j-davis.com> March 8, 2023, 05:49:12

Re: unaccent fails when datlocprovider=i and datctype=C Peter Eisentraut <peter.eisentraut@enterprisedb.com> March 10, 2023, 08:40:19

Repro:

$ initdb -D data -N --locale-provider=icu --icu-locale=en --locale=C

=# create extension unaccent;
ERROR:  invalid multibyte character for locale
HINT:  The server's LC_CTYPE locale is probably incompatible with the database encoding.
CONTEXT:  line 1 of configuration file ".../share/tsearch_data/unaccent.rules": "¡  !
"

Cause: the t_isspace() implementation is incomplete (note the "TODO"
comments):

    Oid         collation = DEFAULT_COLLATION_OID;  /* TODO */
    pg_locale_t mylocale = 0;   /* TODO */

    if (clen == 1 || lc_ctype_is_c(collation))
        return isspace(TOUCHAR(ptr));

    char2wchar(character, WC_BUF_LEN, ptr, clen, mylocale);

    return iswspace((wint_t) character[0]);

With datlocprovider=c, the first branch applies and the function goes
straight to isspace(). But with datlocprovider=i,
lc_ctype_is_c(DEFAULT_COLLATION_OID) returns false, so control falls
through to char2wchar(). char2wchar() is essentially a wrapper around
mbstowcs(), which does not work on multibyte input when LC_CTYPE=C.
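
To make the mbstowcs() point concrete, here is a minimal standalone sketch, not PostgreSQL code. The helper name convert_in_c_locale() is hypothetical, and WC_BUF_LEN is only borrowed from the excerpt above as a buffer size. It sets LC_CTYPE to "C" and hands a string to mbstowcs(), so the result for ASCII input can be compared with the result for the UTF-8 bytes of "¡" (0xC2 0xA1).

```c
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

#define WC_BUF_LEN 3            /* small buffer, enough for one character */

/*
 * Hypothetical helper: convert a multibyte string with mbstowcs() while
 * LC_CTYPE is "C".  Returns the number of wide characters produced, or
 * (size_t) -1 if the C library rejects the input.
 *
 * Note: this changes the process-wide LC_CTYPE setting as a side effect.
 */
size_t
convert_in_c_locale(const char *mbstr)
{
    wchar_t     buf[WC_BUF_LEN];

    if (setlocale(LC_CTYPE, "C") == NULL)
        return (size_t) -1;

    return mbstowcs(buf, mbstr, WC_BUF_LEN);
}
```

Calling convert_in_c_locale("a") converts one character. Calling convert_in_c_locale("\xc2\xa1") is the interesting case: on a C library that treats bytes above 0x7F as invalid in the C locale, it is expected to return (size_t) -1, which would match the "invalid multibyte character for locale" error in the repro. The exact behavior depends on the C library, so treat this as an illustration rather than a guarantee.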

Quick fix (attached): check whether datctype is C, rather than checking
the default collation.
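
The attached patch is not reproduced here. As a rough standalone sketch of the idea only, the function below classifies the byte directly whenever the character is a single byte or the database ctype is C, and attempts a wide-character conversion only otherwise. The names is_space_sketch() and db_ctype_is_c are hypothetical stand-ins, not PostgreSQL identifiers, and mbtowc() is used here in place of char2wchar().

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for "this database's datctype is C". */
static bool db_ctype_is_c = true;

/*
 * Sketch of a whitespace test that never attempts a multibyte conversion
 * when the ctype is C.  ptr points at one character that is clen bytes long.
 */
bool
is_space_sketch(const char *ptr, int clen)
{
    wchar_t     wc;

    /* Single-byte character, or C ctype: classify the first byte directly. */
    if (clen == 1 || db_ctype_is_c)
        return isspace((unsigned char) *ptr) != 0;

    /* Otherwise convert one character; treat a failed conversion as non-space. */
    if (mbtowc(&wc, ptr, (size_t) clen) <= 0)
        return false;

    return iswspace((wint_t) wc) != 0;
}
```

With db_ctype_is_c set to true, the two-byte sequence for "¡" is passed to isspace() byte-wise and simply reported as non-space instead of raising a conversion error. The key point is that the decision depends on the ctype setting alone, independent of which locale provider the default collation uses.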

Eventually this should be fixed by doing character classification in
ICU when the provider is ICU.

-- 
Jeff Davis
PostgreSQL Contributor Team - AWS
