Hi,
When trying out databases defined with ICU locales, I see that backends
that serve such databases seem to have their LC_CTYPE inherited from
the environment (as opposed to a per-database fixed value).
That's a problem for the backend code that depends on libc functions
that themselves depend on LC_CTYPE, such as the full text search parser
and dictionaries.
For instance, if you start the instance with a C locale
(LC_ALL=C pg_ctl...), and try to use FTS in an ICU UTF-8 database,
it doesn't work:
template1=# create database "fr-utf8"
  template 'template0' encoding UTF8
  locale 'fr'
  collation_provider 'icu';
template1=# \c fr-utf8
You are now connected to database "fr-utf8" as user "daniel".
fr-utf8=# show lc_ctype;
lc_ctype
----------
fr
(1 row)
fr-utf8=# select to_tsvector('été');
ERROR: invalid multibyte character for locale
HINT: The server's LC_CTYPE locale is probably incompatible with the
database encoding.
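To illustrate the kind of libc dependency involved, here is a minimal standalone sketch (not the backend code; the function name count_wide_chars is made up for this example). It sets LC_CTYPE to a given locale and asks libc to convert a multibyte string to wide characters, which is the sort of conversion that depends on LC_CTYPE:

```c
#include <locale.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical helper, for illustration only.
 *
 * Returns the number of wide characters that the multibyte string "s"
 * converts to under the given LC_CTYPE locale, or
 *   -1 if libc rejects "s" as an invalid multibyte sequence,
 *   -2 if the locale could not be set at all.
 */
long count_wide_chars(const char *lc_ctype, const char *s)
{
    mbstate_t   state;
    const char *src = s;
    size_t      n;

    /* Switch only the character-classification category. */
    if (setlocale(LC_CTYPE, lc_ctype) == NULL)
        return -2;

    /* Start from the initial conversion state. */
    memset(&state, 0, sizeof(state));

    /* With a NULL destination, mbsrtowcs() only counts wide characters. */
    n = mbsrtowcs(NULL, &src, 0, &state);

    return (n == (size_t) -1) ? -1 : (long) n;
}
```

Calling count_wide_chars("C", "\xc3\xa9t\xc3\xa9") (the UTF-8 bytes of 'été') should return -1 on a libc whose C locale rejects bytes above 0x7F, while a libc with an 8-bit-clean C locale would count each byte separately instead. With a UTF-8 locale, if one is installed, the same string should count as 3 characters. Either way, the result depends on the process-level LC_CTYPE and not on anything stored per database.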
If I peek into the "real" LC_CTYPE when connected to this database,
I can see it's "C":
fr-utf8=# create extension plperl;
CREATE EXTENSION
fr-utf8=# create function lc_ctype() returns text as '$ENV{LC_CTYPE};'
  language plperl;
CREATE FUNCTION
fr-utf8=# select lc_ctype();
lc_ctype
----------
C
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite