Re: Concerning about Unicode-aware string handling

From Albe Laurenz
Subject Re: Concerning about Unicode-aware string handling
Date
Msg-id D960CB61B694CF459DCFB4B0128514C207E6AAD2@exadv11.host.magwien.gv.at
In reply to Concerning about Unicode-aware string handling  (Vincas Dargis <vindrg@gmail.com>)
Responses Re: Concerning about Unicode-aware string handling  (Vincas Dargis <vindrg@gmail.com>)
List pgsql-general
Vincas Dargis wrote:
> We have problems (currently using 8.4, but also in latest 9.1.3) in
> our application with Unicode word symbols in Lithuanian ('ąčęėįšųūž'),
> Russian and of course potentially other languages.
> 
> For example, regexp_replace('acząčž', E'\\W', '', 'g') removes ąčž.
> 
> lower() and ~* comparison works only with locale that is set (no
> internationalization).
> 
> Could we expect Unicode support in the near future? Or should we do quick
> hacks by reimplementing regexp_replace(), lower(), upper() and other
> string SQL functions using, for example, Qt libraries..? Or are
> there perhaps some simpler workarounds?

I tried it with 9.1.3 on Linux:

upper() and lower() work fine, no matter what the
database encoding is:

test=> SELECT upper('acząčž');
 upper
--------
 ACZĄČŽ
(1 row)

And this seems OK with LATIN7:

lt2=> SHOW server_encoding;
 server_encoding
-----------------
 LATIN7
(1 row)

lt2=> SHOW lc_ctype;
 lc_ctype
----------
 lt_LT
(1 row)

lt2=> SHOW lc_collate;
 lc_collate
------------
 lt_LT
(1 row)

lt2=> SELECT 'ą' ~* '\w';
 ?column?
----------
 t
(1 row)

But it looks wrong with UTF8:

lt=> SHOW server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

lt=> SHOW lc_ctype;
  lc_ctype
------------
 lt_LT.utf8
(1 row)

lt=> SHOW lc_collate;
 lc_collate
------------
 lt_LT.utf8
(1 row)

lt=> SELECT 'ą' ~* '\w';
 ?column?
----------
 f
(1 row)
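
To check whether this is the same problem as in the original report, something
like the following could be run in each of the two databases. It is only a
sketch: the first query is the call from the report (assuming regexp_replace
was meant), and the per-character query is an illustrative addition. No output
is shown because it depends on the server encoding and locale.

```sql
-- Show the settings that determine character classification.
SHOW server_encoding;
SHOW lc_ctype;

-- The call from the original report: \W should match only non-word
-- characters, so if the Lithuanian letters are classified as word
-- characters the string comes back unchanged.
SELECT regexp_replace('acząčž', E'\\W', '', 'g') AS stripped;

-- Illustrative addition: test each character against \w separately
-- to see exactly which ones are not treated as word characters.
SELECT ch, ch ~ E'\\w' AS is_word
FROM regexp_split_to_table('acząčž', '') AS ch;
```

If is_word is false only for the non-ASCII letters and only in the UTF8
database, that would match the result above.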


Is that what you are complaining about?

Yours,
Laurenz Albe
