Re: Unicode database on non-unicode operating system

From Morten Barklund
Subject Re: Unicode database on non-unicode operating system
Date
Msg-id AB6A9C75F1620048B14C9E7D9526F5B136CB70@TBWAMAIL.tbwa.dk
In reply to Re: Unicode database on non-unicode operating system  (Peter Eisentraut <peter_e@gmx.net>)
Responses Re: Unicode database on non-unicode operating system  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-general
Hi Peter,

Thanks for the hint.

I can see that lc_collate (sorting) and lc_ctype (lower-upper conversion) are set to en_DK, and I guess that the default
encoding for en_DK is iso88591 or maybe windows1252. So my server should have been initialized with en_DK.utf8, right? How
do I find out what the default encoding for the locale en_DK is? I can see that normally one would specify this by
adding either .iso88591 or .utf8, but is windows1252 then the default?
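
One way to answer the "what is the default encoding for en_DK" question is to ask the operating system directly. The sketch below is only an illustration, not something from this thread: it assumes a Unix-like system where Python's `locale.nl_langinfo` is available, and the helper name `default_charmap` is made up for the example. It returns None when the named locale is not installed on the machine.

```python
import locale


def default_charmap(locale_name):
    """Return the character set the OS associates with locale_name.

    Returns None if the locale is not available on this system.
    (Hypothetical helper for illustration; Unix-like systems only.)
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # remember the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    except locale.Error:
        return None
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # always restore


if __name__ == "__main__":
    # Compare the bare locale name with the explicitly suffixed variants.
    for name in ("en_DK", "en_DK.utf8", "en_DK.iso88591"):
        print(name, "->", default_charmap(name))
```

On most Unix-like systems the same information should also be available from the shell via `LC_ALL=en_DK locale charmap`.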
 

It is clear that en_DK includes the proper rules for upper-lower conversion of Danish special characters, because
when converting from UTF-8 to ISO 8859-1 I can use upper() and lower() as expected. And Danish special characters have the
same code points in latin1 and windows1252.
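
The claim about identical code points is easy to check. The following is a small sketch using Python's built-in codecs (`latin-1` and `cp1252` are Python's names for ISO 8859-1 and windows1252); the helper name `same_bytes` is invented for the example.

```python
# The Danish special letters, lower and upper case.
DANISH = "æøåÆØÅ"


def same_bytes(text, enc_a="latin-1", enc_b="cp1252"):
    """True if text encodes to identical bytes in both encodings."""
    return text.encode(enc_a) == text.encode(enc_b)


if __name__ == "__main__":
    for ch in DANISH:
        print(ch, ch.encode("latin-1").hex(), ch.encode("cp1252").hex())
    print("identical:", same_bytes(DANISH))
```

The two encodings differ only in the 0x80-0x9F range, which the Danish letters do not use, so for this purpose it does not matter which of the two the locale defaults to.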
 

I am not able to reinitdb, as many other databases are running and might be affected negatively. Does this mean that
even though my database was created WITH ENCODING 'unicode', it is in fact "broken", as the locale does not fully support
Unicode string handling?
 

I wanted to use Unicode because I expected non-latin1 characters, but this actually means that if I had any such
characters, some string functions would not work at all.
 


Regards,
Morten Barklund

-----Original Message-----
From: Peter Eisentraut [mailto:peter_e@gmx.net] 
Sent: Tuesday, July 15, 2008 2:33 PM
To: pgsql-general@postgresql.org
Cc: Morten Barklund
Subject: Re: [GENERAL] Unicode database on non-unicode operating system

On Tuesday, 15 July 2008, Morten Barklund wrote:
> My problem is that the lowercase versions of non-ascii characters are
> broken. Specifically, I found that when lower() is invoked on a text with
> non-ascii characters, the operating system's locale is used for converting
> each octet in the string to lowercase instead of using the locale of the
> database to convert each character in the string to lowercase. This caused
> the Danish lower case o with slash "ø", which in unicode is represented as
> the latin1-readable octets "Ã¸", to be converted to the latin1-readable
> octets "ã¸", which the server then tried to interpret as a unicode
> character - but the octets "ã¸" do not represent a unicode character in
> utf8. The lower case version of "ø" is of course just itself.

This means you have mismatching server encodings and locales configured.  
Check SHOW lc_collate and SHOW server_encoding, and then pick a combination 
that is compatible.  This will probably mean you have to reinitdb.
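
The failure described in the quoted message can be reproduced outside the database. The sketch below is a simulation in Python, not PostgreSQL's actual code path: it assumes the single-byte locale behaves like Latin-1, and the function names are invented for the example. It lowercases UTF-8 data one octet at a time and then checks whether the result is still valid UTF-8.

```python
def bytewise_lower_latin1(data: bytes) -> bytes:
    """Lowercase each octet as if it were one Latin-1 character.

    This mimics what a single-byte (Latin-1) locale does when it is
    handed UTF-8 data. Simulation only, not PostgreSQL code.
    """
    return data.decode("latin-1").lower().encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    original = "ø".encode("utf-8")           # two octets: c3 b8
    mangled = bytewise_lower_latin1(original)
    print("original:", original.hex(), "valid:", is_valid_utf8(original))
    print("mangled: ", mangled.hex(), "valid:", is_valid_utf8(mangled))
    # A character-aware lower() leaves the letter alone.
    print("character-wise:", "ø".lower() == "ø")
```

The first octet 0xC3 reads as "Ã" in Latin-1 and is lowercased to 0xE3 ("ã"), which in UTF-8 announces a three-byte sequence that never arrives, so the result is no longer valid UTF-8.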


