Discussion: Bug #676: lower(), upper(), & initcap() do not work on utf-8 chars
Bug #676: lower(), upper(), & initcap() do not work on utf-8 chars
From
pgsql-bugs@postgresql.org
Date:
Henry House (hajhouse@houseag.com) reports a bug with a severity of 3
The lower the number the more severe it is.

Short Description
lower(), upper(), & initcap() do not work on utf-8 chars

Long Description
The string case manipulation functions lower(), upper(), & initcap() have no effect on non-ASCII characters in the argument, such as æ, å, ø, ä, etc. ASCII chars in the argument are properly up- or down-cased. The database encoding is UTF-8.

Sample Code
SELECT upper('æ');

No file was uploaded with this report
pgsql-bugs@postgresql.org writes:
> The string case manipulation functions lower(), upper(), & initcap()
> have no effect on non-ASCII characters in the argument, such as æ, å,
> ø, ä, etc. ASCII chars in the argument are properly up- or down-cased.
> The database encoding is UTF-8.

lower/upper-casing is driven by locale, not encoding.

Unfortunately you didn't mention anything about your locale setup...

regards, tom lane
Henry House <hajhouse@houseag.com> writes:
>> Unfortunately you didn't mention anything about your locale setup...

> The server locale is en_US.UTF-8. (At least I set it up as such when
> installing PostgreSQL; I know no way to verify.) The server version is 7.2.1,
> running on a IA32 and a DEC Alpha; both machines show the same behavior. Both
> are Debian Linux. Perhaps the bug lies in the locale definition supplied by
> Debian?

Offhand I'd not necessarily expect an en_US locale to upcase/downcase anything except a-z/A-Z. Perhaps you need to use a different locale. I'd suggest taking this up with a locale expert, which I surely am not.

regards, tom lane
On Sat, May 25, 2002 at 12:56:06AM -0400, Tom Lane wrote:
> pgsql-bugs@postgresql.org writes:
> > The string case manipulation functions lower(), upper(), & initcap()
> > have no effect on non-ASCII characters in the argument, such as æ, å,
> > ø, ä, etc. ASCII chars in the argument are properly up- or down-cased.
> > The database encoding is UTF-8.
>
> lower/upper-casing is driven by locale, not encoding.
>
> Unfortunately you didn't mention anything about your locale setup...

The server locale is en_US.UTF-8. (At least I set it up as such when installing PostgreSQL; I know no way to verify.) The server version is 7.2.1, running on a IA32 and a DEC Alpha; both machines show the same behavior. Both are Debian Linux. Perhaps the bug lies in the locale definition supplied by Debian?

--
Henry House
The attached file is a digital signature. See <http://romana.hajhouse.org/pgp> for information. My OpenPGP key: <http://romana.hajhouse.org/hajhouse.asc>.
> > lower/upper-casing is driven by locale, not encoding.
> >
> > Unfortunately you didn't mention anything about your locale setup...
>
> The server locale is en_US.UTF-8. (At least I set it up as such when
> installing PostgreSQL; I know no way to verify.) The server version is 7.2.1,
> running on a IA32 and a DEC Alpha; both machines show the same behavior. Both
> are Debian Linux. Perhaps the bug lies in the locale definition supplied by
> Debian?

I don't think the current locale support code works with multibyte encodings such as UTF-8. See the thread titled "Bug #659: lower()/upper() bug on" on pgsql-bugs and pgsql-hackers.

In the meantime, a workaround would be something like:

select convert(lower(convert('X', 'LATIN1')),'LATIN1','UNICODE');

That will convert UTF-8 'X' to its lower case, provided you are sure that 'X' can be converted to ISO-8859-1.

Of course the problem with this method is:

Someone has suggested a fix to me using UTF-8 locales, but I'm worried about the usage of UTF-8 and am waiting for the test results with my Japanese data.
--
Tatsuo Ishii
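For readers who want to see why the round trip through LATIN1 helps, here is a rough sketch in Python. It is not PostgreSQL code and does not claim to reproduce the server's internals. The function names are hypothetical, and the per-byte rule is an assumption standing in for a single-byte tolower() under an ISO-8859-1 locale. The first function models case mapping that only understands ASCII bytes, which leaves the multibyte sequences for æ, å, ø untouched. The second mimics the convert/lower/convert workaround.

```python
def ascii_only_lower(text: str) -> str:
    """Lower-case the UTF-8 bytes of text, touching only ASCII A-Z.

    Models the reported symptom: bytes above 0x7F are left alone, so
    multibyte characters such as 'Æ' (0xC3 0x86) come back unchanged.
    """
    return text.encode("utf-8").lower().decode("utf-8")


def latin1_bytewise_lower(data: bytes) -> bytes:
    """Lower-case each byte independently using ISO-8859-1 rules.

    Hypothetical stand-in for tolower() under a Latin-1 locale.
    """
    out = bytearray()
    for b in data:
        is_ascii_upper = 0x41 <= b <= 0x5A
        # 0xC0-0xDE are upper-case letters in ISO-8859-1,
        # except 0xD7, which is the multiplication sign.
        is_latin1_upper = 0xC0 <= b <= 0xDE and b != 0xD7
        out.append(b + 0x20 if is_ascii_upper or is_latin1_upper else b)
    return bytes(out)


def lower_via_latin1(text: str) -> str:
    """Mimic convert(lower(convert(x, 'LATIN1')), 'LATIN1', 'UNICODE')."""
    # Raises UnicodeEncodeError if a character has no ISO-8859-1 form,
    # which is the precondition mentioned in the message above.
    latin1 = text.encode("latin-1")
    return latin1_bytewise_lower(latin1).decode("latin-1")
```

Calling ascii_only_lower("ABCÆ") leaves the Æ as it was, while lower_via_latin1("ÆØÅ") lower-cases all three letters. Text outside ISO-8859-1, such as Japanese, raises an error in this sketch, in line with the caveat that the workaround only applies when the input can be converted to ISO-8859-1.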