Bug #659: lower()/upper() bug on ->multibyte<- DB

From pgsql-bugs@postgresql.org
Subject Bug #659: lower()/upper() bug on ->multibyte<- DB
Date
Msg-id 20020507145112.BE39A476356@postgresql.org
Responses Re: Bug #659: lower()/upper() bug on ->multibyte<- DB  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-bugs
Michael Enke (michael.enke@wincor-nixdorf.com) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
lower()/upper() bug on ->multibyte<- DB

Long Description
OS: Linux Kernel 2.4.4, PostgreSQL version 7.2.1
lower() and upper() do not work as expected on multibyte
databases. They work fine with single-byte encodings.
The behaviour can be reproduced as follows:
at initdb: LC_CTYPE was set to de_DE
createdb -E UTF-8 name
export PGCLIENTENCODING=LATIN1
psql -U name
--------------------------------------------------
=> select lower('Ä');  -- german umlaut A, capital
ERROR: Could not convert UTF-8 to ISO8859-1
-- I expected to see: ä german umlaut a, lower case
--------------------------------------------------
=> select lower('ä');  -- german umlaut a, lower case
ERROR: Could not convert UTF-8 to ISO8859-1
-- I expected to see: ä german umlaut a, lower case
--------------------------------------------------
=> select upper('ä');  -- it doesn't translate
ä
-- I expected to see: Ä
--------------------------------------------------
=> select upper('Ä');  -- this works fine
Ä
--------------------------------------------------

The same happens with Ö and Ü (O umlaut, U umlaut).
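
One possible reading of the results above, offered as an assumption and not as a confirmed diagnosis: the pattern is what you would get if the case conversion were applied one byte at a time under a single-byte ISO8859-1 locale such as de_DE, instead of one character at a time. The Python sketch below simulates that model; `bytewise_case` and `show` are hypothetical helpers written for this illustration, not PostgreSQL code.

```python
def bytewise_case(text, to_upper):
    """Simulate per-byte case mapping of UTF-8 data under a Latin-1 locale.

    Hypothetical model only: each byte of the UTF-8 encoding is treated as
    one ISO8859-1 character and case-mapped on its own.
    """
    out = bytearray()
    for byte in text.encode("utf-8"):
        ch = bytes([byte]).decode("latin-1")
        mapped = ch.upper() if to_upper else ch.lower()
        try:
            converted = mapped.encode("latin-1")
        except UnicodeEncodeError:
            converted = b""
        # Keep the byte unchanged when the mapping does not fit in one Latin-1 byte.
        out.append(converted[0] if len(converted) == 1 else byte)
    return bytes(out)


def show(label, raw):
    """Print the resulting bytes and whether they still form valid UTF-8."""
    try:
        decoded = raw.decode("utf-8")
    except UnicodeDecodeError:
        decoded = "<invalid UTF-8>"
    print(f"{label}: {raw.hex()} -> {decoded}")


for ch in "ÄäÖöÜü":
    show(f"lower({ch})", bytewise_case(ch, to_upper=False))
    show(f"upper({ch})", bytewise_case(ch, to_upper=True))
```

Under this model, lower() changes the lead byte 0xC3 to 0xE3, so the two bytes no longer form a valid UTF-8 sequence, which would be consistent with the conversion error. upper() leaves both bytes untouched, which would be consistent with upper('ä') returning ä. The report itself does not establish whether the server behaves this way.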

If you want to reproduce this and don't have ä/Ä on your keyboard,
you can create a table with one column of type varchar(1) (on a multibyte DB).
Create a file with the following input:
ae is \u00e4
AE is \u00c4
Then run the native2ascii tool that ships with Java:
native2ascii -reverse -utf8 <this-file> <new-file>
In <new-file> you will see:
in the first line, 2 bytes: A (with tilde on top) and the Euro symbol,
in the second line, 2 bytes: A (with tilde on top) and a dotted box.
Unset PGCLIENTENCODING and start psql:
insert into table values('<copy and paste first two bytes>');
insert into table values('<copy and paste second two bytes>');
export PGCLIENTENCODING=LATIN1
In psql, select * from table; will then show you the a-umlaut and A-umlaut.
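
If Java's native2ascii is not at hand, the same two UTF-8 lines can be produced in other ways. The Python sketch below is one such alternative, not part of the original procedure. The file name umlauts.txt is an arbitrary choice, and the script writes only the two characters, one per line, on the assumption that "ae is" and "AE is" above are labels and not file content. It also prints the raw bytes and how they look when misread as a single-byte encoding; ISO8859-15 is assumed for that misreading because it is an encoding in which byte 0xA4 displays as the Euro sign.

```python
import os
import tempfile

# Arbitrary location for illustration; any writable path works.
path = os.path.join(tempfile.gettempdir(), "umlauts.txt")

# Write a-umlaut and A-umlaut, one per line, encoded as UTF-8.
with open(path, "w", encoding="utf-8", newline="\n") as handle:
    handle.write("\u00e4\n\u00c4\n")

with open(path, "rb") as handle:
    lines = handle.read().splitlines()

for raw in lines:
    # ISO8859-15 is an assumption about the viewer; repr() makes the
    # non-printable byte 0x84 visible as an escape sequence.
    print(raw.hex(), repr(raw.decode("iso8859-15")), "->", raw.decode("utf-8"))
```

Each character occupies two bytes in the file, which is why a viewer using a single-byte encoding shows two symbols per line.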

Sample Code


No file was uploaded with this report
