Discussion: Bug #837: Unable to use LATIN9 (=ISO-8859-15) encoding
Steve Haslam (araqnid@debian.org) reports a bug with a severity of 2.
The lower the number, the more severe it is.
Short Description
Unable to use LATIN9 (=ISO-8859-15) encoding
Long Description
I am trying to use LATIN9 (ISO-8859-15) as my client encoding rather than LATIN1; the database I am using is encoded
as UNICODE. However, if I attempt to use the LATIN9 encoding, I get erroneous results. I am using PostgreSQL 7.3 from
Debian unstable (7.3rel-3), which gives a version() string of
"PostgreSQL 7.3 on i386-pc-linux-gnu, compiled by GCC 2.95.4".
From PSQL, if I perform:
\encoding LATIN9
insert into i18ntest(id, data) values('Euro symbol', '¤');
then I would expect this to insert a euro symbol into the data column (code point 164 is Euro in ISO-8859-15). However,
when I change back to UTF-8, the UTF-8 data is "¤", which is the currency symbol.
Now, if I insert the Euro symbol using a UNICODE client encoding, I get a conversion warning (and the value is
ignored) when I switch back to LATIN9 and SELECT it out again:
psql:/home/steve/public_html/i18ntest.sql:35: WARNING: UtfToLocal: could not convert UTF-8 (0xe282ac). Ignored
However, if I switch to LATIN1 and try to SELECT it, I get a conversion error, which is correct since the euro symbol
does not have a code point in LATIN1:
psql:/home/steve/public_html/i18ntest.sql:33: ERROR: Could not convert UTF-8 to ISO8859-1
Sample Code
-- this code is available at http://araqnid.ddts.net/~steve/i18ntest.sql in case it gets munged by the
-- form/browser/server
-- This is done in a database with "UNICODE" encoding
-- e.g.:
-- create database i18ntest encoding = 'UNICODE';
-- \connect i18ntest
select version();
drop table i18ntest;
create table i18ntest(id text primary key, data text not null);
begin;
\encoding LATIN1
insert into i18ntest(id, data) values('Pound sign', '£');
\encoding LATIN9
insert into i18ntest(id, data) values('Euro symbol', '¤');
commit;
\encoding UNICODE
select id, data from i18ntest;
\encoding LATIN1
select id, data from i18ntest;
\encoding LATIN9
select id, data from i18ntest;
begin;
\encoding UNICODE
update i18ntest set data = '£' where id = 'Pound sign';
update i18ntest set data = '€' where id = 'Euro symbol';
commit;
\encoding UNICODE
select id, data from i18ntest;
\encoding LATIN1
select id, data from i18ntest;
\encoding LATIN9
select id, data from i18ntest;
-- drop table i18ntest;
-- drop database i18ntest;
No file was uploaded with this report
> From PSQL, if I perform:
> \encoding LATIN9
> insert into i18ntest(id, data) values('Euro symbol', '¤');
>
> then I would expect this to insert a euro symbol into the data column
'¤' means '¤', not anything else. Maybe you want to try '\244'
(octal).
--
Peter Eisentraut peter_e@gmx.net
> From PSQL, if I perform:
> \encoding LATIN9
> insert into i18ntest(id, data) values('Euro symbol', '¤');
>
> then I would expect this to insert a euro symbol into the data column
> (code point 164 is Euro in ISO-8859-15). However, when I change back to
> UTF-8, the UTF-8 data is "¤", which is the currency symbol.
I have confirmed this. It appears to have been a copy-and-paste mistake.
I will put the following patch into the next subrelease (7.3.1):
*** ../pg73branch/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c Tue Oct 29 18:19:19 2002
--- src/backend/utils/mb/conversion_procs/utf8_and_iso8859/utf8_and_iso8859.c Mon Dec 9 20:14:43 2002
***************
*** 98,104 ****
{PG_LATIN8, LUmapISO8859_14, ULmapISO8859_14,
sizeof(LUmapISO8859_14) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_14) / sizeof(pg_utf_to_local)}, /* ISO-8859-14 Latin 8 */
! {PG_LATIN9, LUmapISO8859_2, ULmapISO8859_2,
sizeof(LUmapISO8859_15) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_15) / sizeof(pg_utf_to_local)}, /* ISO-8859-15 Latin 9 */
{PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
--- 98,104 ----
{PG_LATIN8, LUmapISO8859_14, ULmapISO8859_14,
sizeof(LUmapISO8859_14) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_14) / sizeof(pg_utf_to_local)}, /* ISO-8859-14 Latin 8 */
! {PG_LATIN9, LUmapISO8859_15, ULmapISO8859_15,
sizeof(LUmapISO8859_15) / sizeof(pg_local_to_utf),
sizeof(ULmapISO8859_15) / sizeof(pg_utf_to_local)}, /* ISO-8859-15 Latin 9 */
{PG_LATIN10, LUmapISO8859_16, ULmapISO8859_16,
--
Peter Eisentraut peter_e@gmx.net
On Fri, Dec 06, 2002 at 12:20:54AM +0100, Peter Eisentraut wrote:
> > From PSQL, if I perform:
> > \encoding LATIN9
> > insert into i18ntest(id, data) values('Euro symbol', '¤');
> >
> > then I would expect this to insert a euro symbol into the data column
>
> '¤' means '¤', not anything else. Maybe you want to try '\244'
> (octal).
That was a literal character 164 that the browser seems to have munged when
uploading the form (the script is also available in raw form using the URI
at the top, http://araqnid.ddts.net/~steve/i18ntest.sql).
SRH
-- 
Steve Haslam Reading, UK araqnid@innocent.com
Debian GNU/Linux Maintainer araqnid@debian.org
Currently for sale: http://www.arise.demon.co.uk/my_cv/
almost called it today, turned to face the void, numb with the suffering
and the question- "Why am I?" [queensrÿche]