Thread: latin1 unicode conversion errors

latin1 unicode conversion errors

From
Kris Jurka
Date:
Why is latin1 special in its conversion from unconvertible unicode data? 
Other latin character sets add a warning, but latin1 errors out.

jurka=# create database utf8 with encoding ='utf8';
CREATE DATABASE
jurka=# \c utf8
You are now connected to database "utf8".
utf8=# create table t(a text);
CREATE TABLE
utf8=# insert into t values ('\346\231\243');
INSERT 0 1
utf8=# set client_encoding = 'latin2';
SET
utf8=# select * from t;
WARNING:  ignoring unconvertible UTF-8 character 0xe699a3
  a
---

(1 row)

utf8=# set client_encoding = 'latin1';
SET
utf8=# select * from t;
ERROR:  could not convert UTF8 character 0x00e6 to ISO8859-1

Kris Jurka
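
For reference, the inserted value '\346\231\243' is the byte sequence 0xe6 0x99 0xa3, which decodes to U+6663, well outside the U+0000..U+00FF range that ISO8859-1 can represent. The sketch below illustrates that decoding. The names utf8_decode3 and fits_latin1 are hypothetical helpers made up for this illustration, not PostgreSQL functions. It also hints at why the two messages look different: the latin2 warning reports the whole sequence (0xe699a3), while the latin1 error reports only the lead byte (0x00e6).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not PostgreSQL code): decode the 1- to 3-byte UTF-8
 * sequence at s into a code point. Returns the number of bytes consumed,
 * or 0 if the sequence is truncated, malformed, or longer than 3 bytes.
 */
size_t
utf8_decode3(const unsigned char *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL);

    if (len == 0)
        return 0;

    if (s[0] < 0x80)
    {
        /* plain ASCII */
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xe0) == 0xc0)
    {
        /* 110xxxxx 10xxxxxx */
        if (len < 2 || (s[1] & 0xc0) != 0x80)
            return 0;
        *cp = ((uint32_t) (s[0] & 0x1f) << 6) | (uint32_t) (s[1] & 0x3f);
        return 2;
    }
    if ((s[0] & 0xf0) == 0xe0)
    {
        /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (len < 3 || (s[1] & 0xc0) != 0x80 || (s[2] & 0xc0) != 0x80)
            return 0;
        *cp = ((uint32_t) (s[0] & 0x0f) << 12) |
              ((uint32_t) (s[1] & 0x3f) << 6) |
              (uint32_t) (s[2] & 0x3f);
        return 3;
    }
    return 0;
}

/* ISO8859-1 covers exactly the code points U+0000..U+00FF. */
int
fits_latin1(uint32_t cp)
{
    return cp <= 0xff;
}
```

Calling utf8_decode3 on the three bytes above yields 0x6663, for which fits_latin1 returns 0; a two-byte sequence such as '\303\251' (U+00E9) passes the check.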


Re: latin1 unicode conversion errors

From
Bruce Momjian
Date:
My guess is that it was coded by someone different and needs to be made
consistent.


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: latin1 unicode conversion errors

From
Bruce Momjian
Date:
OK, yea, it is inconsistent. I changed it to throw a warning instead.
Only patched to 8.2 because it is a behavior change.


Index: src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c,v
retrieving revision 1.13
diff -c -c -r1.13 utf8_and_iso8859_1.c
*** src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c    25 Dec 2005 02:14:18 -0000    1.13
--- src/backend/utils/mb/conversion_procs/utf8_and_iso8859_1/utf8_and_iso8859_1.c    12 Feb 2006 20:59:36 -0000
***************
*** 84,91 ****
              len -= 2;
          }
          else if ((c & 0xe0) == 0xe0)
!             elog(ERROR, "could not convert UTF8 character 0x%04x to ISO8859-1",
!                  c);
          else
          {
              *dest++ = c;
--- 84,93 ----
              len -= 2;
          }
          else if ((c & 0xe0) == 0xe0)
!             ereport(WARNING,
!                     (errcode(ERRCODE_UNTRANSLATABLE_CHARACTER),
!                      errmsg("ignoring unconvertible UTF-8 character 0x%04x",
!                             c)));
          else
          {
              *dest++ = c;
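
To make the effect of the change easier to see outside the backend, here is a minimal stand-alone sketch of a UTF-8 to ISO8859-1 conversion loop that warns and continues instead of aborting. It is not the PostgreSQL function: utf8_to_latin1_lossy is a hypothetical name, fprintf to stderr stands in for ereport(WARNING, ...), and skipping the continuation bytes of the rejected character is an assumption of this sketch, since the hunk above does not show how the source pointer is advanced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-alone sketch, not the PostgreSQL conversion proc.
 * Converts len bytes of UTF-8 in src to ISO8859-1 in dest, which must have
 * room for len + 1 bytes. Characters that ISO8859-1 cannot represent are
 * dropped with a warning on stderr. Returns the number of dropped characters.
 */
int
utf8_to_latin1_lossy(const unsigned char *src, size_t len, unsigned char *dest)
{
    int     dropped = 0;
    size_t  i = 0;

    assert(src != NULL && dest != NULL);

    while (i < len)
    {
        unsigned char c = src[i];

        if (c < 0x80)
        {
            /* ASCII maps to itself */
            *dest++ = c;
            i++;
        }
        else if ((c & 0xe0) == 0xc0 && i + 1 < len &&
                 (src[i + 1] & 0xc0) == 0x80)
        {
            /* two-byte sequence: representable only if U+0080..U+00FF */
            unsigned int cp = ((unsigned int) (c & 0x1f) << 6) |
                              (unsigned int) (src[i + 1] & 0x3f);

            if (cp >= 0x80 && cp <= 0xff)
                *dest++ = (unsigned char) cp;
            else
            {
                fprintf(stderr,
                        "WARNING:  ignoring unconvertible UTF-8 character 0x%04x\n",
                        cp);
                dropped++;
            }
            i += 2;
        }
        else
        {
            /*
             * Lead byte of a longer sequence, or a stray byte: warn with the
             * lead byte only (as the patched message does), then skip it and
             * any continuation bytes that follow (assumption of this sketch).
             */
            fprintf(stderr,
                    "WARNING:  ignoring unconvertible UTF-8 character 0x%04x\n",
                    c);
            dropped++;
            i++;
            while (i < len && (src[i] & 0xc0) == 0x80)
                i++;
        }
    }
    *dest = '\0';
    return dropped;
}
```

With the input "a\346\231\243b" (5 bytes) this writes "ab" to dest, prints one warning, and returns 1, which mirrors the empty-looking result row in the latin2 session earlier in the thread.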