Discussion: Strange UTF-8 behaviour
Hi there all.
I am quite new to Postgres, so forgive me if this question seems obvious.
I have created a database with the UTF-8 encoding (createdb cassa --encoding=UTF-8).
Then I made the following tests:
cassa=> create table test(id varchar(5));
cassa=> insert into test values('12345');
INSERT 178725 1
cassa=> insert into test values ('123è');
INSERT 178726 1
cassa=> insert into test values ('1234è');
ERROR: value too long for type character varying(5)
but if I try
cassa=> select '#' || id || '#' from test;
 ?column?
----------
 #12345#
 #123è#
(2 rows)
so, apparently the chars are stored the right way (#123è#), but when trying the query the è char is parsed as 2 chars ...
The database server version is 7.3.4 on a RedHat 9 machine ...
Any clue?
Tia
Marco
--
Ever noticed how fast windows run ? neither did I
My guess is that something in the chain of getting the data into the
database is measuring BYTES, not CHARACTERS.
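To make the bytes-versus-characters distinction concrete, here is a minimal Python sketch (Python is used only for illustration; the helper name `char_and_byte_lengths` is made up for this example). It compares the character count of each test value with its size in UTF-8. It does not show which part of the chain is doing the counting in Marco's setup.

```python
from typing import Tuple

E_GRAVE = "\u00e8"  # the character è (e with grave accent)


def char_and_byte_lengths(value: str) -> Tuple[int, int]:
    """Return (number of characters, number of bytes when encoded as UTF-8)."""
    return len(value), len(value.encode("utf-8"))


# The three values inserted in the original test
for value in ["12345", "123" + E_GRAVE, "1234" + E_GRAVE]:
    chars, utf8_bytes = char_and_byte_lengths(value)
    print(f"{value!r}: {chars} characters, {utf8_bytes} bytes in UTF-8")
```

The last value has 5 characters but 6 bytes, so a check that counts bytes would reject it for varchar(5), while a check that counts characters would accept it.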
"Marco Ferretti" <marco.ferretti@jrc.it> wrote:
</quote--------------------------------------->
<snip>
I have created a database with the UTF-8 encoding (createdb cassa
--encoding=UTF-8) .
Then I have made the following tests :
cassa=> create table test(id varchar(5));
cassa=> insert into test values ('12345');
INSERT 178725 1
cassa=> insert into test values ('123è');
INSERT 178726 1
cassa=> insert into test values ('1234è');
ERROR: value too long for type character varying(5)
<snip>
so, apparently the chars are stored the right way ( #123è#) but when
trying the query the è char is parsed as 2 chars ....
The database server version is 7.3.4 on a RedHat 9 machine ...
Any clue ?
</quote--------------------------------------->
On Thu, Sep 16, 2004 at 06:10:13PM +0200, Marco Ferretti wrote:
> I am quite new to Postgres, so forgive me if this question seems
> obvious.
>
> I have created a database with the UTF-8 encoding (createdb cassa
> --encoding=UTF-8) .
> Then I have made the following tests :
FWIW, I can't reproduce this using 7.3.6. Is there anything special
about your 'e' character, or is it a plain 'e'?
$ createdb test --encoding=UTF-8
CREATE DATABASE
COMMENT
$ psql test
Welcome to psql 7.3.6, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
test=# create table test (id char(5));
CREATE TABLE
test=# insert into test values ('1234e');
INSERT 16993 1
test=# create table test2 (id varchar(5));
CREATE TABLE
test=# insert into test2 values ('1234e');
INSERT 16996 1
test=# insert into test2 values ('123e');
INSERT 16997 1
test=# select '#' || id || '#', length(id) from test2;
 ?column? | length
----------+--------
 #1234e#  |      5
 #123e#   |      4
(2 rows)
--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)
Hi Alvaro,
> FWIW, I can't reproduce this using 7.3.6. Is there anything special
> about your 'e' character, or it's a plain 'e'?
Maybe you didn't get the email correctly. It was an e with a grave
accent, just like this:
è (UTF-8 encoded)
I just checked on PG 7.4.3 / NetBSD, with these results:
egrave=# CREATE TABLE test (data varchar(5));
CREATE
egrave=# show server_encoding ;
client_encoding
-----------------
UNICODE
(1 row)
egrave=# show client_encoding ; -- don't know why it is set to unicode
client_encoding
-----------------
UNICODE
(1 row)
egrave=# INSERT INTO test VALUES ('1234è');
egrave'# '\r
Query buffer reset (cleared).
egrave=# set client_encoding = 'ISO8859-1';
SET
egrave=# show client_encoding ;
client_encoding
-----------------
ISO8859-1
(1 row)
egrave=# INSERT INTO test VALUES ('1234è');
INSERT 25340 1
egrave=# SELECT * FROM test;
data
------
1234è
(1 row)
It seems everything works when the client encoding is set up correctly. Try
checking your client and server encodings.
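As a rough illustration of why the declared client encoding has to match the bytes the terminal actually sends, the Python sketch below encodes the same string both ways and then reads each byte sequence with the wrong decoder. The helper `decodes_as_utf8` is a made-up name, and this only models the byte-level mismatch, not what psql or the server do internally.

```python
E_GRAVE = "\u00e8"  # the character è
text = "1234" + E_GRAVE

utf8_bytes = text.encode("utf-8")      # è becomes the two bytes 0xC3 0xA8
latin1_bytes = text.encode("latin-1")  # è becomes the single byte 0xE8


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is a valid UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# UTF-8 bytes misread as Latin-1: each byte becomes one character,
# so the 5-character string turns into 6 characters.
misread = utf8_bytes.decode("latin-1")
print(len(text), len(misread))

# Latin-1 bytes misread as UTF-8: 0xE8 announces a multi-byte sequence
# that never arrives, so the data is not valid UTF-8 at all.
print(decodes_as_utf8(latin1_bytes))
```

The second case is consistent with the session above, where the insert only went through after client_encoding was switched to ISO8859-1.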
I've also double checked with:
egrave=# SET client_encoding = 'ISO8859-2';
SET
egrave=# SELECT * FROM test;
WARNING: ignoring unconvertible UTF-8 character 0xc3a8
data
------
1234
(1 row)
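The warning mentions 0xc3a8, which is the UTF-8 encoding of è, and ISO8859-2 has no è in its repertoire. A small Python sketch of the same situation follows, using Python's own codecs rather than PostgreSQL's conversion routines; `can_encode` is a hypothetical helper written for this example.

```python
E_GRAVE = "\u00e8"  # the character è


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The UTF-8 bytes of è, as reported in the warning
print(E_GRAVE.encode("utf-8").hex())

# è exists in ISO 8859-1 but not in ISO 8859-2
print(can_encode(E_GRAVE, "iso8859_1"), can_encode(E_GRAVE, "iso8859_2"))

# Dropping the unconvertible character leaves only the four digits
print(("1234" + E_GRAVE).encode("iso8859_2", errors="ignore"))
```

Dropping the character mirrors the "1234" returned by the SELECT above.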
Best regards
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Hi,
> è (UTF-8 encoded)
Sorry, I actually forgot to switch encoding :)
I just hope the last part of the email was readable.
Ciao ciao
--
Matteo Beccati
http://phpadsnew.com/
http://phppgads.com/
Thanks to all you guys! You really helped.
Marco