Обсуждение: Question about encoding
Hello, I'm sure you already replied tons of questions like this, but I don't understand something. This is the situation : * I have COPY commands from a dump, encoded as UTF8 (special characters are encoded as 2-bytes). * My file contains SET client_encoding = 'UTF8'; * But when I do psql <myfile.sql I get an error for 40 characters long strings that have UTF8 characters : ERROR: value too long for type character varying(40) * I think I understand why, psql -l says that my database is SQL_ASCII * Postgresql 7.4 Question 1 : Is it the expected behavior ? These characters have a SQL_ASCII equivalent because I already have them stored in another table of the same database Question 2 : If yes, then I suppose I should have the database as LATIN1 or UTF8. Can I change/convert the encoding of the database ? Or at least of the schema (which would be even better) Thanks a lot for any help :-))) Denis
On mån, 2009-08-24 at 18:42 +0200, Denis BUCHER wrote: > Question 1 : > Is it the expected behavior ? These characters have a SQL_ASCII > equivalent because I already have them stored in another table of the > same database SQL_ASCII is not the same as ASCII. SQL_ASCII means, take the bytes as they come. So a 40-character UTF-8 string might indeed be longer than 40 bytes, which is what SQL_ASCII will look at. The best bet is to avoid SQL_ASCII altogether. It's pretty bogus and inconsistent. > Question 2 : > If yes, then I suppose I should have the database as LATIN1 or UTF8. > Can I change/convert the encoding of the database ? Or at least of the > schema (which would be even better) Dump, recreate database with right encoding, restore.
Hello, Peter Eisentraut a écrit : >> Question 1 : >> Is it the expected behavior ? These characters have a SQL_ASCII >> equivalent because I already have them stored in another table of the >> same database > > SQL_ASCII is not the same as ASCII. SQL_ASCII means, take the bytes as > they come. So a 40-character UTF-8 string might indeed be longer than > 40 bytes, which is what SQL_ASCII will look at. > > The best bet is to avoid SQL_ASCII altogether. It's pretty bogus and > inconsistent. OK that's a good point it explains everything. >> Question 2 : >> If yes, then I suppose I should have the database as LATIN1 or UTF8. >> Can I change/convert the encoding of the database ? Or at least of the >> schema (which would be even better) > > Dump, recreate database with right encoding, restore. Ok I did it but postgresql 7.4 was not very nice, the following failed : 1. dump 2. DROP DATABASE; CREATE DATABASE WITH ENCODING UTF8 3. restore And it bugged. In fact the dump was writtent with "SET CLIENT ENCODING=SQL_ASCII but when restoring on the UTF8 databse it failed, I had to change it to SET client_encoding = LATIN1. Anyway, finally everything is working again, I just had to add a SET NAMES LATIN1 to my application because it was ISO and not UTF8 aware. Therefore thanks a lot for your help, it solved all my problems ! Denis