Обсуждение: Converting an ASCII database to an UTF-8 database
Hi All, I have a database in PostgreSQL which is ASCII. Due to some internationalization issues, I need to convert the database to the UTF-8 format. So my question is: How do I convert a database in the ASCII format into one of the UTF-8 format? Thanks in advance - Kishore
On fös, 2006-02-17 at 05:21 -0800, kishore.sainath@gmail.com wrote: > Hi All, > > I have a database in PostgreSQL which is ASCII. > Due to some internationalization issues, I need to convert the database > to the UTF-8 format. > > So my question is: > How do I convert a database in the ASCII format into one of the UTF-8 > format? using pg_dump ? gnari
kishore.sainath@gmail.com wrote: > How do I convert a database in the ASCII format into one of the UTF-8 > format? ASCII is a subset of UTF-8, so you don't need to do anything. Just change the encoding entry in the pg_database table. Of course, using pg_dump would be the official way to convert a database between any two encodings. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On fös, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote:
> kishore.sainath@gmail.com wrote:
> > How do I convert a database in the ASCII format into one of the UTF-8
> > format?
>
> ASCII is a subset of UTF-8, so you don't need to do anything. Just
> change the encoding entry in the pg_database table. Of course, using
> pg_dump would be the official way to convert a database between any two
> encodings.
This will only work correctly if the database
definitely does not contain non-ASCII characters.
Assuming by ASCII format we mean that the database was
created SQL_ASCII, then it is possible that it contains
invalid UTF-8 characters, as SQL_ASCII is a 8 bit
encoding.
consider:
template1=# create database test with encoding='SQL_ASCII';
CREATE DATABASE
template1=# \connect test
You are now connected to database "test".
test=# create table a (x text);
CREATE TABLE
test=# insert into a values ('á');
INSERT 33304378 1
test=# select * from a;
x
---
á
(1 row)
test=# update pg_database set encoding =
pg_catalog.pg_char_to_encoding('UTF8') where datname='test';
UPDATE 1
test=# select * from a;
x
---
á
(1 row)
test=# \connect template1
You are now connected to database "template1".
template1=# \connect test
You are now connected to database "test".
test=# select * from a;
x
---
(1 row)
test=#
gnari
I have this exact problem. I have dumped and reloaded other
databases and set the client encoding to convert them to UTF-8 but I
have one database with values that still cause it to fail, even if I
specify that the client encoding is SQL_ASCII. How do I fix that?
On Feb 17, 2006, at 4:08 PM, Ragnar wrote:
> On fös, 2006-02-17 at 22:38 +0100, Peter Eisentraut wrote:
>> kishore.sainath@gmail.com wrote:
>>> How do I convert a database in the ASCII format into one of the
>>> UTF-8
>>> format?
>>
>> ASCII is a subset of UTF-8, so you don't need to do anything. Just
>> change the encoding entry in the pg_database table. Of course, using
>> pg_dump would be the official way to convert a database between
>> any two
>> encodings.
>
> This will only work correctly if the database
> definitely does not contain non-ASCII characters.
>
> Assuming by ASCII format we mean that the database was
> created SQL_ASCII, then it is possible that it contains
> invalid UTF-8 characters, as SQL_ASCII is a 8 bit
> encoding.
>
> consider:
>
> template1=# create database test with encoding='SQL_ASCII';
> CREATE DATABASE
> template1=# \connect test
> You are now connected to database "test".
> test=# create table a (x text);
> CREATE TABLE
> test=# insert into a values ('á');
> INSERT 33304378 1
> test=# select * from a;
> x
> ---
> á
> (1 row)
>
> test=# update pg_database set encoding =
> pg_catalog.pg_char_to_encoding('UTF8') where datname='test';
> UPDATE 1
> test=# select * from a;
> x
> ---
> á
> (1 row)
>
> test=# \connect template1
> You are now connected to database "template1".
> template1=# \connect test
> You are now connected to database "test".
> test=# select * from a;
> x
> ---
>
> (1 row)
>
> test=#
>
>
> gnari
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
I believe PostgreSQL treat UTF-8 and LATIN9 Differently.
When I tried to dump a db to a UTF-8 encoding and restore it
with UTF-8 encoding (also) it encountered problems with fields
that have unicoded values thus it stop from restoring the whole dump.
So I tried using LATIN9 encoding for both dump and restore.
I believe that LATIN9=UTF-8 encoding base on the DOCS.
I wonder why it is like that.
When I tried to dump a db to a UTF-8 encoding and restore it
with UTF-8 encoding (also) it encountered problems with fields
that have unicoded values thus it stop from restoring the whole dump.
So I tried using LATIN9 encoding for both dump and restore.
I believe that LATIN9=UTF-8 encoding base on the DOCS.
I wonder why it is like that.
Jan Cruz wrote: > I believe PostgreSQL treat UTF-8 and LATIN9 Differently. Certainly, considering that they are different encodings. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Rick Gigger wrote: > I have this exact problem. I have dumped and reloaded other > databases and set the client encoding to convert them to UTF-8 but I > have one database with values that still cause it to fail, even if I > specify that the client encoding is SQL_ASCII. How do I fix that? Well then you need to figure out what encoding those non-ASCII characters actually are and set the client encoding to that. Trying LATIN1 might be a good start, but if you users/applications have inserted random bytes then you will have to sort it out by hand. -- Peter Eisentraut http://developer.postgresql.org/~petere/