Re: [ADMIN] what's the efficient/safest way to convert database character set ?

Поиск
Список
Период
Сортировка
От Huang, Suya
Тема Re: [ADMIN] what's the efficient/safest way to convert database character set ?
Дата
Msg-id D83E55F5F4D99B4A9B4C4E259E6227CD9DF39C@AUX1EXC01.apac.experian.local
обсуждение исходный текст
Ответ на Re: [ADMIN] what's the efficient/safest way to convert database character set ?  (John R Pierce <pierce@hogranch.com>)
Ответы Re: [ADMIN] what's the efficient/safest way to convert database character set ?  (John R Pierce <pierce@hogranch.com>)
Список pgsql-general
Hi John,

" Is it UTF8 data stored in SQL_ASCII or is it
LATIN1 (ISO-8859) ? or some sort of Big5 or euc_cn ?  or what?  if it is already UTF8 data, are you sure that there are
noinvalid encodings  
accidentally stored?   Postgres with SQL_ASCII does no character
validation...  if its all USASCII (0x00 to 0x7F) then you're OK."

Question: How can I pull out the real character set information from the database? does it rely on the understanding of
businessknowledge? 

so, the real data stored in database decides if we need to use iconv to convert them to UTF-8. If data is  USASCII,
thenthe pg_dump/restore process you provided should be sufficient and complete. Do I understand this correctly? 

Besides, we do have different  encoding database on same server instance, see below:

Welcome to psql 8.3.11, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=# \l
                 List of databases
            Name             |  Owner   | Encoding
-----------------------------+----------+-----------
 admin                     | dba      | UTF8
 pgdb_1_sky                      | pgdb_1      | SQL_ASCII
 pgdb_1_sky_utf8                 | pgdb_1      | UTF8
 pgdb_1_ca                      | pgdb_1      | SQL_ASCII
 pgdb_1_us                      | pgdb_1      | SQL_ASCII
 pgdb_sky                  | pgdb  | SQL_ASCII
 pgdb_sky_users            | pgdb  | SQL_ASCII
 pgdb_sky_users_utf8       | pgdb  | UTF8
 pgdb_sky_utf8             | pgdb  | UTF8
 pgdb_sky_utf8_86465_old   | pgdb  | UTF8

Thanks,
Suya
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of John R Pierce
Sent: Friday, October 18, 2013 4:12 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] [ADMIN] what's the efficient/safest way to convert database character set ?

On 10/17/2013 9:49 PM, Huang, Suya wrote:
> Yes John, we probably will use a new database server here to accommodate those converted database.
>
> By saying export/import, do you mean by :
> 1. pg_dump  (//should I specify -E  UTF 8 to dump the data in UTF-8
> encoding?) 2. create database xxx -E UTF8 3. pg_restore

I don't believe 8.3 supported multiple different encodings on the same server instance, thats relatively new.

before you can import your SQL_ASCII data, you need to know what charset
the data is actually in.    Is it UTF8 data stored in SQL_ASCII or is it
LATIN1 (ISO-8859) ? or some sort of Big5 or euc_cn ?  or what?  if it is already UTF8 data, are you sure that there are
noinvalid encodings  
accidentally stored?   Postgres with SQL_ASCII does no character
validation...  if its all USASCII (0x00 to 0x7F) then you're OK.

I would strongly recommend this new database server be running a
currently supported version, I'd probably use 9.2.   configure the old
server to allow the postgres user on the new server to connect and log on, and while logged onto the new server, run
somethinglike... 

     pg_dump -h oldserver -d olddatabase -u postgres -E oldencoding -s -f olddatabase.schema.sql
     pg_dump -h oldserver -d olddatabase -u postgres -E oldencoding -a -f olddatabase.data.sql
     createuser newuser
     createdb -O newuser -l en_US.utf8 newdbname
     psql -d newdbname -u newuser -f olddatabase.schema.sql
     psql -d newdbname -u newuser -f olddatabase.data.sql

if the data import fails due to a invalid encoding, then you may have to pass the .data.sql file through iconv (and
removethe set client_encoding sql commands from it) 




--
john r pierce                                      37N 122W
somewhere on the middle of the left coast



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


В списке pgsql-general по дате отправления:

Предыдущее
От: John R Pierce
Дата:
Сообщение: Re: [ADMIN] what's the efficient/safest way to convert database character set ?
Следующее
От: John R Pierce
Дата:
Сообщение: Re: [ADMIN] what's the efficient/safest way to convert database character set ?