Discussion: Move data between two databases SQL-ASCII to UTF8
I need to convert my database to UTF8. Is there a way to do a SELECT ... INSERT from the old database table to the new one? Would the INSERT correct data errors between the two data types? I only have 10 tables and the biggest has < 8000 rows.
Running PostgreSQL version 8.1.4 on Red Hat 9.
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Margaret Gillon, IS Dept., Chromalloy Los Angeles, ext. 297
2007/2/8, MargaretGillon@chromalloy.com <MargaretGillon@chromalloy.com>:
> I need to convert my database to UTF8. Is there a way to do a SELECT ...
> INSERT from the old database table to the new one? Would the INSERT correct
> data errors between the two data types? I only have 10 tables and the
> biggest has < 8000 rows.

Use pg_dump to dump the db and use iconv on the generated file:

iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

<GUESS>
If the characters are strictly ASCII (<=127) then the conversion will not be necessary. But if there are characters bigger than 127 then the conversion will have to be made from iso-8859-1 to utf-8:

iconv -f ISO_8859-1 -t UTF-8 mydb.dump -o mydb_utf8.dump
</GUESS>

Regards,
--
Clodoaldo Pinto Neto
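For anyone without iconv at hand, the two cases in the guess above can be sketched in Python. This is only an illustration: the function name `dump_to_utf8` is made up for this example, and ISO-8859-1 as the source encoding is the same guess as in the message, not something the thread confirms.

```python
def dump_to_utf8(raw: bytes, source_encoding: str = "iso-8859-1") -> bytes:
    """Re-encode the bytes of a plain-text dump as UTF-8.

    Hypothetical helper; source_encoding is an assumption about the data.
    """
    # Bytes <= 127 are encoded identically in ASCII and UTF-8,
    # so a pure-ASCII dump needs no conversion at all.
    if raw.isascii():
        return raw
    # Otherwise interpret the bytes in the assumed source encoding
    # and write them back out as UTF-8.
    return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    print(dump_to_utf8(b"plain text"))  # unchanged
    print(dump_to_utf8(b"caf\xe9"))     # 0xE9 becomes the two bytes 0xC3 0xA9
```

To apply it to a real file, read mydb.dump in binary mode, pass the bytes through the function, and write the result to mydb_utf8.dump in binary mode.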
			
On 2/8/07, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> Use pg_dump to dump the db and use iconv on the generated file:
>
> iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

Wouldn't it be adequate to set the client encoding to SQL_ASCII in the dump file (if that was in fact the encoding of the original database)?

SET client_encoding TO SQL_ASCII;

And then let the database do the conversion? I would think that since the db is UTF8 and the client is claiming SQL_ASCII, it would convert the data to UTF8.

I have done this in the past with SQL dumps that had characters that UTF8 didn't like, and I just added "SET client_encoding TO LATIN1;" since I knew the source encoding was LATIN1.
--
Chad
http://www.postgresqlforums.com/
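The "add a SET client_encoding line to the dump" step could be scripted roughly as follows. This is a sketch under assumptions: the dump is a plain-text SQL dump, the real source encoding is already known (LATIN1 here, as in the last paragraph above), and `declare_client_encoding` is a name invented for this example. A dump may already contain a `SET client_encoding` line, so the sketch replaces the first one it finds and otherwise prepends a new one.

```python
import re

# Matches a whole line such as:
#   SET client_encoding = 'SQL_ASCII';
#   SET client_encoding TO LATIN1;
_SET_LINE = re.compile(
    rb"(?im)^SET\s+client_encoding\s*(?:=|TO)\s*'?\w+'?\s*;[ \t]*$"
)


def declare_client_encoding(dump: bytes, encoding: str = "LATIN1") -> bytes:
    """Make the dump declare `encoding` as its client encoding.

    Hypothetical helper. The data bytes are left untouched; only the
    SET statement changes, so the server does the conversion on load.
    """
    statement = f"SET client_encoding TO '{encoding}';".encode("ascii")
    if _SET_LINE.search(dump):
        # Replace the first existing declaration in place.
        return _SET_LINE.sub(statement, dump, count=1)
    # No declaration found: put one at the top of the dump.
    return statement + b"\n" + dump
```

The function works on bytes rather than text because the dump is not guaranteed to be valid in any one encoding before the load.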
On Thu, Feb 08, 2007 at 08:22:40PM -0500, Chad Wagner wrote:
> On 2/8/07, Clodoaldo <clodoaldo.pinto.neto@gmail.com> wrote:
> >Use pg_dump to dump the db and use iconv on the generated file:
> >
> >iconv -f ASCII -t UTF-8 mydb.dump -o mydb_utf8.dump

Converting the data from ASCII to UTF-8 doesn't make much sense: if the data is ASCII then it doesn't need conversion; if the data needs conversion then it isn't ASCII.

> Wouldn't it be adequate to set the client encoding to SQL_ASCII in the dump
> file (if that was in fact the encoding on the original database)?

http://www.postgresql.org/docs/8.2/interactive/multibyte.html#AEN24118

"If the client character set is defined as SQL_ASCII, encoding conversion is disabled, regardless of the server's character set."

As Clodoaldo mentioned, if the data is strictly ASCII then no conversion is necessary because the UTF-8 representation will be the same. If you set client_encoding to SQL_ASCII and the data contains non-ASCII characters that aren't valid UTF-8 then you'll get the error 'invalid byte sequence for encoding "UTF8"'. In that case set client_encoding to whatever encoding the data is really in; likely guesses for Western European languages are LATIN1, LATIN9, or perhaps WIN1252.

--
Michael Fuhr
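To help pick among those guesses, one rough approach is to check whether a sample of the data is already valid UTF-8 and, if not, look at how the same bytes read under each candidate. The Python sketch below is only an illustration: the helper names are invented, the mapping from PostgreSQL encoding names to Python codec names is my assumption, and Python's UTF-8 validation is not guaranteed to match the server's check byte for byte.

```python
# Assumed mapping: PostgreSQL encoding name -> Python codec name.
CANDIDATES = {
    "LATIN1": "iso8859-1",
    "LATIN9": "iso8859-15",
    "WIN1252": "cp1252",
}


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8 (pure ASCII always does)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def preview_decodings(raw: bytes) -> dict:
    """Decode the same bytes under each candidate so a person can compare.

    A value of None means the bytes are not valid in that encoding.
    """
    previews = {}
    for pg_name, codec in CANDIDATES.items():
        try:
            previews[pg_name] = raw.decode(codec)
        except UnicodeDecodeError:
            previews[pg_name] = None
    return previews


if __name__ == "__main__":
    sample = b"price: 5 \xa4"
    print(is_valid_utf8(sample))  # False: a lone 0xA4 is not valid UTF-8
    for name, text in preview_decodings(sample).items():
        print(name, repr(text))
```

The byte 0xA4 is a useful test case because it is the generic currency sign in LATIN1 and WIN1252 but the euro sign in LATIN9. Only someone who knows the data can say which reading is right, and the chosen name is what goes into client_encoding.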