I am migrating a PostgreSQL database from 7.4 with SQL_ASCII encoding
to 8.1 with UTF8. I made a pg_dump of the 7.4 database, but I had
difficulty restoring it straight into 8.1 with UTF8, because the
original database contains byte sequences that are not valid UTF8
(0xb9, for example). So I tried importing it into a temporary 8.1
cluster that I set to SQL_ASCII encoding. That import went fine.
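
For illustration, here is a minimal Python sketch of how the offending
bytes in a dump could be located before attempting the UTF8 import. It
is not part of pg_dump or PostgreSQL; the function name
invalid_utf8_lines is hypothetical, and it assumes the dump is a
plain-text file read in binary mode. It reports only the first invalid
byte on each line.

```python
def invalid_utf8_lines(lines):
    """Yield (line_number, byte_offset, byte_value) for each line that
    fails strict UTF-8 decoding.

    `lines` is any iterable of bytes objects, for example a file opened
    with mode "rb". Only the first invalid byte per line is reported.
    """
    for lineno, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            # exc.start is the offset of the first byte that could not
            # be decoded within this line.
            yield lineno, exc.start, raw[exc.start]


# Hypothetical usage against a plain-text dump file:
#
#     with open("dump.sql", "rb") as f:
#         for lineno, offset, value in invalid_utf8_lines(f):
#             print("line %d, offset %d: 0x%02x" % (lineno, offset, value))
```

Knowing which lines are affected makes it easier to decide whether the
bytes are really text in some other encoding or genuinely binary data.
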
So, basically, I am now trying to move data from 8.1 in SQL_ASCII to 8.1
in UTF8. I know that text fields in a UTF8 database can hold the invalid
sequences, because I can do:
=> create table foo(t text);
CREATE TABLE
=> insert into foo values(E'a\xb9c');
INSERT 0 1
=> insert into foo values('abc');
INSERT 0 1
=> select t,length(t) from foo;
t | length
-----+--------
ac | 3
abc | 3
That's how I want to import the data. I want the application to behave
as much like before as possible, so I would rather not strip the binary
characters.
Is there a way to get pg_dump to use the escape sequences instead of
writing the binary value? Is what I'm trying to do dangerous?
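
To make the first question concrete, here is a minimal Python sketch of
the transformation being asked for: each byte above 0x7f is rewritten as
a \xNN escape, as in the E'a\xb9c' literal above. This is only an
illustration. The function name escape_high_bytes is hypothetical, it is
not something pg_dump provides, and it assumes the input is the raw
value of a single field. It does not escape quotes or backslashes
already present in the data, and it says nothing about whether storing
such bytes in a UTF8 database is safe.

```python
def escape_high_bytes(raw: bytes) -> str:
    """Return `raw` as ASCII text, with every byte >= 0x80 written as a
    backslash escape of the form \\xNN (lowercase hex)."""
    out = []
    for b in raw:
        if b >= 0x80:
            # Non-ASCII byte: emit a four-character escape such as \xb9.
            out.append("\\x%02x" % b)
        else:
            # Plain ASCII byte: keep it as-is.
            out.append(chr(b))
    return "".join(out)
```

With this sketch, the three bytes 'a', 0xb9, 'c' become the six
characters a\xb9c, while plain ASCII input is returned unchanged.
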
I am still investigating how the application filters the data. If it
sends the raw byte inside the query, is there any way to make a
UTF8-encoded database accept that? Or do I have to create a separate
database encoded with SQL_ASCII?
Regards,
Jeff Davis