Обсуждение: UTF8 frustrations
Trying to do pg_restore from one UTF8 encoded db to another UTF8 encoded db DB_source: Red Hat Enterprise Linux AS release 4 (Nahant Update 4) psql 8.2.4 DB Destination: Debian GNU/Linux 4.0 psql 8.1.9 I've tried: pg_dump from the source box & from destination box from destination server pg_dump -i -h source_server db > db.8.1.9.sql No matter which way I issue pg_dump command it always fails on the same record. ERROR: pg_restore: ERROR: invalid byte sequence for encoding "UTF8": 0xdf69 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". CONTEXT: COPY logs, line 69238382 pg_restore: [archiver (db)] error returned by PQendcopy: ERROR: invalid byte sequence for encoding "UTF8": 0xdf69 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". CONTEXT: COPY logs, line 69238382 UTF8 to UTF8 and everything in between is UTF8, and it still fails, I'm tempted to delete this record and keep going but I'm determined to find a solution. TIA for any help, Jesse Waters
On Sep 4, 2007, at 14:48 , jesse.waters@gmail.com wrote: > Trying to do pg_restore from one UTF8 encoded db to another UTF8 > encoded db > DB_source: > Red Hat Enterprise Linux AS release 4 (Nahant Update 4) > psql 8.2.4 > > DB Destination: > Debian GNU/Linux 4.0 > psql 8.1.9 > UTF8 to UTF8 and everything in between is UTF8, and it still fails, > I'm tempted to delete this record and keep going but I'm determined > to find a solution. (1) PostgreSQL does not support dump and restore from a new version of PostgreSQL to an older version. (2) There were changes to UTF8 handling between 8.1 and 8.2. I'm a bit surprised you get an error like this as 8.1 is more lax than 8.2: it accepts some invalid sequences as valid. If you're really determined to do this, I recommend editing the dump file by hand to remove or edit this record so it does not contain the invalid sequence (as you've suggested). Michael Glaesemann grzm seespotcode net
On 9/4/07, jesse.waters@gmail.com <jesse.waters@gmail.com> wrote: > Trying to do pg_restore from one UTF8 encoded db to another UTF8 encoded db > DB_source: > Red Hat Enterprise Linux AS release 4 (Nahant Update 4) > psql 8.2.4 > > DB Destination: > Debian GNU/Linux 4.0 > psql 8.1.9 > > I've tried: > pg_dump from the source box & from destination box > > from destination server > pg_dump -i -h source_server db > db.8.1.9.sql Are you issuing the dump and psql / restore command on the same machine? As previously mentioned, dumping from newer to older is not supported. pg_dump from 8.1 might not understand the data structures it finds in an 8.2 db, and pg_dump from 8.2 might create a dump that 8.1 doesn't support. Note that going the other way you are encouraged to use pg_dump from 8.2 to dump the 8.1 database for importing to 8.2. However, In this case I'm guessing that the problem is that you've got different client encodings on each end. i.e. you're dumping with one encoding setting and restoring with another. Note that pgsql autoconverts from the server's encoding to the client's encoding at the request of the client. so, you'll need to check your client encoding from psql on the source and target machines to see if they match. show client_encoding ; will tell you what your client encoding is.
Thanks all for the quick replies. Here is the latest issue, to verify that the pg_dump works, I'm going to do dump and restore on the same host/cluster. Source: DB_source: Red Hat Enterprise Linux AS release 4 (Nahant Update 4) psql 8.2.4 Destination: same machine different db name echo $LANG en_US.UTF-8 SET client_encoding = 'UTF8'; Command used: pg_dump -Fc srcdb > db.dump pg_restore -d devdb db.dump Results, same error. Now I'm really concerned. pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 1625; 0 16680 TABLE DATA logs watersj pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xdf69 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding CONTEXT: COPY logs, line 69238382 WARNING: errors ignored on restore: 1 Jesse Waters
On 9/5/07, jesse.waters@gmail.com <jesse.waters@gmail.com> wrote: > Thanks all for the quick replies. > > SET client_encoding = 'UTF8'; > > Results, same error. Now I'm really concerned. How / where are you setting the client_encoding? I'm wonder if it's really set when the pg_dump / pg_restore commands are being issued, or if you're inheriting a different encoding along with the new connection it gets. You can try: alter user set client_encoding='UTF8' for the postgres user to see if that helps.
jesse.waters@gmail.com wrote: > > Here is the latest issue, to verify that the pg_dump works, I'm going > to do dump and restore on the same host/cluster. > > Source: > DB_source: > Red Hat Enterprise Linux AS release 4 (Nahant Update 4) > psql 8.2.4 > Destination: > same machine different db name > > echo $LANG > en_US.UTF-8 > > SET client_encoding = 'UTF8'; > > Command used: > pg_dump -Fc srcdb > db.dump > pg_restore -d devdb db.dump > > Results, same error. Now I'm really concerned. > > pg_restore: [archiver (db)] Error while PROCESSING TOC: > pg_restore: [archiver (db)] Error from TOC entry 1625; 0 16680 TABLE DATA logs watersj > pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xdf69 > HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding > CONTEXT: COPY logs, line 69238382 > WARNING: errors ignored on restore: 1 I can recreate this behaviour with 8.2.4 (UTF-8). psql> CREATE TABLE test (id serial PRIMARY KEY, val text); psql> INSERT INTO test (val) VALUES (E'\xdf\x69'); psql> \q $ pg_dump -F c -f x.dmp -t test testdb $ pg_restore -c -d testdb x.dmp pg_restore: [archiver (db)] Error while PROCESSING TOC: pg_restore: [archiver (db)] Error from TOC entry 1840; 0 45883 TABLE DATA test laurenz pg_restore: [archiver (db)] COPY failed: ERROR: invalid byte sequence for encoding "UTF8": 0xdf69 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding". CONTEXT: COPY test, line 1 WARNING: errors ignored on restore: 1 The problem is that in (at least) one record in your table watersj.logs, there is a corrupt string. Unfortunately (as demonstrated above) it is possible to enter corrupt data into a PostgreSQL database, this is what must have happened in your case. I suggest that you identify and correct this string in the original database, then everything should work fine. You can extract the offending row from the dump, that should help to identify it. 69238382 rows is a little unwieldy, but tools like awk can help: pg_restore db.dump | awk '/^COPY logs /,/^\\\.$/ { if (lineno==69238382) print $0; ++lineno }' I think there is the desire to fix problems like this in 8.3, but I don't think that's done yet. Yours, Laurenz Albe