Обсуждение: Something like pg_dump for 7.4

Поиск
Список
Период
Сортировка

Something like pg_dump for 7.4

От
"Gregory S. Williamson"
Дата:
I am trying to migrate a modest sized database (the unload file is just under a gigabyte) which is moderately complex,
usingschemas and the like from 7.4 to 8.1 

Alas, pg_dump appears to incompatable with my needs as it keeps introducing invalid UTF-8 characters.

I have gone back and changed the offending data rows with SQL to make damned sure there are no illegal characters. I
redump,compress the thing, scp it and try to reload. Same error every time: 
psql:glacier_load.sql:1017: ERROR:  invalid UTF-8 byte sequence detected near byte 0x96
CONTEXT:  COPY clients, line 1751, column client_name: "XYZ Media  AB&E"
psql:glacier_load.sql:3019: ERROR:  current transaction is aborted, commands ignored until end of transaction block

See that double space after "Media" ? That's actually a single space, a dash, and another space: "XYZ Media - AB&E",
butevery export turns it into: 
XYZ Media <96> AB&E

Is there any way to do this ? I can't edit the resulting dump file by hand -- it's huge. And pg_dump on 7.4 does not
knowhow to a table inside a schema -- it seems to be all or nothing. 

TIA,

Greg Williamson
DBA
GlobeXplorer LLC





Re: Something like pg_dump for 7.4

От
Tom Lane
Дата:
"Gregory S. Williamson" <gsw@globexplorer.com> writes:
> I am trying to migrate a modest sized database (the unload file is just under a gigabyte) which is moderately
complex,using schemas and the like from 7.4 to 8.1 
> Alas, pg_dump appears to incompatable with my needs as it keeps introducing invalid UTF-8 characters.

It's quite unlikely that pg_dump is at fault.  Rather the data you are
dumping is probably wrong.  There were some errors in pre-8.1 versions
that caused them to accept byte sequences that are not actually legal
UTF8 strings.  8.1 has tightened up that code and so will reject data
that 7.4 thought was legal.

Check the archives for some previous discussions about reasonable ways
to fix bad data --- I seem to recall iconv being mentioned as one
possible filter for your dump file.

> Is there any way to do this ? I can't edit the resulting dump file by hand -- it's huge. And pg_dump on 7.4 does not
knowhow to a table inside a schema -- it seems to be all or nothing. 

You could use the 8.1 pg_dump against the 7.4 server if you need more
flexibility.

            regards, tom lane

Re: Something like pg_dump for 7.4

От
Artur Pietruk
Дата:
On Thu, Feb 16, 2006 at 09:11:13PM -0800, Gregory S. Williamson wrote:
> See that double space after "Media" ? That's actually a single space, a dash, and another space: "XYZ Media - AB&E",
butevery export turns it into: 
> XYZ Media <96> AB&E

    I'm not sure if I got your problem correctly here - but if you
really want to just replace " <96> " with " - ", then maybe do it with a
small sed script? Like sed -f mysed.sed mydump >mydump_fixed, where
mysed would be:

===8<===
s/ <96> / - /g
===8<===

    Just make sure <96> is the real char copy/pasted from the source
- you could copy it e.g. by opening vi with two windows, one with
'mydump', and another with 'mysed.sed'

    But it does not guarantee there are no other broken chars in
your dump. But catching them one by one, and getting also help with
iconv as Tom mentioned, you should be able to fix that - good luck!

--
--- Artur Pietruk, arturp@plukwa.net

Re: Something like pg_dump for 7.4

От
"Gregory S. Williamson"
Дата:
Thanks for the suggestion -- I'll try the iconv search and see.

What's odd is that I have *for sure* fixed the offending entries in the source table, but when I do a pg_dump they're
backlike the proverbial bad penny in the dump file. 

Certainly a reason for migrating, regardless of how painful it is/

G

-----Original Message-----
From:    Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent:    Thu 2/16/2006 10:19 PM
To:    Gregory S. Williamson
Cc:    pgsql-admin@postgresql.org
Subject:    Re: [ADMIN] Something like pg_dump for 7.4
"Gregory S. Williamson" <gsw@globexplorer.com> writes:
> I am trying to migrate a modest sized database (the unload file is just under a gigabyte) which is moderately
complex,using schemas and the like from 7.4 to 8.1 
> Alas, pg_dump appears to incompatable with my needs as it keeps introducing invalid UTF-8 characters.

It's quite unlikely that pg_dump is at fault.  Rather the data you are
dumping is probably wrong.  There were some errors in pre-8.1 versions
that caused them to accept byte sequences that are not actually legal
UTF8 strings.  8.1 has tightened up that code and so will reject data
that 7.4 thought was legal.

Check the archives for some previous discussions about reasonable ways
to fix bad data --- I seem to recall iconv being mentioned as one
possible filter for your dump file.

> Is there any way to do this ? I can't edit the resulting dump file by hand -- it's huge. And pg_dump on 7.4 does not
knowhow to a table inside a schema -- it seems to be all or nothing. 

You could use the 8.1 pg_dump against the 7.4 server if you need more
flexibility.

            regards, tom lane

!DSPAM:43f56b0a203851333710190!





Re: Something like pg_dump for 7.4

От
"Gregory S. Williamson"
Дата:
Tom, et. al.,

pg_dump was not the issue as you surmised. Bad data, fixed in some cases in the original db and in some cases with a
sedscript. I tried a pg 8.1 pg_dump and that in fact helped diagnose the (human) error. 

Thanks once again for the help.

gsw


-----Original Message-----
From:    Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent:    Thu 2/16/2006 10:19 PM
To:    Gregory S. Williamson
Cc:    pgsql-admin@postgresql.org
Subject:    Re: [ADMIN] Something like pg_dump for 7.4
"Gregory S. Williamson" <gsw@globexplorer.com> writes:
> I am trying to migrate a modest sized database (the unload file is just under a gigabyte) which is moderately
complex,using schemas and the like from 7.4 to 8.1 
> Alas, pg_dump appears to incompatable with my needs as it keeps introducing invalid UTF-8 characters.

It's quite unlikely that pg_dump is at fault.  Rather the data you are
dumping is probably wrong.  There were some errors in pre-8.1 versions
that caused them to accept byte sequences that are not actually legal
UTF8 strings.  8.1 has tightened up that code and so will reject data
that 7.4 thought was legal.

Check the archives for some previous discussions about reasonable ways
to fix bad data --- I seem to recall iconv being mentioned as one
possible filter for your dump file.

> Is there any way to do this ? I can't edit the resulting dump file by hand -- it's huge. And pg_dump on 7.4 does not
knowhow to a table inside a schema -- it seems to be all or nothing. 

You could use the 8.1 pg_dump against the 7.4 server if you need more
flexibility.

            regards, tom lane

!DSPAM:43f56b0a203851333710190!





Re: Something like pg_dump for 7.4

От
Tom Lane
Дата:
"Gregory S. Williamson" <gsw@globexplorer.com> writes:
> What's odd is that I have *for sure* fixed the offending entries in the source table, but when I do a pg_dump they're
backlike the proverbial bad penny in the dump file. 

Sounds to me like you're not dumping the same database that you're fixing...
check port numbers and so on.

            regards, tom lane