Обсуждение: Strange UTF8 issue with pg_dump/pg_restore

Поиск
Список
Период
Сортировка

Strange UTF8 issue with pg_dump/pg_restore

От
"Benjamin Krajmalnik"
Дата:
I have a database whose records are inserted via ODBC.
I turned on logging, and the client is explicitly setting the client encoding to UTF.
 
Rows are inserted into the tables.  When I do a pg_dump, I do not get any errors.  When I try tor restore, I am getting an error regarding an invalid UTF-8 character sequence.
Of course, this renders compressed backups useless, since I have no way of manually fixing the file.  The process of fixing the file is, of course, painful.
 
Is this a known issue?  I am running PG 8.1.4.  The problem happened on PG Windows.
Tomorrow I will try to see if the problem exists in the FreeBSD compile.  If I get it to reproduce, I will be able to send some data dumps, but will need to know who to send it to.
 
Regards,
 
Benjamin
 
 

Re: Strange UTF8 issue with pg_dump/pg_restore

От
"Aaron Bono"
Дата:
On 8/7/06, Benjamin Krajmalnik <kraj@illumen.com> wrote:
I have a database whose records are inserted via ODBC.
I turned on logging, and the client is explicitly setting the client encoding to UTF.
 
Rows are inserted into the tables.  When I do a pg_dump, I do not get any errors.  When I try tor restore, I am getting an error regarding an invalid UTF-8 character sequence.
Of course, this renders compressed backups useless, since I have no way of manually fixing the file.  The process of fixing the file is, of course, painful.

Why can't you manually fix it?  I know you cannot fix it while compressed but you should be able to fix it after it is uncompressed, right?

Have you tried to set the encoding just like the client before you do a restore from backup?

==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

Re: Strange UTF8 issue with pg_dump/pg_restore

От
Jim Nasby
Дата:
On Aug 7, 2006, at 9:08 PM, Benjamin Krajmalnik wrote:
> I have a database whose records are inserted via ODBC.
> I turned on logging, and the client is explicitly setting the
> client encoding to UTF.
>
> Rows are inserted into the tables.  When I do a pg_dump, I do not
> get any errors.  When I try tor restore, I am getting an error
> regarding an invalid UTF-8 character sequence.
> Of course, this renders compressed backups useless, since I have no
> way of manually fixing the file.  The process of fixing the file
> is, of course, painful.
>
> Is this a known issue?  I am running PG 8.1.4.  The problem
> happened on PG Windows.
> Tomorrow I will try to see if the problem exists in the FreeBSD
> compile.  If I get it to reproduce, I will be able to send some
> data dumps, but will need to know who to send it to.

Are the dumps from an older version? There's a known issue that older
versions weren't doing proper checking, in which case you can have
invalid UTF8 characters. The fix is to run the dump through iconv.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Strange UTF8 issue with pg_dump/pg_restore

От
"Jim Nasby"
Дата:
Please include the list in your replies so other people can help.
-----Original Message-----
From: Benjamin Krajmalnik [mailto:kraj@illumen.com]
Sent: Tuesday, August 08, 2006 10:31 PM
To: Jim Nasby
Subject: RE: [ADMIN] Strange UTF8 issue with pg_dump/pg_restore

I had corrupt rows inserted on 8.1.4/Windows.  I had truncated the table, and it startd collecting data again.
I have since moved to 8.1.4/FreeBSD.  I have been running the data for 2 days now, so the odds of having a corrupt record is pretty good if the problem exists.  I will wait a few more days and will pg_dump the table which had the problems and attempt to restor it on a dev server to see if I see a problem.
 
Is iconv a utility that comes woth PG?  How can I run the data through it (I am doing pg_dump in custom format).
 
No, it's a unix utility. You'd have to dump in plain text output, and then run that through iconv. There's info about this in the 8.1 release notes.


From: Jim Nasby [mailto:jnasby@pervasive.com]
Sent: Tue 8/8/2006 10:48 AM
To: Benjamin Krajmalnik
Cc: pgsql admin
Subject: Re: [ADMIN] Strange UTF8 issue with pg_dump/pg_restore

On Aug 7, 2006, at 9:08 PM, Benjamin Krajmalnik wrote:
> I have a database whose records are inserted via ODBC.
> I turned on logging, and the client is explicitly setting the 
> client encoding to UTF.
>
> Rows are inserted into the tables.  When I do a pg_dump, I do not 
> get any errors.  When I try tor restore, I am getting an error 
> regarding an invalid UTF-8 character sequence.
> Of course, this renders compressed backups useless, since I have no 
> way of manually fixing the file.  The process of fixing the file 
> is, of course, painful.
>
> Is this a known issue?  I am running PG 8.1.4.  The problem 
> happened on PG Windows.
> Tomorrow I will try to see if the problem exists in the FreeBSD 
> compile.  If I get it to reproduce, I will be able to send some 
> data dumps, but will need to know who to send it to.

Are the dumps from an older version? There's a known issue that older 
versions weren't doing proper checking, in which case you can have 
invalid UTF8 characters. The fix is to run the dump through iconv.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461