Postgres Backup and Restore

Поиск
Список
Период
Сортировка
От Mikel Lindsaar
Тема Postgres Backup and Restore
Дата
Msg-id 57a815bf0802090345p1c65fa12r622e4b4de062eb82@mail.gmail.com
обсуждение исходный текст
Ответы Re: Postgres Backup and Restore
Список pgsql-admin
Hello List,

I have done some searching and looking at all the documentation I
could find, and I have found a (what i consider) a bug in pg_dump
pg_restore that I wanted to bounce off the list to find their
solutions.

I have a database which gets updated from many locations and it looks
like we have gotten some character encoding mis matches in the system.
 I am going through and fixing these as this is my problem, not
postgres's, however, in finding this bug in our data, I found a
situation with postgres.

If you have bad character encodings in your data, you can not backup
and restore that data using pg_dump and pg_restore, on Windows Server
2003, using compress or plain text modes.  Both modes fail on the bad
encoding when you try to restore.  I found this out while doing a test
run on our backup and restore process, this data had crept in over a
recent period and got into the database somehow - I am finding out how
:)

I see this as a bug because the data got in there and postgres
accepted it on the copy in, but then refuses to accept it on the
restore, but I wanted to find out what the list thought.

The database is 8.2, running on Windows Server 2003.  The database is
UTF8 as reported by \l in psql.

Now I have managed to do a backup and test restore, but I did this by
writing a ruby script that went through the plain text version of the
backup line by line and split out the good from the bad by attempting
to parse each line through Iconv.  It was not a fast process, but it
did result in me finding the 120 lines that contained bad data and get
them out of the file's COPY tables.

I then had to go and individually correct each of these fields to
handle the incorrect encodings.

Now, the problem is, I should be able to restore my backup, regardless
of this sort of problem.  The problem characters were all in text
fields and so should be able to be restored.

How are other admins handling such a situation?  What backup tools do you use?

I know on Oracle I can do a Binary hot backup which just bypasses the
whole problem.

But as it stands right now, I will have to keep this script handy or
constantly double and triple check encodings.

Note, I know the bad stuff shouldn't get in there to begin with, but
that is not the point here.  I should be able to restore to the state
I made the backup in.

Anyway, comments? Ideas?

I have attached some of the encodings below if anyone is interested.

Regards

Mikel
http://lindsaar.net/


"\201L J\303\203\342\200\234ZSEF U."
"\201N\t\\N\tHU\t2084\t20"
"DON\303\203\302\201T U."
"\201ROS\t\\N\tHU\t3580\t"
"\215HE 25\t\\N\t\\N\tHAM"
"\215GER\t\\N\t\\N\t2001-"
"\201SHINDEN 955-2\t\\"
"\215NBUCH\t\\N\tDE\t710"

В списке pgsql-admin по дате отправления:

Предыдущее
От: Bruce McAlister
Дата:
Сообщение: Re: Postgres 8.2 install on solaris 10
Следующее
От: Mike Blackwell
Дата:
Сообщение: Character encoding conversion