Thread: PG export/import encoding issue

PG export/import encoding issue

From
"Scott Toland"
Date:

Hi guys

 

I have an 8.4 install that I am moving to a new DB server running 9.0. This is all well and good for the most part, except when it comes to restoring the data in several of the tables. See, the problem is the old schema was an ASCII neutral zone, and the new server has the schema set for UTF-8 for i18n compliance. Over the years, people have inserted data in a wide range of languages in many encodings, none of it tracked. This of course leads to the dreaded encoding errors on import that, with normal COPY mechanics, result in empty tables where there should be thousands of rows.

 

Switching to INSERTS means I get to keep most of the table, and just lose the rows with encoding errors. Not great, but manageable – the real killer with this method is that an import takes hours, which we cannot allow.

 

Is there a way to make pg_restore escape out chars it has problems with instead of failing? Alternatively, what is the best method to quickly and accurately import this database onto the new server? I have tried sql and custom pg_dump formats, with (not surprisingly) the custom format being the head-and-shoulders winner in terms of performance.

 

Thanks a bunch guys

 

Scott Toland

 

Re: PG export/import encoding issue

From
Jens Wilke
Date:
On Monday 19 December 2011 17:15:49 Scott Toland wrote:

> the problem is the old schema was
> an ASCII neutral zone, and the new server has the schema set for UTF-8

You have to convert the dump:
http://blog.endpoint.com/2011/12/sanitizing-supposed-utf-8-data.html
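
For illustration only, here is a minimal Python sketch of one way such a conversion could look. It is not the method from the linked post. It assumes a plain-text (SQL format) dump, and it assumes that any line which is not already valid UTF-8 was written in a single legacy encoding; the cp1252 fallback is a guess that has to be adjusted to the actual data. The function name sanitize_dump and its parameters are made up for this example.

```python
from typing import BinaryIO


def sanitize_dump(src: BinaryIO, dst: BinaryIO, fallback: str = "cp1252") -> int:
    """Copy a plain-text dump from src to dst so that every line is valid UTF-8.

    Lines that already decode as UTF-8 are written through unchanged.
    Any other line is re-decoded with the fallback encoding (an assumption
    about the data, not something the dump itself records). Bytes the
    fallback cannot map are replaced with U+FFFD.
    Returns the number of lines that needed the fallback.
    """
    repaired = 0
    for raw in src:  # binary files iterate line by line, split on b"\n"
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            text = raw.decode(fallback, errors="replace")
            repaired += 1
        dst.write(text.encode("utf-8"))
    return repaired
```

It can be run as a filter, for example by calling sanitize_dump(sys.stdin.buffer, sys.stdout.buffer) and piping the result into psql, so the fast COPY path can still be used. Working line by line means a table with mixed encodings is repaired row by row, but a single row that mixes UTF-8 and legacy bytes will be decoded entirely with the fallback, so the repaired rows are worth checking afterwards.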

HTH, Jens