Re: error while trying to change the database encoding on a database

From: Adrian Klaver
Subject: Re: error while trying to change the database encoding on a database
Msg-id: 201101240820.17047.adrian.klaver@gmail.com
In reply to: Re: error while trying to change the database encoding on a database  (Geoffrey Myers <lists@serioustechnology.com>)
Responses: Re: error while trying to change the database encoding on a database
List: pgsql-general
On Monday 24 January 2011 8:06:38 am Geoffrey Myers wrote:
> Adrian Klaver wrote:
> > On Monday 24 January 2011 7:57:52 am Geoffrey Myers wrote:
> >> Adrian Klaver wrote:
> >>> On Monday 24 January 2011 6:38:55 am Geoffrey Myers wrote:
> >>>> We need to change the database encoding on our databases as they were
> >>>> created with the wrong encoding.  They were created as SQL_ASCII and
> >>>> we are changing them to UTF8.
> >>>>
> >>>> When testing this Friday, I received the following error:
> >>>>
> >>>> pg_restore: [archiver (db)] Error while PROCESSING TOC:
> >>>> pg_restore: [archiver (db)] Error from TOC entry 5225; 0 16990 TABLE
> >>>> DATA cust postgres
> >>>> pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
> >>>> for encoding "UTF8": 0xb0
> >>>> HINT:  This error can also happen if the byte sequence does not match
> >>>> the encoding expected by the server, which is controlled by
> >>>> "client_encoding".
> >>>> CONTEXT:  COPY cust, line 778
> >>>
> >>>                         ^^^^^^^ In the COPY command for that table.
> >>
> >> I picked up on that, but the dump is binary, thus I cannot view the
> >> actual code.
> >
> > Actually you can :) I should have mentioned it before. You can have
> > pg_restore restore to a file instead of a database by using the -f
> > switch. When you do that it creates plain text output. You could restore
> > the entire dump to the file or use the -t switch to get only the table
> > you need.
>
> Thanks for the suggestion.  As it stands, we are getting different
> errors for different hex characters, thus the solution we need is the
> ability to identify the characters that won't convert from SQL_ASCII to
> UTF8.  Is there a resource that would identify these characters?
>

Well, the issue is that SQL_ASCII is not an encoding. From the docs:
http://www.postgresql.org/docs/9.0/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED
"Thus, this setting is not so much a declaration that a specific encoding is in
use, as a declaration of ignorance about the encoding. In most cases, if you
are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting
because PostgreSQL will be unable to help you by converting or validating
non-ASCII characters. "
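
Since SQL_ASCII stores bytes without validating them, one way to see which
bytes are at issue is to scan a plain-text dump for lines that are not valid
UTF-8. The following is only a rough sketch in Python: the function names and
the sample data are made up for illustration, and it reports just the first
bad byte on each line.

```python
from collections import Counter


def find_invalid_utf8(data: bytes):
    """Return (line_number, offset_in_line, byte_value) for each line
    that fails to decode as UTF-8. Only the first bad byte per line
    is reported."""
    problems = []
    for lineno, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((lineno, exc.start, line[exc.start]))
    return problems


def summarize(problems):
    """Count how often each offending byte value appears."""
    return Counter(f"0x{value:02x}" for _, _, value in problems)


# Hypothetical sample standing in for a plain-text dump file.
sample = b"id\tnote\n1\tplain ascii\n2\t72\xb0F\n"
problems = find_invalid_utf8(sample)
for lineno, offset, value in problems:
    print(f"line {lineno}, offset {offset}: 0x{value:02x}")
```

For a real dump you would read the file in binary mode and pass its contents
in. The list of byte values does not tell you the source encoding by itself,
but it narrows down which rows to look at.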

What you need to do is determine which applications were putting data into the
database and what encoding they were using. I ran into this a couple of years
back with an app that was using WIN1252 for data being inserted into a couple
of tables in a SQL_ASCII database. Once I knew the encoding, I restored a
schema-only dump of those tables into a new UTF8 database. Using psql, I set
client_encoding to WIN1252 and then used \i to pull in a plain-text, data-only
dump for each table.
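
As an illustration of what that conversion amounts to, here is a small Python
sketch that transcodes bytes from a known source encoding to UTF-8. This is a
different route from the client_encoding approach described above, where the
server does the conversion. It assumes the data really is WIN1252 (Python
calls that codec "cp1252"); the function name and the sample row are invented
for the example.

```python
def transcode(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes using the assumed source encoding and re-encode
    them as UTF-8. Raises UnicodeDecodeError if a byte has no mapping
    in the source encoding."""
    return data.decode(source_encoding).encode("utf-8")


# Hypothetical data row containing the 0xb0 byte from the error message.
row = b"2\t72\xb0F\n"
converted = transcode(row)
print(converted.decode("utf-8"), end="")
```

Note that Python's cp1252 codec leaves five byte values unmapped (0x81, 0x8d,
0x8f, 0x90, 0x9d) and raises an error on them. Hitting one of those is a hint
that the guess about the source encoding may be wrong for that data.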


>
> --
> Until later, Geoffrey
>
> "I predict future happiness for America if they can prevent
> the government from wasting the labors of the people under
> the pretense of taking care of them."
> - Thomas Jefferson



--
Adrian Klaver
adrian.klaver@gmail.com
