Re: error while trying to change the database encoding on a database

From Geoffrey Myers
Subject Re: error while trying to change the database encoding on a database
Msg-id 4D3DB3FE.305@serioustechnology.com
In response to Re: error while trying to change the database encoding on a database  (Adrian Klaver <adrian.klaver@gmail.com>)
Responses Re: error while trying to change the database encoding on a database
Re: error while trying to change the database encoding on a database
List pgsql-general
Adrian Klaver wrote:
> On Monday 24 January 2011 8:06:38 am Geoffrey Myers wrote:
>> Adrian Klaver wrote:
>>> On Monday 24 January 2011 7:57:52 am Geoffrey Myers wrote:
>>>> Adrian Klaver wrote:
>>>>> On Monday 24 January 2011 6:38:55 am Geoffrey Myers wrote:
>>>>>> We need to change the database encoding on our databases as they were
>>>>>> created with the wrong encoding.  They were created as SQL_ASCII and
>>>>>> we are changing them to UTF8.
>>>>>>
>>>>>> When testing this Friday, I received the following error:
>>>>>>
>>>>>> pg_restore: [archiver (db)] Error while PROCESSING TOC:
>>>>>> pg_restore: [archiver (db)] Error from TOC entry 5225; 0 16990 TABLE
>>>>>> DATA cust postgres
>>>>>> pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence
>>>>>> for encoding "UTF8": 0xb0
>>>>>> HINT:  This error can also happen if the byte sequence does not match
>>>>>> the encoding expected by the server, which is controlled by
>>>>>> "client_encoding".
>>>>>> CONTEXT:  COPY cust, line 778
>>>>>                         ^^^^^^^ In the COPY command for that table.
>>>> I picked up on that, but the dump is binary, thus I cannot view the
>>>> actual code.
>>> Actually you can :) I should have mentioned it before. You can have
>>> pg_restore restore to a file instead of a database by using the -f
>>> switch. When you do that it creates plain text output. You could restore
>>> the entire dump to the file or use the -t switch to get only the table
>>> you need.
>> Thanks for the suggestion.  As it stands, we are getting different
>> errors for different hex characters, thus the solution we need is the
>> ability to identify the characters that won't convert from SQL_ASCII to
>> UTF8.  Is there a resource that would identify these characters?
>>
>
> Well the issue is that SQL_ASCII is not an encoding. From the docs:
> http://www.postgresql.org/docs/9.0/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED
> "Thus, this setting is not so much a declaration that a specific encoding is in
> use, as a declaration of ignorance about the encoding. In most cases, if you
> are working with any non-ASCII data, it is unwise to use the SQL_ASCII setting
> because PostgreSQL will be unable to help you by converting or validating
> non-ASCII characters. "
>
> What you need to do is determine what applications were putting data into the
> database and what encoding they are using. I ran into this a couple of years
> back with an app that was using WIN1252 for data being inserted into a couple
> of tables in a SQL_ASCII database. Once I knew the encoding I dumped the table
> schema only for those tables into a new UTF8 database. Using psql I set the
> client_encoding to WIN1252 and then used \i to pull in a plain text data only
> dump for each table.
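
The conversion that psql performs when client_encoding is set to WIN1252 can be illustrated outside the database. This is only a minimal sketch, assuming the bytes really were written as WIN1252 (Python's `cp1252` codec); the function name `reencode_to_utf8` is hypothetical and does not come from the thread.

```python
def reencode_to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode bytes from an assumed source encoding and re-encode them as UTF-8."""
    # Raises UnicodeDecodeError if a byte is undefined in the source encoding.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


# 0xb0 is the degree sign in WIN1252; on its own it is not valid UTF-8.
sample = b"72\xb0F"
converted = reencode_to_utf8(sample)
print(converted)  # b'72\xc2\xb0F'
```

If the guess about the source encoding is wrong, the result will still be valid UTF-8 but may show the wrong characters, so it is worth checking a few converted rows by eye.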

We hope to identify the offending characters and fix them in the existing
database, then convert.  The problem appears to affect only a small number
of characters, but it would help if there were some way to identify them
other than reloading the data and waiting for the errors.

That is why I asked about a resource that might identify the
characters.
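
One way to find such bytes without a reload is to scan a plain-text dump (for example one written with pg_restore -f, as suggested above) for byte sequences that are not valid UTF-8. The following is a rough sketch under that assumption; the function names `invalid_utf8_bytes` and `scan_dump` are hypothetical, and the sample lines are made up for illustration.

```python
from collections import Counter


def invalid_utf8_bytes(line: bytes) -> list:
    """Return the byte values in `line` that are not part of valid UTF-8."""
    bad = []
    pos = 0
    while pos < len(line):
        try:
            line[pos:].decode("utf-8")
            break  # the rest of the line is valid
        except UnicodeDecodeError as err:
            # err.start/err.end are offsets into the slice being decoded.
            bad.extend(line[pos + err.start : pos + err.end])
            pos += err.end
    return bad


def scan_dump(lines):
    """Return (locations, counts) for every line containing invalid UTF-8."""
    counts = Counter()
    locations = []
    for number, line in enumerate(lines, start=1):
        bad = invalid_utf8_bytes(line)
        if bad:
            locations.append((number, [hex(b) for b in bad]))
            counts.update(bad)
    return locations, counts


# Made-up sample: line 2 holds a lone 0xb0, line 3 is valid UTF-8.
sample = [b"plain ascii\n", b"temp 72\xb0F\n", b"caf\xc3\xa9\n"]
locations, counts = scan_dump(sample)
print(locations)  # [(2, ['0xb0'])]
```

For a real dump, the same function can be fed a file opened in binary mode, e.g. `scan_dump(open("cust.sql", "rb"))`, where the file name is only a placeholder. Note that this reports which bytes are invalid as UTF-8 and where they occur; it cannot tell which encoding the application originally used to write them.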


--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson
