Re: error while trying to change the database encoding on a database

From: Geoffrey Myers
Subject: Re: error while trying to change the database encoding on a database
Message-ID: 4D3DCB8B.4060400@serioustechnology.com
In reply to: Re: error while trying to change the database encoding on a database  (Adrian Klaver <adrian.klaver@gmail.com>)
Responses: Re: error while trying to change the database encoding on a database
List: pgsql-general

Adrian Klaver wrote:
> On 01/24/2011 09:16 AM, Geoffrey Myers wrote:
>
>>
>> We hope to identify the characters and fix them in the existing
>> database, then convert. It appears to be very limited, but it would help
>> if there was some way to identify these characters outside of simply
>> doing the reload of the data and finding the errors.
>>
>> Hence the reason I asked about a resource that might identify the
>> characters.
>
> The problem is that from the standpoint of the SQL_ASCII database there
> is nothing wrong with the characters per se. AFAIK there is no built in
> function to validate characters. The reason is that valid is determined
> by the encoding and if you know the encoding then you really don't need
> to determine validity. If you want to see one way others have tackled
> this, search on iconv in the mailing list archive. This requires working
> on an external copy of the data and knowing something about the
> encodings involved. The nearest I could ever find to an encoding
> detector is:
>
> http://chardet.feedparser.org/
>
> It is a Python program and the encodings it detects are limited but it
> might work for you.
>
> Given all the above, when I was faced with the problem you are facing I
> found it easiest to make an educated guess as to the original encoding
> and then do test restores with client_encoding set to my guess.

Understood.  We had figured the problem to be small, and it appears that
it is, so we felt we could address it a character at a time.  Then we got
this error:

pg_restore: [archiver (db)] Error from TOC entry 5258; 0 17549 TABLE DATA fax postgres
pg_restore: [archiver (db)] COPY failed: ERROR:  invalid byte sequence for encoding "UTF8": 0xe28053

That hex value doesn't translate to a single character.  I've dumped the
data to a file as you suggested, but reviewing the identified line
brings no joy.
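
For what it's worth, the reported bytes are not one character because they are not a complete UTF-8 sequence: 0xe2 opens a three-byte sequence, 0x80 is a valid continuation byte, and 0x53 is plain ASCII "S", which is not. One way to find such bytes without repeating the restore might be to scan the dumped file directly. What follows is only a minimal sketch in Python, assuming the table data has been written out as a plain-text file; the names find_invalid_utf8, scan_dump and BadSequence are made up for illustration and are not part of PostgreSQL or any library.

```python
from typing import Iterable, Iterator, NamedTuple


class BadSequence(NamedTuple):
    line_no: int    # 1-based line number within the dump
    offset: int     # byte offset of the bad sequence within that line
    raw: bytes      # the bytes the UTF-8 decoder rejected
    context: bytes  # the rejected bytes plus a few neighbours


def find_invalid_utf8(lines: Iterable[bytes],
                      context: int = 10) -> Iterator[BadSequence]:
    """Yield every byte sequence in `lines` that is not valid UTF-8."""
    for line_no, line in enumerate(lines, start=1):
        pos = 0
        while pos < len(line):
            try:
                line[pos:].decode("utf-8")
                break  # the rest of this line decodes cleanly
            except UnicodeDecodeError as exc:
                start, end = pos + exc.start, pos + exc.end
                yield BadSequence(
                    line_no,
                    start,
                    line[start:end],
                    line[max(0, start - context):end + context],
                )
                pos = end  # resume scanning after the rejected bytes


def scan_dump(path: str) -> None:
    """Print one report line per invalid sequence in a plain-text dump."""
    with open(path, "rb") as dump:  # binary mode, so nothing is decoded on read
        for bad in find_invalid_utf8(dump):
            print(f"line {bad.line_no}, byte {bad.offset}: "
                  f"0x{bad.raw.hex()} in {bad.context!r}")
```

Calling scan_dump("fax.sql") (a hypothetical file name) would list each offending location together with its neighbouring bytes, which may make it easier to guess the original encoding. Python may report a shorter run of bytes than the server does for the same spot, since the two decoders draw the boundary of the bad sequence differently.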

--
Until later, Geoffrey

"I predict future happiness for America if they can prevent
the government from wasting the labors of the people under
the pretense of taking care of them."
- Thomas Jefferson
