Thread: invalid UTF-8 byte sequences and iconv

invalid UTF-8 byte sequences and iconv

From
Karen Springer
Date:
Hi,

We have set up a new server and need to move our database from
7.3 to 8.1.4.  On restore I get the 'invalid UTF-8 byte sequence'
error message.  If I use the command iconv -c -f UTF-8 -t UTF-8 -o
cleanfile.sql dumpfile.sql, then the offending characters are deleted
and the restore goes smoothly.  The problem is that we want those
characters.  They are, for example, the degree symbol and the micro
symbol.  Is there any way to bring these characters over?  Thanks in
advance.

Karen

Re: invalid UTF-8 byte sequences and iconv

From
Alvaro Herrera
Date:
Karen Springer wrote:
> Hi,
>
> We have set up a new server and need to move our database from
> 7.3 to 8.1.4.  On restore I get the 'invalid UTF-8 byte sequence'
> error message.  If I use the command iconv -c -f UTF-8 -t UTF-8 -o
> cleanfile.sql dumpfile.sql, then the offending characters are deleted
> and the restore goes smoothly.  The problem is that we want those
> characters.  They are, for example, the degree symbol and the micro
> symbol.  Is there any way to bring these characters over?  Thanks in
> advance.

Huh, maybe try converting from the dump's real source encoding instead
of from UTF-8?  Try, for example, Latin-1.
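
To illustrate why the source encoding matters, here is a minimal Python
sketch.  It assumes the dump stores the degree and micro signs as single
Latin-1 bytes (0xB0 and 0xB5); nobody in this thread has confirmed that,
and the sample string is made up.

```python
# Assumption: the old dump holds Latin-1 bytes, so the degree sign is 0xB0
# and the micro sign is 0xB5. The sample data below is hypothetical.
raw = b"25 \xb0C, 10 \xb5m"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Roughly what `iconv -c -f UTF-8 -t UTF-8` does: invalid bytes are dropped.
stripped = raw.decode("utf-8", errors="ignore").encode("utf-8")

# Converting from the (assumed) real source encoding keeps the characters.
converted = raw.decode("latin-1").encode("utf-8")

print(is_valid_utf8(raw))
print(stripped)
print(converted)
```

If the dump really is Latin-1, the matching command would be along the
lines of iconv -f ISO-8859-1 -t UTF-8 -o cleanfile.sql dumpfile.sql
(without -c, so that nothing is silently discarded).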

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: invalid UTF-8 byte sequences and iconv

From
Ivo Rossacher
Date:
In earlier versions of PostgreSQL the database allowed invalid byte
sequences to be stored. The newer versions check byte sequences correctly
and reject invalid ones. So if your dump really is in UTF-8 already, you
will have to search the dump for the invalid sequences and replace them
with the correct ones. (If you have a lot of them and a big dump, recode
might be of help.) If the dump is not UTF-8, you have to pass the correct
source encoding to iconv in the procedure you described.
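
To search the dump for the invalid sequences, something like the
following Python sketch might serve as a starting point. The function
name find_invalid_utf8_lines and the sample lines are made up for
illustration, and only the first bad byte on each line is reported.

```python
from typing import Iterable, List, Tuple


def find_invalid_utf8_lines(lines: Iterable[bytes]) -> List[Tuple[int, int, bytes]]:
    """Return (line_number, byte_offset, offending_bytes) for every line
    that is not valid UTF-8. Only the first error per line is recorded."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((lineno, exc.start, line[exc.start:exc.end]))
    return problems


# Hypothetical dump lines: the second one contains a lone Latin-1 byte.
sample = [
    b"INSERT INTO t VALUES ('ok');\n",
    b"INSERT INTO t VALUES ('20 \xb0C');\n",
    b"INSERT INTO t VALUES ('\xc2\xb5m');\n",
]

for lineno, offset, bad in find_invalid_utf8_lines(sample):
    print(f"line {lineno}, byte offset {offset}: {bad!r}")
```

For a real dump, the same function can be given a file opened in binary
mode, e.g. open("dumpfile.sql", "rb"), since iterating over such a file
yields one bytes object per line.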

Best regards
Ivo
On Tuesday, 25 July 2006 21:04, Karen Springer wrote:
> Hi,
>
> We have set up a new server and need to move our database from
> 7.3 to 8.1.4.  On restore I get the 'invalid UTF-8 byte sequence'
> error message.  If I use the command iconv -c -f UTF-8 -t UTF-8 -o
> cleanfile.sql dumpfile.sql, then the offending characters are deleted
> and the restore goes smoothly.  The problem is that we want those
> characters.  They are, for example, the degree symbol and the micro
> symbol.  Is there any way to bring these characters over?  Thanks in
> advance.
>
> Karen
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

Re: invalid UTF-8 byte sequences and iconv

From
Thusitha Kodikara
Date:
Hello,

We had a similar problem a few weeks back when migrating from PostgreSQL 7.3.10 to 7.4.13. After trying various methods, including iconv, we found that the only one that worked in our case was to fix the data manually. Fortunately, only about 15 records were affected.

BTW thanks Ivo for suggesting that solution on that occasion as well :)
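
For dumps with more than a handful of bad records, the manual fix could perhaps be approximated line by line. The Python sketch below is not what we did; it assumes the stray bytes are Latin-1, and the name repair_line is made up for illustration.

```python
def repair_line(line: bytes, fallback: str = "latin-1") -> bytes:
    """Return the line as valid UTF-8.

    Lines that already decode as UTF-8 are returned unchanged. Any other
    line is assumed (not verified) to be in the fallback encoding and is
    re-encoded as a whole.
    """
    try:
        line.decode("utf-8")
        return line
    except UnicodeDecodeError:
        return line.decode(fallback).encode("utf-8")


# Hypothetical records: one already valid, two with lone Latin-1 bytes.
records = [b"ok \xc2\xb0", b"20 \xb0C", b"10 \xb5m"]
repaired = [repair_line(r) for r in records]
print(repaired)
```

One caveat: a line that mixes valid multi-byte UTF-8 with stray Latin-1 bytes would have its valid part double-encoded, so such lines still need checking by hand.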

Regards,
-Thusitha


Ivo Rossacher <rossacher@bluewin.ch> wrote:
> In earlier versions of PostgreSQL the database allowed invalid byte
> sequences to be stored. The newer versions check byte sequences correctly
> and reject invalid ones. So if your dump really is in UTF-8 already, you
> will have to search the dump for the invalid sequences and replace them
> with the correct ones. (If you have a lot of them and a big dump, recode
> might be of help.) If the dump is not UTF-8, you have to pass the correct
> source encoding to iconv in the procedure you described.
