Re: ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 possiblesolution

From: Anders Hermansen
Subject: Re: ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 possiblesolution
Msg-id: 20050427115434.GB30285@online.no
In reply to: ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 possiblesolution  (Mauricio Hernández Durán <mhernandez@ingenian.com>)
Responses: Re: ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 possiblesolution  (Guillaume Cottenceau <gc@mnc.ch>)
Re: ERROR: could not convert UTF-8 character 0x00ef to ISO8859-1 possiblesolution  (Vadim Nasardinov <vadimn@redhat.com>)
List: pgsql-jdbc
* Guillaume Cottenceau (gc@mnc.ch) wrote:
> Anders Hermansen <anders 'at' yoyo.no> writes:
> > * Guillaume Cottenceau (gc@mnc.ch) wrote:
> > > Isn't there a problem with your UTF-8 data containing 0x00EF?
> >
> > E0 to EF hex (224 to 239): first byte of a three-byte sequence.
>
> Well 00 is first byte here, isn't it?

UTF-8 is a byte sequence, so it's not about the first byte in the whole
sequence, but about the first byte in a three-byte sequence.

A nul (0x00) byte appears in UTF-8 only as the encoding of U+0000 itself, never
as part of a multi-byte character. I believe the specification is designed this
way to keep it compatible with C nul-terminated strings.

I believe that the byte sequence 0x00EF is illegal UTF-8 because:
1) It contains a nul (0x00) byte, which cannot be part of any multi-byte character
2) 0xEF starts a three-byte sequence but is not followed by two more bytes
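
One way to check this from Java is to run the bytes through a strict UTF-8
decoder. The following is only a sketch: the class and method names
(StrictUtf8Check, isWellFormedUtf8) are made up for illustration, and it relies
on the standard java.nio.charset decoder configured to report malformed input
instead of silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDBC driver.
class StrictUtf8Check {
    // Returns true only if the bytes decode as complete, well-formed UTF-8.
    static boolean isWellFormedUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Raised for truncated or otherwise malformed sequences.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] suspect = {0x00, (byte) 0xEF};        // the bytes from the error message
        byte[] proper = {(byte) 0xC3, (byte) 0xAF};  // UTF-8 for U+00EF
        System.out.println(isWellFormedUtf8(suspect)); // false: 0xEF lacks its two trailing bytes
        System.out.println(isWellFormedUtf8(proper));  // true
    }
}
```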

On the other hand, U+00EF is a valid Unicode code point. It is:
LATIN SMALL LETTER I WITH DIAERESIS
It is encoded as 0xC3AF in UTF-8
As 0x00EF in UTF-16 (big-endian; the same in UCS-2)
As 0xEF in ISO-8859-1
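
These three encodings can be reproduced with a few lines of Java. This is just
an illustrative sketch using the standard StandardCharsets constants; the class
name EncodeIDiaeresis and the hex helper are invented here, and UTF_16BE is
chosen so that no byte-order mark is emitted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: prints the bytes of U+00EF in three encodings.
class EncodeIDiaeresis {
    // Formats a byte array as one hex literal, e.g. {0xC3, 0xAF} -> "0xC3AF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder("0x");
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF)); // mask to avoid sign extension
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00EF"; // LATIN SMALL LETTER I WITH DIAERESIS
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // 0xC3AF
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));   // 0x00EF
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // 0xEF
    }
}
```

So a value printed as 0x00ef looks like the code point (or its UTF-16 form)
rather than UTF-8 bytes.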


Anders Hermansen
