Re: patch: utf8_to_unicode (trivial)

From: Alvaro Herrera
Subject: Re: patch: utf8_to_unicode (trivial)
Msg-id 1281719926-sup-5928@alvh.no-ip.org
In reply to: Re: patch: utf8_to_unicode (trivial)  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: patch: utf8_to_unicode (trivial)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Excerpts from Robert Haas's message of Fri Aug 13 12:50:13 -0400 2010:
> On Fri, Aug 13, 2010 at 12:11 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
> > src/include/port.h?
> 
> Oh, hey, look at that.  Any thought on what to do about the fact that our
> two existing copies of utf2ucs() don't match?  (one tests against 0xf8
> where the other against 0xf0)

I'm not sure why it's masking with 0xf8 instead of 0xf0.  It seems like
(c & 0xf8) == 0xf8 signals the start of a 5-byte (or longer) sequence, which
is not valid per RFC 3629, according to Wikipedia:
http://en.wikipedia.org/wiki/UTF-8#Description

(Moreover, 0xf5 to 0xf7 signal the start of a 4-byte sequence for code points
above U+10FFFF, which RFC 3629 also excludes.)

So apparently it's good that the code returns an invalid code in those
cases, i.e. wchar.c is right and mbprint is wrong.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

