Re: Bug in UTF8-Validation Code?

From Alvaro Herrera
Subject Re: Bug in UTF8-Validation Code?
Date
Msg-id 20070404155032.GH8549@alvh.no-ip.org
In reply to Re: Bug in UTF8-Validation Code?  (Tatsuo Ishii <ishii@postgresql.org>)
Responses Re: Bug in UTF8-Validation Code?  (Tatsuo Ishii <ishii@postgresql.org>)
List pgsql-hackers
Tatsuo Ishii wrote:

> BTW, every encoding has its own charset. However the relationship
> between encoding and charset is not as simple as in Unicode. For
> example, encoding EUC_JP corresponds to multiple charsets, namely
> ASCII, JIS X 0201, JIS X 0208 and JIS X 0212. So a function which
> returns a "code point" is not quite useful since it lacks the charset
> info. I think we need to continue the design discussion, probably
> targeting 8.4, not 8.3.
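
To make the point about multiple charsets concrete, here is a minimal
sketch in Python (not PostgreSQL code) of how the lead byte of each
EUC_JP character selects the charset it belongs to. The function name
euc_jp_charsets is hypothetical; the byte ranges follow the usual EUC-JP
layout (0x8E and 0x8F are the single-shift bytes).

```python
def euc_jp_charsets(data: bytes):
    """Yield (charset_name, raw_bytes) for each character in EUC_JP data.

    Hypothetical helper for illustration only; it checks lead bytes and
    lengths, not whether trail bytes are valid.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            name, width = "ASCII", 1
        elif lead == 0x8E:
            # Single-shift 2: half-width katakana from JIS X 0201
            name, width = "JIS X 0201 (kana)", 2
        elif lead == 0x8F:
            # Single-shift 3: three-byte JIS X 0212 character
            name, width = "JIS X 0212", 3
        elif 0xA1 <= lead <= 0xFE:
            name, width = "JIS X 0208", 2
        else:
            raise ValueError(f"invalid EUC_JP lead byte 0x{lead:02X} at offset {i}")
        if i + width > len(data):
            raise ValueError(f"truncated character at offset {i}")
        yield name, data[i:i + width]
        i += width
```

The same two-byte value can therefore mean different things depending
on the prefix, which is why a bare "code point" for EUC_JP would be
ambiguous without the charset.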

Is Unicode complete as far as Japanese chars go?  I mean, is there a
character in EUC_JP that is not representable in Unicode?

Because if Unicode is complete, ISTM it makes perfect sense to have a
unicode_char() (or whatever we end up calling it) that takes a Unicode
code point and returns a character in whatever JIS set you want
(specified by setting client_encoding to that).  That would solve
the problem nicely.
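
A rough sketch of that idea in Python, purely as an illustration: the
function name unicode_char comes from the proposal above, but the
signature, the client_encoding parameter and the use of Python's codec
names are all assumptions, not an existing PostgreSQL API.

```python
def unicode_char(code_point: int, client_encoding: str = "euc_jp") -> bytes:
    """Return the bytes for a Unicode code point in the given encoding.

    Hypothetical sketch: client_encoding here is a Python codec name
    standing in for the PostgreSQL setting of the same name.
    Raises UnicodeEncodeError if the target encoding has no such character.
    """
    return chr(code_point).encode(client_encoding)


# U+3042 HIRAGANA LETTER A in two different client encodings
euc_bytes = unicode_char(0x3042, "euc_jp")
utf8_bytes = unicode_char(0x3042, "utf-8")
```

The caller never has to name a JIS charset: the code point is always
Unicode, and the target encoding decides the byte representation.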


One thing that I find confusing in your text above is whether EUC_JP is
an encoding or a charset.  I would think that the various JIS X are
encodings, and EUC_JP is the charset; or is it the other way around?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

