Re: Java's Unicode Notation

From: Tatsuo Ishii
Subject: Re: Java's Unicode Notation
Date:
Message-ID: 20011111190422Y.t-ishii@sra.co.jp
In reply to: Re: Beta going well ("Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>)
List: pgsql-hackers
From: Jean-Michel POURE <jm.poure@freesurf.fr>
Subject: Java's Unicode Notation 
Date: Thu, 08 Nov 2001 14:12:04 +0100
Message-ID: <4.2.0.58.20011108141018.00a59dc0@pop.freesurf.fr>

> Dear Tatsuo,
> 
> Would it be possible to use the Java Unicode Notation to define UTF-8 
> strings in PostgreSQL 7.2?

No. It's too late. We are in the beta freeze stage.

> Information can be found on http://czyborra.com/utf/
> 
> Do you think it is hard to implement?
> 
> Best regards,
> Jean-Michel POURE
> 
> ************************************************
> Java's Unicode Notation
> There are some less compact but more readable ASCII transformations the 
> most important of which is the Java Unicode Notation as allowed in Java 
> source code and processed by Java's native2ascii converter:
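> #include <stdio.h>
> void putwchar(unsigned int c)
> {
>     if (c >= 0x10000) {
>         /* beyond \uffff: emit a high/low surrogate pair */
>         printf("\\u%04x\\u%04x", 0xD7C0 + (c >> 10), 0xDC00 | (c & 0x3FF));
>     }
>     else if (c >= 0x100) printf("\\u%04x", c);
>     else putchar(c);   /* values below 0x100 pass through unchanged */
> }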
> The advantage of the \u20ac notation is that it is very easy to type it in 
> on any old ASCII keyboard and easy to look up the intended character if you 
> happen to have a copy of the Unicode book or the 
> {unidata2,names2,unihan}.txt files from the Unicode FTP site or CD-ROM or 
> know that U+20AC is the €.
> What's not so nice about the \u20ac notation is that the small letters are 
> quite unusual for Unicode characters, the backslashes have to be quoted for 
> many Unix tools, the four hexdigits without a terminator may appear merged 
> with the following word as in \u00a333 for £33, it is unclear when and how 
> you have to escape the backslash character itself, 6 bytes for one 
> character may be considered wasteful, and there is no way to clearly 
> present the characters beyond \uffff without \ud800\udc00 surrogates, and 
> last but not least the plain hexnumbers may not be very helpful.
> JAVA is one of the target and source encodings of yudit and its uniconv 
> converter.
> 
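
For reference, the conversion quoted above can be restated as a function that writes into a caller-supplied buffer instead of printing. This is only a sketch under stated assumptions: the names java_escape and java_escape_demo, the signature, and the decision to reject lone surrogates and values above U+10FFFF are choices made for this example and are not part of the quoted page or of PostgreSQL. Like the quoted code, it passes values below 0x100 through as a single byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the quoted page): writes the Java-style
 * escape for code point c into buf. Returns the number of characters
 * written, or -1 if c is not encodable or the buffer is too small. */
int java_escape(unsigned long c, char *buf, size_t size)
{
    int n;

    /* Assumption of this sketch: reject values above U+10FFFF and
     * lone surrogates rather than escaping them as-is. */
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return -1;

    if (c >= 0x10000) {
        /* Split into a surrogate pair; 0xD7C0 is 0xD800 - (0x10000 >> 10). */
        unsigned long hi = 0xD7C0 + (c >> 10);
        unsigned long lo = 0xDC00 | (c & 0x3FF);
        n = snprintf(buf, size, "\\u%04lx\\u%04lx", hi, lo);
    } else if (c >= 0x100) {
        n = snprintf(buf, size, "\\u%04lx", c);
    } else {
        /* Below 0x100: emit the value as one byte, as the quoted code does. */
        n = snprintf(buf, size, "%c", (int)c);
    }
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Small self-check showing the escapes for the euro sign and for the
 * first code point beyond \uffff. */
void java_escape_demo(void)
{
    char buf[16];

    assert(java_escape(0x20AC, buf, sizeof buf) == 6);
    assert(strcmp(buf, "\\u20ac") == 0);

    assert(java_escape(0x10000, buf, sizeof buf) == 12);
    assert(strcmp(buf, "\\ud800\\udc00") == 0);
}
```

Calling java_escape_demo() from a test program exercises both branches that produce escapes. A 13-byte buffer is enough for the longest result, a surrogate pair plus the terminating NUL.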

