Re: Encoding issues

From Patrice Hédé
Subject Re: Encoding issues
Msg-id 20011010200307.L14587@idf.net
In response to Encoding issues  (Tatsuo Ishii <t-ishii@sra.co.jp>)
List pgsql-hackers
* Tatsuo Ishii <t-ishii@sra.co.jp> [011010 18:21]:
> After receiving a request to add ISO 8859-15 and 16, I reviewed the
> multibyte support code and found several errors in it.
> 
> 1) There is a confusion between "LATIN5" and ISO 8859-5. LATIN5 is not
>    ISO 8859-5, but is actually ISO 8859-9. Should we rename LATIN5 to
>    "ISO8859-5" (or whatever) as the encoding name? I think we should.
>    For your information, here is the correct mapping between ISO
>    8859-n and LATINn.
> 
>    ISO 8859-1  LATIN1
>    ISO 8859-2  LATIN2
>    ISO 8859-3  LATIN3
>    ISO 8859-4  LATIN4
>    ISO 8859-9  LATIN5
>    ISO 8859-10 LATIN6

ISO 8859-14 LATIN8
ISO 8859-15 LATIN9 (or LATIN0)
ISO 8859-16 LATIN10

:)
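
The correspondence listed above could be kept in a small lookup table.
What follows is only a minimal sketch: the function name
iso8859_part_for_latin is hypothetical (it is not from the PostgreSQL
source), and the LATIN7 = ISO 8859-13 entry is the standard alias, which
is not listed in this thread.

```c
#include <stddef.h>

/*
 * Index n holds the ISO 8859 part number for the alias LATINn.
 * Index 0 is LATIN0, an alternative name for LATIN9 (ISO 8859-15).
 * LATIN7 = ISO 8859-13 is an addition not taken from this thread.
 */
static const int latin_to_iso8859[] = {
    15,             /* LATIN0 (alias of LATIN9) */
    1, 2, 3, 4,     /* LATIN1 .. LATIN4 */
    9,              /* LATIN5 */
    10,             /* LATIN6 */
    13,             /* LATIN7 */
    14,             /* LATIN8 */
    15,             /* LATIN9 */
    16              /* LATIN10 */
};

/* Hypothetical helper: ISO 8859 part for LATIN<n>, or -1 if unknown. */
int iso8859_part_for_latin(int n)
{
    size_t count = sizeof(latin_to_iso8859) / sizeof(latin_to_iso8859[0]);

    if (n < 0 || (size_t) n >= count)
        return -1;
    return latin_to_iso8859[n];
}
```

With this table, iso8859_part_for_latin(5) gives 9, which is the point
of the naming confusion: LATIN5 is ISO 8859-9, not ISO 8859-5.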

> 2) The leading characters for some Cyrillic charsets are wrong.
> 
> Currently they are defined as:
> 
> #define LC_KOI8_R    0x8c    /* Cyrillic KOI8-R */
> #define LC_KOI8_U    0x8c    /* Cyrillic KOI8-U */
> #define LC_ISO8859_5    0x8d    /* ISO8859 Cyrillic */
> 
> These should be:
> 
> #define LC_KOI8_R    0x8b    /* Cyrillic KOI8-R */
> #define LC_KOI8_U    0x8b    /* Cyrillic KOI8-U */
> #define LC_ISO8859_5    0x8c    /* ISO8859 Cyrillic */
> 
>     Correcting them would affect users who store their data in the
>     database using MULE internal code. I think there are quite few
>     people using MULE internal code, so we could correct them for 7.2.
> 
> Comments?
> 
> BTW, should we support ISO 8859-6 and beyond for 7.2? There have been
> some requests to do that. Supporting them is actually trivial work; it
> should be a one-day job. The harder part is writing conversion functions
> between encodings. However, there is very little demand for that, I
> guess. If so, we could omit the conversion capability for 7.2.
> Comments?
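
To illustrate why the leading-character fix in point 2 only matters for
data stored in MULE internal code, here is a minimal sketch. It assumes
the usual MULE layout for single-byte charsets, where every byte with
the high bit set is preceded by the charset's leading byte. The function
name to_mule_internal is hypothetical, not the actual PostgreSQL
routine; the two constants are the corrected values quoted above.

```c
#include <stddef.h>

#define LC_KOI8_R       0x8b    /* Cyrillic KOI8-R (corrected value) */
#define LC_ISO8859_5    0x8c    /* ISO8859 Cyrillic (corrected value) */

/*
 * Hypothetical helper: copy len bytes from src to dst, inserting the
 * leading byte lc before every byte that has its high bit set.
 * dst must have room for 2 * len bytes. Returns the bytes written.
 */
size_t to_mule_internal(const unsigned char *src, size_t len,
                        unsigned char lc, unsigned char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++)
    {
        if (src[i] & 0x80)
            dst[out++] = lc;    /* tag the byte with its charset */
        dst[out++] = src[i];
    }
    return out;
}
```

Since the leading byte is written into the stored data itself, text
saved with the old value would carry the wrong charset tag once the
constants change.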

I think ISO 8859-15 and 16 are important, if only because they are the
only two encodings which support the Euro (not speaking of Unicode, of
course!), and at least ISO 8859-15 has some official status in
Western Europe (on Unix systems at least; Windows users have their
own table where the Euro sign is stored somewhere else, I think at
0x80).

I have done the conversion for the mappings to and from Unicode, but
you can get the original tables at:

http://www.unicode.org/Public/MAPPINGS/ISO8859/
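
As an illustration of what such a mapping amounts to for ISO 8859-15,
here is a minimal sketch of the direction towards Unicode. It is not the
conversion code mentioned above; the function name latin9_to_unicode is
hypothetical. It relies on ISO 8859-15 differing from ISO 8859-1 (whose
bytes equal their Unicode code points) in only eight positions.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map one ISO 8859-15 byte to its Unicode code
 * point. Only the eight positions that differ from ISO 8859-1 need
 * special handling; every other byte maps to the same numeric value.
 */
uint32_t latin9_to_unicode(unsigned char b)
{
    switch (b)
    {
        case 0xA4: return 0x20AC;   /* EURO SIGN */
        case 0xA6: return 0x0160;   /* LATIN CAPITAL LETTER S WITH CARON */
        case 0xA8: return 0x0161;   /* LATIN SMALL LETTER S WITH CARON */
        case 0xB4: return 0x017D;   /* LATIN CAPITAL LETTER Z WITH CARON */
        case 0xB8: return 0x017E;   /* LATIN SMALL LETTER Z WITH CARON */
        case 0xBC: return 0x0152;   /* LATIN CAPITAL LIGATURE OE */
        case 0xBD: return 0x0153;   /* LATIN SMALL LIGATURE OE */
        case 0xBE: return 0x0178;   /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
        default:   return b;        /* same as ISO 8859-1 */
    }
}
```

The reverse direction would need a check that the code point is
representable at all, which is where most of the conversion work lies.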

(you can get ISO 8859-10, 13 and 14 there as well! 10 is supposed to
be for Greenlandic and Sámi, 13 for the Baltic Rim, and 14 for Gaelic)

I just found the following link on Google, where you can see quite a
few charsets (it doesn't have -16, probably because it is too new):

http://www.kostis.net/charsets/

Patrice

-- 
Patrice Hédé
email: patrice hede à islande org
www  : http://www.islande.org/

