Re: Encoding issues
From | Patrice Hédé |
---|---|
Subject | Re: Encoding issues |
Date | |
Msg-id | 20011010200307.L14587@idf.net |
In reply to | Encoding issues (Tatsuo Ishii <t-ishii@sra.co.jp>) |
List | pgsql-hackers |
* Tatsuo Ishii <t-ishii@sra.co.jp> [011010 18:21]:
> Receiving a request to add ISO 8859-15 and 16, I review the multibyte
> support code and found several errors in it.
>
> 1) There is a confusion between "LATIN5" and ISO 8859-5. LATIN5 is not
>    ISO 8859-5, but is actually ISO 8859-9. Should we rename LATIN5 to
>    "ISO8859-5" (or whatever) as the encoding name? I think we should.
>    For your information, here are the correct mapping between ISO
>    8859-n and LATINn.
>
>    ISO 8859-1   LATIN1
>    ISO 8859-2   LATIN2
>    ISO 8859-3   LATIN3
>    ISO 8859-4   LATIN4
>    ISO 8859-9   LATIN5
>    ISO 8859-10  LATIN6

     ISO-8859-14  LATIN 8
     ISO-8859-15  LATIN 9 or LATIN 0
     ISO-8859-16  LATIN 10

:)

> 2) The leading characters for some Cyrillic charsets are wrong.
>
>    Currently they are defined as:
>
>    #define LC_KOI8_R     0x8c  /* Cyrillic KOI8-R */
>    #define LC_KOI8_U     0x8c  /* Cyrillic KOI8-U */
>    #define LC_ISO8859_5  0x8d  /* ISO8859 Cyrillic */
>
>    These should be:
>
>    #define LC_KOI8_R     0x8b  /* Cyrillic KOI8-R */
>    #define LC_KOI8_U     0x8b  /* Cyrillic KOI8-U */
>    #define LC_ISO8859_5  0x8c  /* ISO8859 Cyrillic */
>
>    The impact of correcting them would be for users who are storing
>    their data into database using MULE internal code. I think they
>    are quite few people using MULE internal code. So we could correct
>    them for 7.2.
>
> Comments?
>
> BTW, should we support ISO 8859-6 and beyond for 7.2? There have been
> some requests to do that. Supporting them are actually trivial works,
> should be one day job. The harder part is writing conversion function
> between encodings. However, there is very few demands to do that, I
> guess. If so, we could ommit the conversion capability for 7.2.
> Comments?

I think iso-8859-15 and 16 are important, if only because they are the
only two encodings which support the Euro (not speaking of unicode, of
course!), and at least iso-8859-15 has some official status in western
europe (on Unix systems at least... Windows users have their own table
where the Euro sign is stored somewhere else, I think at 0x80).

I have done the conversion for the mappings to and from unicode, but
you could get the original tables at:

  http://www.unicode.org/Public/MAPPINGS/ISO8859/

(you can get iso-8859-10, 13 and 14 there as well! 10 is supposed to be
for greenlandic and sámi, 13 for the baltic rim, and 14 for gaelic)

Just found on google the following link, where you can see quite a few
charsets (it doesn't have -16, too new probably):

  http://www.kostis.net/charsets/

Patrice

--
Patrice Hédé
email: patrice hede à islande org
www  : http://www.islande.org/
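
For reference, the ISO 8859-n / LATINn pairings listed in this thread could be kept in a small lookup table so the two naming schemes are not confused again. The following is only a minimal sketch in C: the struct and function names (latin_alias, latin_name_for_part, part_for_latin_name) are hypothetical and do not come from the PostgreSQL source, and the table holds only the pairings mentioned in the messages above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row per pairing mentioned in the thread. Hypothetical names,
 * not taken from the PostgreSQL multibyte code. */
struct latin_alias {
    int part;           /* n in "ISO 8859-n" */
    const char *name;   /* LATINn alias */
};

static const struct latin_alias latin_aliases[] = {
    {1, "LATIN1"},  {2, "LATIN2"},  {3, "LATIN3"},  {4, "LATIN4"},
    {9, "LATIN5"},  {10, "LATIN6"}, {14, "LATIN8"},
    {15, "LATIN9"}, {15, "LATIN0"}, /* two aliases for the same part */
    {16, "LATIN10"},
};

#define N_ALIASES (sizeof latin_aliases / sizeof latin_aliases[0])

/* Return the first LATINn alias listed for ISO 8859-<part>,
 * or NULL if the table has no entry for that part. */
const char *latin_name_for_part(int part)
{
    for (size_t i = 0; i < N_ALIASES; i++)
        if (latin_aliases[i].part == part)
            return latin_aliases[i].name;
    return NULL;
}

/* Return the ISO 8859 part number for a LATINn alias,
 * or -1 if the alias is not in the table. */
int part_for_latin_name(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < N_ALIASES; i++)
        if (strcmp(latin_aliases[i].name, name) == 0)
            return latin_aliases[i].part;
    return -1;
}
```

With this table, latin_name_for_part(9) yields "LATIN5" while latin_name_for_part(5) yields NULL, which is exactly the LATIN5 versus ISO 8859-5 distinction raised in point 1.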