Thread: Mule internal code ?

Mule internal code ?

From
Patrice Hédé
Date:
Hi,

As I said in another mail, I have tried to add ISO-8859-15 (Latin 9) and
ISO-8859-16 (Latin 10) to PostgreSQL. I think I have done almost
everything that's necessary, but I am missing two things:

- latin92mic/mic2latin9/latin102mic/mic2latin10 in conv.c
- the leading character value in pg_wchar.h

I don't know anything about MULE except that it's some Emacs standard
(why they didn't go for Unicode is beyond my understanding, is
off-topic on this list, and had probably some good argument at the
time).

Can someone point me to where I should look for that ? is it as easy
as iso-8859-2/3/4 support, or do I need to do something as iso-8859-5 ?

Thank you :)

Patrice.

-- 
Patrice HÉDÉ ------------------------------- patrice à islande org -----
 --  Isn't it weird  how scientists  can imagine all the matter of the
universe exploding out of a dot smaller than the head of a pin, but they
can't come up with a more evocative name for it than "The Big Bang" ?
 -- What would _you_ call the creation of the universe?
 -- "The HORRENDOUS SPACE KABLOOIE !"               - Calvin and Hobbes
------------------------------------------ http://www.islande.org/ -----


Re: Mule internal code ?

From
Tatsuo Ishii
Date:
> As said in another mail, I have tried to add iso-8859-15 (Latin 9) &
> iso-8859-16 (Latin 10) to PostgreSQL, I think I have done mostly all
> that's necessary. But I miss two things :

ISO-8859-15 and 16! I don't know anything beyond ISO-8859-10. Can you
give me any pointer (URL) explaining what they are?

> - latin92mic/mic2latin9/latin102mic/mic2latin10 in conv.c
> - the leading character value in pg_wchar.h
>
> I don't know anything about MULE except that it's some Emacs standard
> (why they didn't go for Unicode is beyond my understanding, is
> off-topic on this list, and had probably some good argument at the
> time).

Probably because Unicode is far from perfect. For example, the idea
of "encoding everything in two bytes" has obviously broken down, some
charsets (for example JIS X 0213) are not supported at all, and so
on.

> Can someone point me to where I should look for that ? is it as easy
> as iso-8859-2/3/4 support, or do I need to do something as iso-8859-5 ?

Docs for MULE internal code come with XEmacs. For example, see:

ftp://ftp.xemacs.org/pub/xemacs/docs/letter/internals-letter.pdf.gz

http://www.lns.cornell.edu/public/COMP/info/xemacs/internals/internals_15.html#SEC83

etc.
--
Tatsuo Ishii


Re: Mule internal code ?

From
Patrice Hédé
Date:
* Tatsuo Ishii <t-ishii@sra.co.jp> [011010 18:20]:
> > As said in another mail, I have tried to add iso-8859-15 (Latin 9) &
> > iso-8859-16 (Latin 10) to PostgreSQL, I think I have done mostly all
> > that's necessary. But I miss two things :
> 
> ISO-8859-15 and 16! I don't know anything beyond ISO-8859-10. Can you
> give me any pointer (URL) explaining what they are?

http://www.evertype.com/sc2wg3.html

It links to files describing iso-8859-14 to 16.

14 is Gaelic support, which I've never seen used (of course, I don't
speak Irish, so that's probably why :) ), and it has nothing to do
with the euro.

15 is a "modernised" version of ISO-8859-1. It removes some
not-so-widely used characters (currency placeholder, fraction
characters) and replaces them with the euro sign, the French oe, OE,
and Y diaeresis, and the Finnish/Estonian s/S caron and z/Z caron.

That's the official 8-bit charset for Western Europe now (by the way,
its other names are Latin 9 or Latin 0, as it's supposed to replace
ISO-8859-1, which should now be called a legacy encoding).
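
To make the list of replaced characters concrete, here is a minimal C
sketch that maps one ISO-8859-15 byte to its Unicode code point. The
function name latin9_to_unicode is made up for this illustration and is
not part of PostgreSQL. The eight cases are the positions where the
published ISO-8859-15 table differs from ISO-8859-1; every other byte
keeps its Latin 1 meaning.

```c
#include <stdint.h>

/* Map one ISO-8859-15 (Latin 9) byte to a Unicode code point.
 * Illustrative helper only; the name is hypothetical. */
uint32_t latin9_to_unicode(unsigned char c)
{
    switch (c)
    {
        case 0xA4: return 0x20AC; /* euro sign (was currency sign) */
        case 0xA6: return 0x0160; /* S caron (was broken bar) */
        case 0xA8: return 0x0161; /* s caron (was diaeresis) */
        case 0xB4: return 0x017D; /* Z caron (was acute accent) */
        case 0xB8: return 0x017E; /* z caron (was cedilla) */
        case 0xBC: return 0x0152; /* OE ligature (was one quarter) */
        case 0xBD: return 0x0153; /* oe ligature (was one half) */
        case 0xBE: return 0x0178; /* Y diaeresis (was three quarters) */
        default:   return c;      /* same code point as ISO-8859-1 */
    }
}
```

Because only eight positions change, a converter between Latin 1 and
Latin 9 can treat everything else as a straight copy.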

16 is quite new. It's supposed to do the same as ISO-8859-15, but for
Central European countries. It has support for the euro sign and the
Romanian language (t comma below, s comma below), but I've read
somewhere that it has lost support for two or three other Central
European countries... go figure...

> > - latin92mic/mic2latin9/latin102mic/mic2latin10 in conv.c
> > - the leading character value in pg_wchar.h
> >
> > I don't know anything about MULE except that it's some Emacs standard
> > (why they didn't go for Unicode is beyond my understanding, is
> > off-topic on this list, and had probably some good argument at the
> > time).
> 
> Probably this is because Unicode is not perfect at all. For example,
> the concept "encode everything in two-bytes" is obviously broken
> down, some charsets (for example JIS X 0213) are not supported at all,
> etc. etc...

Well, for the record, ISO 10646 was 32 bits from the beginning, and
Unicode didn't say that it was only 16 bits, though, to be fair, the
Unicode Consortium said it didn't believe it would need more than 16
bits.

BTW, now, there is a statement that they wouldn't go above 0x10ffff,
which gives a bit more than 1 million characters... I think it should
be enough this time (but who knows !?).
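
The "bit more than 1 million" figure can be checked directly. The
sketch below is only arithmetic on the 0x10ffff ceiling; the function
and macro names are invented for the example.

```c
#include <stdint.h>

#define UNICODE_MAX 0x10FFFFu /* highest code point under the stated limit */
#define PLANE_SIZE  0x10000u  /* 65,536 code points per 16-bit plane */

/* Number of code points from 0 through UNICODE_MAX inclusive. */
uint32_t unicode_code_points(void)
{
    return UNICODE_MAX + 1u;
}

/* Number of 16-bit planes that range is divided into. */
uint32_t unicode_planes(void)
{
    return (UNICODE_MAX + 1u) / PLANE_SIZE;
}
```

That is 1,114,112 code points, or 17 planes of 65,536 each.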

Regarding the *main* issue with Unicode, which is support of Japanese
kanji vs Chinese (in the CJK unification), I must admit I don't know
the details, but the arguments of both sides seem to be valid. I
would say "add the Japanese version of the characters", since
lack of space is not the problem now. But things like this
will get solved with time, and it really seems like Unicode will
achieve the much needed charset unity it was made for :)

> > Can someone point me to where I should look for that ? is it as
> > easy as iso-8859-2/3/4 support, or do I need to do something as
> > iso-8859-5 ?
> 
> Docs for MULE internal code come with XEmacs. For example, see:
> 
> ftp://ftp.xemacs.org/pub/xemacs/docs/letter/internals-letter.pdf.gz
> 
> http://www.lns.cornell.edu/public/COMP/info/xemacs/internals/internals_15.html#SEC83

Unfortunately, these explain the principles behind mule, not the way
to encode them from/to another character set :/

Patrice

-- 
Patrice Hédé
email: patrice hede à islande org
www  : http://www.islande.org/


Re: Mule internal code ?

From
Tatsuo Ishii
Date:
> > ISO-8859-15 and 16! I don't know anything beyond ISO-8859-10. Can you
> > give me any pointer (URL) explaining what they are?
> 
> http://www.evertype.com/sc2wg3.html
> 
> It links to files describing iso-8859-14 to 16.
[snip] 
Thanks for the info.

> Well, for the history iso-10646 was 32 bits from the beginning, and
> Unicode didn't say that it was only 16 bits, though, to be fair, the
> Unicode consortium said it didn't believe it would need more than 16
> bits.
> 
> BTW, now, there is a statement that they wouldn't go above 0x10ffff,
> which gives a bit more than 1 million characters... I think it should
> be enough this time (but who knows !?).
> 
> Regarding the *main* issue with Unicode, which is support of japanese
> kanji vs chinese (in the CJK unification), I must admit I don't know
> the details, but arguments of both sides seem to be valid. I must
> admit I would say "add the japanese version of the characters", since
> it's not lack of space which is the problem now. But things like this
> will get solved with time, and it really seems like Unicode will
> achieve the so much needed charset unity it's been made for :)

IMHO we should not rely on particular encodings/charsets, including
Unicode (or ISO 10646), MULE internal code or whatever. My plan for
supporting CREATE CHARACTER SET and related features is to be truly
*neutral* toward any encoding/charset.

> > > Can someone point me to where I should look for that ? is it as
> > > easy as iso-8859-2/3/4 support, or do I need to do something as
> > > iso-8859-5 ?
> > 
> > Docs for MULE internal code come with XEmacs. For example, see:
> > 
> > ftp://ftp.xemacs.org/pub/xemacs/docs/letter/internals-letter.pdf.gz
> > 
> > http://www.lns.cornell.edu/public/COMP/info/xemacs/internals/internals_15.html#SEC83
> 
> Unfortunately, these explain the principles behind mule, not the way
> to encode them from/to another character set :/

Please take a look at "15.3.1 Internal String Encoding."
--
Tatsuo Ishii
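
For readers following the conv.c question, here is a hedged C sketch of
the scheme the internals section describes for single-byte charsets:
ASCII bytes are stored unchanged, and every other character is stored
as a leading byte identifying the charset followed by the original
byte. The function names latin_to_mic and mic_to_latin are made up for
this example and are not the actual conv.c routines. The leading byte
is passed in as a parameter because the values for Latin 9 and Latin 10
are exactly what the thread leaves open. The sketch also prefixes every
byte above 0x7f, which is a simplifying assumption (C1 control bytes
are not treated specially).

```c
#include <assert.h>
#include <stddef.h>

/* Single-byte Latin charset -> MULE-style internal code.
 * dst must have room for 2 * len bytes. Returns bytes written. */
size_t latin_to_mic(const unsigned char *src, size_t len,
                    unsigned char *dst, unsigned char lc)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < len; i++)
    {
        if (src[i] > 0x7f)
            dst[n++] = lc;      /* leading byte names the charset */
        dst[n++] = src[i];      /* the character byte itself */
    }
    return n;
}

/* MULE-style internal code -> single-byte Latin charset.
 * dst must have room for len bytes. Returns bytes written, or
 * (size_t) -1 if a character from another charset (or a truncated
 * sequence) is found. */
size_t mic_to_latin(const unsigned char *src, size_t len,
                    unsigned char *dst, unsigned char lc)
{
    size_t n = 0;
    size_t i = 0;

    assert(src != NULL && dst != NULL);
    while (i < len)
    {
        if (src[i] <= 0x7f)
            dst[n++] = src[i++];        /* plain ASCII */
        else if (src[i] == lc && i + 1 < len)
        {
            dst[n++] = src[i + 1];      /* drop the leading byte */
            i += 2;
        }
        else
            return (size_t) -1;         /* not representable here */
    }
    return n;
}
```

With a leading byte of 0x81 (the value pg_wchar.h uses for
LC_ISO8859_1), the three bytes 'a', 0xE9, 'b' become the four bytes
'a', 0x81, 0xE9, 'b', and converting back with the same leading byte
restores the original. A new charset would follow the same pattern
once its leading byte is known.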