Re: [PATCHES] Unicode combining characters

From Tatsuo Ishii
Subject Re: [PATCHES] Unicode combining characters
Date
Msg-id 20011015102619W.t-ishii@sra.co.jp
In response to Re: Unicode combining characters  ("Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>)
List pgsql-hackers
I have committed part of Patrice's patches with minor fixes.
The uncommitted changes are on the backend side, and the reason
can be found in the previous discussions (basically, the current
regex code does not support UTF-8 chars >= 0x10000). Instead,
pg_verifymbstr() now rejects UTF-8 chars >= 0x10000.
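
To illustrate what rejecting UTF-8 chars >= 0x10000 means at the byte
level, here is a minimal sketch. It is not the committed backend code:
utf8_bmp_only() is a hypothetical name, and the extra checks for overlong
forms and surrogates are assumptions about what a strict verifier would
do. Code points >= 0x10000 are exactly those that need a four-byte
sequence (lead byte 0xF0 and up), so refusing every lead byte above the
three-byte range is enough to exclude them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, NOT PostgreSQL's actual verification routine.
 * Returns 1 if the first len bytes of s are well-formed UTF-8 that uses
 * only code points below 0x10000 (1- to 3-byte sequences), else 0.
 */
int
utf8_bmp_only(const unsigned char *s, size_t len)
{
    size_t i = 0;

    assert(s != NULL || len == 0);

    while (i < len)
    {
        unsigned char c = s[i];
        size_t        need;   /* continuation bytes expected */
        size_t        k;
        unsigned long cp;     /* decoded code point */
        unsigned long min;    /* smallest code point legal at this length */

        if (c < 0x80)
        {
            i++;              /* plain ASCII */
            continue;
        }
        else if ((c & 0xE0) == 0xC0)
        {
            need = 1; cp = c & 0x1F; min = 0x80;
        }
        else if ((c & 0xF0) == 0xE0)
        {
            need = 2; cp = c & 0x0F; min = 0x800;
        }
        else
            return 0;         /* 4-byte lead (>= 0x10000), stray
                               * continuation byte, or invalid byte */

        if (len - i - 1 < need)
            return 0;         /* sequence is truncated */

        for (k = 1; k <= need; k++)
        {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;     /* not a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        if (cp < min)
            return 0;         /* overlong encoding (assumed to be rejected) */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;         /* surrogates (assumed to be rejected) */

        i += need + 1;
    }
    return 1;
}
```

With this sketch, the three bytes E2 82 AC (the euro sign, U+20AC) are
accepted, while the four bytes F0 90 80 80 (U+10000) are refused.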
--
Tatsuo Ishii

> Hi,
> 
> I should have sent the patch earlier, but got delayed by other stuff.
> Anyway, here is the patch:
> 
> - most of the functionality is only activated when MULTIBYTE is
>   defined,
> 
> - checks for valid UTF-8 characters, client-side only so far, and only
>   on output; you can still send invalid UTF-8 to the server (so it's
>   only partly compliant with Unicode 3.1, but that's better than
>   nothing).
> 
> - formats with the correct number of columns (that's why I wrote it in
>   the first place, after all), but only for UNICODE. However, the code
>   allows plugging in routines for other encodings, as Tatsuo did for
>   the other multibyte functions.
> 
> - slightly corrects Tatsuo's UTF-8 code to allow Unicode 3.1
>   characters (characters with values >= 0x10000, which are encoded in
>   four bytes).
> 
> - doesn't depend on the locale capabilities of the glibc (useful for
>   remote telnet).
> 
> I would like somebody to check it closely, as it is my first patch to
> pgsql.  Also, I created dummy .orig files so that the two files I
> created are included; I hope that's the right way.
> 
> Now, a lot of functionality is NOT included here, but I will keep that
> for 7.3 :) That includes all string checking on the server side (which
> will have to be a bit more optimised ;) ), and the input checking on
> the client side for UTF-8, though that should not be difficult. It's
> just a matter of sending the strings through mbvalidate() before
> sending them to the server. Strong checking on UTF-8 strings is
> mandatory to be compliant with Unicode 3.1+.
> 
> Do I have time to look into a patch to include iso-8859-15 for 7.2?
> The euro is coming on 1 January 2002 (before 7.3!) and over 280
> million people in Europe will need the euro sign, and only iso-8859-15
> and iso-8859-16 have it (and unfortunately, I don't think all Unices
> will switch to Unicode in the meantime)....
> 
> err... yes, I know that not every single person in Europe uses
> PostgreSQL, so it's not exactly 280m, but it's just a matter of
> time! ;)
> 
> I'll come back (on pgsql-hackers) later to ask a few questions
> regarding full Unicode support (normalisation, collation,
> regexes,...) on the server side :)
> 
> Here is the patch !
> 
> Patrice.
> 
> -- 
> Patrice HÉDÉ ------------------------------- patrice à islande org -----
>   --  Isn't it weird  how scientists  can imagine  all the matter of the
> universe exploding out of a dot smaller than the head of a pin, but they
> can't come up with a more evocative name for it than "The Big Bang" ?
>   -- What would _you_ call the creation of the universe ?
>   -- "The HORRENDOUS SPACE KABLOOIE !"               - Calvin and Hobbes
> ------------------------------------------ http://www.islande.org/ -----
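
The quoted patch description mentions formatting output "with the correct
number of columns" for UNICODE. The sketch below shows one way such a
width count could work; it is an illustration only, not the code from the
patch. The name utf8_display_columns() is hypothetical, and the width
rule is a deliberate simplification: only the Combining Diacritical Marks
block (U+0300..U+036F) is treated as zero width, and every other
character, including wide East Asian characters and control characters,
is counted as one column.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from the patch.  Counts the terminal
 * columns needed for the first len bytes of a UTF-8 string.
 * Assumption: only U+0300..U+036F are zero width; everything else is
 * counted as one column.  Malformed bytes count as one column each.
 */
size_t
utf8_display_columns(const unsigned char *s, size_t len)
{
    size_t i = 0;
    size_t cols = 0;

    assert(s != NULL || len == 0);

    while (i < len)
    {
        unsigned char c = s[i];
        unsigned long cp;     /* decoded code point */
        size_t        n;      /* sequence length implied by the lead byte */
        size_t        k;      /* bytes actually consumed */

        if (c < 0x80)
        {
            cp = c; n = 1;
        }
        else if ((c & 0xE0) == 0xC0)
        {
            cp = c & 0x1F; n = 2;
        }
        else if ((c & 0xF0) == 0xE0)
        {
            cp = c & 0x0F; n = 3;
        }
        else if ((c & 0xF8) == 0xF0)
        {
            cp = c & 0x07; n = 4;
        }
        else
        {
            cp = 0xFFFD; n = 1;   /* invalid lead byte */
        }

        /* Fold in continuation bytes, stopping early on malformed input. */
        for (k = 1; k < n && i + k < len && (s[i + k] & 0xC0) == 0x80; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);

        /* Combining marks attach to the previous character: no new column. */
        if (!(cp >= 0x0300 && cp <= 0x036F))
            cols++;

        i += k;               /* k >= 1, so the loop always advances */
    }
    return cols;
}
```

Under these assumptions, "e" followed by U+0301 (bytes 65 CC 81) takes one
column, the same as the precomposed "é" (bytes C3 A9), even though the
two strings differ in byte length.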

