Re: Patch for collation using ICU

From John Hansen
Subject Re: Patch for collation using ICU
Date
Msg-id 5066E5A966339E42AA04BA10BA706AE50A930B@rodrick.geeknet.com.au
In reply to Patch for collation using ICU  (Palle Girgensohn <girgen@pingpong.net>)
Responses Re: Patch for collation using ICU  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
List pgsql-hackers
Tatsuo Ishii wrote:
> Sent: Sunday, May 08, 2005 10:09 AM
> To: John Hansen
> Cc: pgman@candle.pha.pa.us; girgen@pingpong.net; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Patch for collation using ICU
> 
> > Bruce Momjian wrote:
> > > 
> > > There are two reasons for that optimization --- first, some locale
> > > support is broken and Unicode encoding with a C locale crashes (not
> > > an issue for ICU), and second, it is an optimization for languages
> > > like Japanese that want to use unicode, but don't need a locale
> > > because upper/lower means nothing in those character sets.
> > 
> > No, upper/lower means nothing in those languages, so why would you 
> > need to optimize upper/lower if they're not used??
> > And if they are, it's obviously because the text contains characters
> > from other languages (probably english) and as such they should behave
> > correctly.
> 
> Yes, Japanese (and probably Chinese and Korean) languages
> include ASCII characters. More precisely, ASCII is part of Japanese
> encodings (LATIN1 is not, however). And we have no problem at
> all with glibc/C locale. See below ("unitest" is a UNICODE database).
> 
> unitest=# create table t1(t text);
> CREATE TABLE
> unitest=# \encoding EUC_JP
> unitest=# insert into t1 values('abcあいう');
> INSERT 1842628 1
> unitest=# select upper(t) from t1;
>    upper   
> -----------
>  ABCあいう
> (1 row)
> 
> So Japanese (including ASCII)/UNICODE behavior is perfectly
> correct at this moment.

Right, so you _never_ use accented Latin characters in Japanese text?
(like è, for example, whose uppercase is È)
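
To make the difference concrete, here is a minimal sketch in Python (not PostgreSQL code). It assumes the C-locale path behaves like an ASCII-only case mapping, which is an assumption for illustration, and the function names are hypothetical:

```python
# Sketch only: contrasts an ASCII-only upper() with a Unicode-aware one.
# ascii_only_upper() is a hypothetical stand-in for the C-locale behaviour;
# it is NOT how PostgreSQL implements upper().

def ascii_only_upper(text: str) -> str:
    # bytes.upper() touches only ASCII a-z, so every byte of a multi-byte
    # UTF-8 sequence (all >= 0x80) passes through unchanged.
    return text.encode("utf-8").upper().decode("utf-8")


def unicode_upper(text: str) -> str:
    # str.upper() applies Unicode case mappings to every character.
    return text.upper()


for sample in ("abcあいう", "abcè"):
    print(sample, "->", ascii_only_upper(sample), "|", unicode_upper(sample))
```

Both functions agree on 'abcあいう', since kana have no case. They differ on 'abcè': the ASCII-only mapping leaves è alone, while the Unicode-aware one yields È.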

> So I strongly object to removing that optimization.

I'm guessing this calls for a vote then, since if we implement ICU,
I'd have to object to leaving it in.

Changing the behaviour of ICU doesn't seem right. Changing the behaviour of pg,
so that it works as it should when using unicode, seems the right solution to me.

> --
> Tatsuo Ishii
> 
> 
