Re: utf-8 and cultural sensitive sorting

Поиск
Список
Период
Сортировка
От Tatsuo Ishii
Тема Re: utf-8 and cultural sensitive sorting
Дата
Msg-id 20050713.100728.41628839.t-ishii@sra.co.jp
обсуждение исходный текст
Ответ на Re: utf-8 and cultural sensitive sorting  (Alex Stapleton <alexs@advfn.com>)
Список pgsql-general
> It depends what language you want to sort. Lots of languages do not
> have a sort alphabet. For example, Japanese. It can be quite
> difficult to sort unusual languages like this. I am not aware of any
> standard technique for sorting Japanese text other than keeping an
> arbitrarily sorted dictionary (courtesy of whatever the most popular
> Japanese dictionary at the time happens to be perhaps) and then doing
> hash lookups in the for indexing values. As you can imagine, this is
> not particularly fast. I have not actually tried this, but I expect
> PosgreSQL will simply sort in a fairly binary fashion. As in, it gets
> sorted in according to the binary value of the characters, or the
> UTF-8 offsets, or something like that.

Above is almost correct but usually sorting by the JIS code order is
enough for most Japanese applications (I believe same thing can be
said to Chinese). I do not recommend using locale for sorting
Japanese. It quite frequently happens that the locale support for
multibyte encodings is totally broken. See recent posting titled
"[GENERAL] Japanese words not distinguished" for more details.

If you have to live with UTF-8 database, I recommend turning off the
locale support and use CONVERT to sort Japanese. For example,

SELECT * FROM t1 ORDER BY CONVERT(col1 USING utf_8_to_euc_jp);

> On 12 Jul 2005, at 15:48, <sknipe@tucows.com> <sknipe@tucows.com> wrote:
>
> > Our product will be storing its character data in utf-8 format
> > (unicode encoding).
> >
> > What is the best way to achive cultural sensitive sorting using the
> > utf-8 data?
> >
> > Is it possible have the locale apply to a connection?
> >
> > If so, is the cultural sorting support mature in PostgreSQL?
> >
> > What type of performance can be expected as compared with the
> > normal c locale sorting?
> >
> > Thanks very much,
> >
> > Steve.
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> >       subscribe-nomail command to majordomo@postgresql.org so that
> > your
> >       message can get through to the mailing list cleanly
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>

В списке pgsql-general по дате отправления:

Предыдущее
От: Michael Fuhr
Дата:
Сообщение: Re: Temp tables...
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: Japanese words not distinguished