Re: Patch for collation using ICU

From	Palle Girgensohn
Subject	Re: Patch for collation using ICU
Date	26 March 2005, 05:10:03
Msg-id	55C6D914B6055CD5721BEC40@palle.girgensohn.se
In reply to	Patch for collation using ICU (Palle Girgensohn <girgen@pingpong.net>)
Responses	Re: Patch for collation using ICU (Stephan Szabo <sszabo@megazone.bigpanda.com>) Re: Patch for collation using ICU (Hannu Krosing <hannu@tm.ee>)
List	pgsql-hackers

--On Friday, March 25, 2005 00.40.04 +0100 Palle Girgensohn
<girgen@pingpong.net> wrote:

> Hi!
>
> I've put together a patch for using IBM's ICU package for collation.
>
> If your OS does not have full support for collation or
> uppercase/lowercase in multibyte locales, this might be useful. If you
> are using a multibyte character encoding in your database and want
> collation, i.e. order by, and also lower(), upper() and initcap() to work
> properly, this patch will do just that.
>
> This patch is needed for FreeBSD, since this OS has no support for
> collation of, for example, Unicode locales (that is, wcscoll(3) does not do
> what you expect if you set, say, LC_ALL=sv_SE.UTF-8). AFAIK the
> patch is *not* necessary for Linux, although IBM claims ICU collation to
> be about twice as fast as glibc for simple western locales.
>
> It adds a configure switch, `--with-icu', which will set up the code to
> use ICU instead of wchar_t and wcscoll.
>
> This has been tested only on FreeBSD-4.11 & FreeBSD-5-stable, where it
> seems to run well. I've not had the time to do any comparative
> performance tests yet, but it seems it is at least not slower than using
> LATIN1 with sv_SE.ISO8859-1 locale, perhaps even faster.
>
> I'd be delighted if some more experienced postgresql hackers would review
> this stuff. The patch is pretty compact, so it's fast reading :)  I'm
> planning to add this patch as an option (tagged "experimental") to
> FreeBSD's postgresql port. Any ideas about whether this is a good idea or
> not?
>
> Any thoughts or ideas are welcome!
>
> Cheers,
> Palle
>
> Patch at:
> <http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-14.diff>
>
> ICU at sourceforge: <http://icu.sf.net/>


Hi!

There's a new patch to fix some reported problems.

<http://people.freebsd.org/~girgen/postgresql-icu/pg-801-icu-2005-03-26.diff>

This version uses the DatabaseEncoding and sets the ICU encoding at the
same time. I had to create a conversion table from PostgreSQL's own,
somewhat odd and non-standard, encoding names to the preferred IANA
names. One or two of the odder ones might be slightly incorrect,
hopefully not too far off anyway?
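
To give an idea of what such a conversion table amounts to, here is a minimal sketch in Python (the patch itself is C). The entries are a hypothetical subset picked for illustration, pairing a few PostgreSQL 8.0 encoding names with what I believe are the matching IANA names; the table in the patch may differ, and `PG_TO_IANA` and `iana_name` are made-up names.

```python
# Hypothetical subset of a PostgreSQL-to-IANA encoding name table.
# Illustrative only; this is not the table shipped in the patch.
PG_TO_IANA = {
    "UNICODE": "UTF-8",
    "LATIN1": "ISO-8859-1",
    "LATIN2": "ISO-8859-2",
    "EUC_JP": "EUC-JP",
    "SJIS": "Shift_JIS",
    "KOI8": "KOI8-R",
    "WIN": "windows-1251",
    "ALT": "IBM866",
}


def iana_name(pg_encoding):
    """Return the IANA name for a PostgreSQL encoding name, or None if unmapped."""
    return PG_TO_IANA.get(pg_encoding.upper())
```

Returning None for an unmapped name leaves it to the caller to decide whether to fall back to the non-ICU code path or to report an error.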

I've noticed a couple of things about using the ICU patch vs. pristine
pg-8.0.1:

- ORDER BY is case insensitive when using ICU. This might break the SQL
standard (?), but sure is nice :)

- When the database is initialized using the C locale, upper() and lower()
normally do not work at all for non-ASCII characters, even if the
database's encoding is, say, LATIN1 or UNICODE. (They do not work for me
anyway, on FreeBSD, and this is probably correct since the locale is still
`C', I believe?) The ICU patch changes nothing for the LATIN1 case, since it
does not act on single-byte encodings, but for the UNICODE representation it
works and does what I expect it to: upper() and lower() neatly upper- or
lowercase diacritical characters, e.g. lower('ÅÄÖ') -> 'åäö'.
This is a good thing, although I'm surprised that upper/lower is dragged
along with the LC_COLLATE fixation at initdb. I never run initdb in the C
locale, but only now do I realize how broken that really is if you need to
store anything other than English :-)
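
The two observations above can be imitated outside PostgreSQL. The following is a rough Python sketch and does not call ICU or PostgreSQL at all: code-point order stands in for C-locale ordering, a case-folded sort key stands in for the case-insensitive ORDER BY, and the hypothetical helper `ascii_lower` stands in for a lower() that only knows about ASCII.

```python
words = ["banana", "Apple", "apple", "Banana", "cherry"]

# Code-point order, roughly what the C locale gives: uppercase sorts first.
byte_order = sorted(words)

# Case-insensitive order: compare on a case-folded key (the sort is stable,
# so words that differ only in case keep their input order).
folded_order = sorted(words, key=str.casefold)


def ascii_lower(s):
    """Lowercase A-Z only and leave every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


print(byte_order)              # ['Apple', 'Banana', 'apple', 'banana', 'cherry']
print(folded_order)            # ['Apple', 'apple', 'banana', 'Banana', 'cherry']
print(ascii_lower("ÅÄÖ ABC"))  # ÅÄÖ abc
print("ÅÄÖ ABC".lower())       # åäö abc
```

The last two lines show the difference that matters here: the ASCII-only version leaves 'ÅÄÖ' alone, while a Unicode-aware lower() maps it to 'åäö'.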

I'd be delighted to get more feedback about this stuff.

Thanks,
Palle
