Re: different sort order in windows and linux version

From: Martijn van Oosterhout
Subject: Re: different sort order in windows and linux version
Msg-id: 20060702101302.GB8316@svana.org
In reply to: Re: different sort order in windows and linux version  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: different sort order in windows and linux version  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
Re: different sort order in windows and linux version  (Agent M <agentm@themactionfaction.com>)
List: pgsql-general
On Sat, Jul 01, 2006 at 06:23:07PM -0400, Tom Lane wrote:
> "Tomi NA" <hefest@gmail.com> writes:
> > Basically, it comes down to three possibilities, doesn't it:
> > 1.) use an existing library
> > 2.) write a pgsql specific implementation
> > 3.) forget about it and tend to other issues
>
> > Personally, I don't really care if it's 1) or 2): I'm just afraid it's
> > going to be 3).
> > Is this a licencing issue (with regard to ICU being under the IBM
> > public licence)?
>
> Licensing is a concern --- IBM's appears to be not quite BSD enough.
> Size and portability of the library are concerns.  Performance is a
> concern.  Whether the patch makes the library required or optional is
> a concern (if required, the portability issue becomes a whole lot more
> urgent).  Loss of existing functionality is a concern --- for instance,
> if the patch is such that UTF8 becomes the only supported server
> encoding, it'll probably be rejected forthwith.

Licence - It's the X/MIT licence, which is almost identical to the BSD
licence.

http://dev.icu-project.org/cgi-bin/viewcvs.cgi/*checkout*/icu/license.html
http://en.wikipedia.org/wiki/MIT_License

But I don't think anyone is actually considering importing ICU into the
postgres source tree, are they?

Size - I'm not sure this is relevant since I don't think we want to
incorporate it into postgres itself, just let people use it if they
have it. In any case though, the default dataset is 8MB. This includes
support for every locale and charset it knows about.

If you drop the conversion stuff (because postgres already has that)
you're down to about 4MB.

Since ICU supports user-defined tables, we could provide a single
cross-platform dataset and get the user's ICU library implementation to
use that.

Portability - ICU runs on all the platforms postgres does, AFAICS.

http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/readme.html?rev=release-3-4#HowToBuildSupported

Performance - ICU is approximately four times faster than glibc for
collation. Even once you include keygen time (including conversion) it
comes out about 40% faster.

http://icu.sourceforge.net/charts/collation_icu4c_glibc.html

ICU is not slow.
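
To make the "keygen" distinction above concrete, here is a minimal sketch
of the two ways a collation library can be used for sorting: comparing
strings pairwise, or generating one sort key per string and comparing the
keys. It uses the C library's strcoll()/strxfrm() through Python's locale
module, not ICU, and the word list is made up for illustration. The
resulting order depends on whichever locale the platform provides, so no
expected output is shown.

```python
import locale
from functools import cmp_to_key

# Assumption: use the collation locale configured in the environment,
# falling back to the "C" locale if that one is not installed.
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")

# Hypothetical sample data, not taken from the thread.
words = ["pear", "Apple", "apple", "Pear", "banana"]

# Approach 1: pairwise comparison. strcoll() is called once per
# comparison the sort makes, i.e. O(n log n) times.
by_compare = sorted(words, key=cmp_to_key(locale.strcoll))

# Approach 2: key generation. strxfrm() is called once per string
# (the "keygen" cost); the sort then compares the transformed keys.
by_key = sorted(words, key=locale.strxfrm)

print(by_compare)
print(by_key)
```

Both approaches should give the same order for a given locale; which one
is faster depends on how many comparisons the sort needs relative to the
cost of building each key.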

> Well, the Japanese think that UTF8 is not the solution to all their
> worries, so they won't be happy with a UTF8-only solution.  Likewise,
> those of us who only need single-byte character sets won't be very happy
> with being forced to accept multi-byte processing overhead.

I've not quite understood the Japanese problem with Unicode. My
understanding is that it was primarily due to widespread use of broken
converters.

In any case, ICU appears to beat glibc with single byte encodings, even
including the multi-byte conversion.

However, the most important point is that people have said they'll take
the speed hit if they can get consistent collation. For speed you can
always throw more hardware at the problem. But no amount of hardware
will fix your collation issues.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
