Discussion: Re: Win32 unicode vs ICU

Re: Win32 unicode vs ICU

From
Tom Lane
Date:
[ moving to -hackers for wider discussion ]

"Magnus Hagander" <mha@sollentuna.net> wrote in
http://archives.postgresql.org/pgsql-patches/2005-08/msg00039.php

>> I've been working with Palle's ICU patch to make it work on 
>> win32, and I believe I have it done. While doing it I noticed 
>> that ICU basically converts to UTF16 and back - I previously 
>> thought it worked on UTF8 strings. Based on this I also tried 
>> out an implementation for the win32-unicode problem that does 
>> *not* require ICU. It uses the win32 native functions to map 
>> to utf16 and back, and then to process the text there. And I 
>> got through with much less code than the ICU version, while 
>> doing the same thing.
>>  
>> I am unsure of how to proceed. As I see it there are three paths:
>> 1) Use native win32 functionality only on win32
>> 2) Use ICU functionality only on win32
>> 3) Allow both ICU and native functionality, compile time 
>>    switch --with-icu (same as unix with the ICU patch)

We need to figure out what we're going to do about this.  Given where
we are in the release cycle, I am pretty strongly tempted to just apply
the smaller patch (just map utf8/utf16 using Windows native functions)
for PG 8.1.
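
Illustration (editorial sketch, not code from either patch): the smaller
approach amounts to converting each UTF-8 string to wide characters and
letting the platform's wide-character collation compare them. The helper
names utf8_to_wide and wide_collate are hypothetical. On Windows the sketch
calls MultiByteToWideChar with CP_UTF8; elsewhere it falls back to mbstowcs
purely so the example compiles, and that fallback decodes in the current
LC_CTYPE encoding rather than UTF-8 specifically.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/*
 * Convert a NUL-terminated string to a newly allocated wide string.
 * Returns NULL on conversion or allocation failure; the caller frees it.
 * (Hypothetical helper, for illustration only.)
 */
static wchar_t *
utf8_to_wide(const char *s)
{
    wchar_t *w;
#ifdef _WIN32
    /* With cbMultiByte = -1 the returned length includes the NUL. */
    int n = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);

    if (n <= 0)
        return NULL;
    w = malloc((size_t) n * sizeof(wchar_t));
    if (w == NULL)
        return NULL;
    if (MultiByteToWideChar(CP_UTF8, 0, s, -1, w, n) == 0)
    {
        free(w);
        return NULL;
    }
#else
    /* Assumption: POSIX behaviour, where a NULL destination returns the
     * required length (excluding the NUL) in the current LC_CTYPE. */
    size_t n = mbstowcs(NULL, s, 0);

    if (n == (size_t) -1)
        return NULL;
    w = malloc((n + 1) * sizeof(wchar_t));
    if (w == NULL)
        return NULL;
    mbstowcs(w, s, n + 1);
#endif
    return w;
}

/*
 * Compare two strings in the wide-character domain using the current
 * LC_COLLATE. Falls back to a bytewise strcmp() if conversion fails.
 */
int
wide_collate(const char *a, const char *b)
{
    wchar_t *wa;
    wchar_t *wb;
    int result;

    assert(a != NULL && b != NULL);
    wa = utf8_to_wide(a);
    wb = utf8_to_wide(b);
    if (wa == NULL || wb == NULL)
        result = strcmp(a, b);
    else
        result = wcscoll(wa, wb);
    free(wa);               /* free(NULL) is a no-op */
    free(wb);
    return result;
}
```

A caller would treat the result like strcoll(): negative, zero, or positive.
The ordering depends on whatever locale setlocale() has selected, so the
sketch says nothing about which locales a given platform supports.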

I think that ICU would be interesting as the base for a much larger
patch that gets us away from depending on libc's locale support at all
(in particular, getting rid of the "one locale per database" problem).
But it seems like a heck of a big dependency to incur for any lesser goal.

I feel it makes sense to apply the smaller patch in any case, so that
there's a Win32 solution not requiring ICU (ie, I can't see an argument
for doing (2) rather than (3)).

Comments?

Also,

> And another question - my native patch touches the same 
> functions as the ICU patch. Can somebody who knows the 
> internals confirm or deny that these are all the required 
> locations, or do we need to modify more?

There is a strxfrm() call in src/backend/utils/adt/selfuncs.c,
which probably needs to be looked at too.
        regards, tom lane
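
Illustration (editorial sketch, not taken from selfuncs.c): strxfrm()
transforms a string so that plain strcmp() on the transformed images orders
the same way strcoll() orders the originals under the current LC_COLLATE.
Because it relies on the same libc collation data as the comparison
functions, it is a natural place to check when those functions are changed.
The names xfrm_dup and xfrm_compare are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a malloc'd strxfrm() image of s, or NULL on allocation failure.
 * (Hypothetical helper, for illustration only.)
 */
char *
xfrm_dup(const char *s)
{
    size_t need;
    char *buf;

    assert(s != NULL);
    /* With n = 0 the destination may be NULL; the return value is the
     * length of the transformed string, excluding the terminating NUL. */
    need = strxfrm(NULL, s, 0);
    buf = malloc(need + 1);
    if (buf == NULL)
        return NULL;
    strxfrm(buf, s, need + 1);
    return buf;
}

/*
 * Compare two strings via their strxfrm() images. Falls back to
 * strcoll() if either image could not be allocated.
 */
int
xfrm_compare(const char *a, const char *b)
{
    char *xa = xfrm_dup(a);
    char *xb = xfrm_dup(b);
    int result;

    if (xa == NULL || xb == NULL)
        result = strcoll(a, b);
    else
        result = strcmp(xa, xb);
    free(xa);               /* free(NULL) is a no-op */
    free(xb);
    return result;
}
```

The sign of xfrm_compare(a, b) should agree with strcoll(a, b) in any
locale, which is the property a planner-side strxfrm() caller relies on.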


Re: Win32 unicode vs ICU

From
Alvaro Herrera
Date:
On Sat, Aug 20, 2005 at 12:17:47PM -0400, Tom Lane wrote:

> I think that ICU would be interesting as the base for a much larger
> patch that gets us away from depending on libc's locale support at all
> (in particular, getting rid of the "one locale per database" problem).
> But it seems like a heck of a big dependency to incur for any lesser goal.

There is a locale project from the Gnome guys, with an eye towards a
wider audience.  The announcement, which states the goals of the
project, is here:

http://mail.gnome.org/archives/locale-list/2005-August/msg00000.html

The project website is at http://live.gnome.org/LocaleProject

The big problem with this is that the license is likely to be LGPL, so
there's probably not much code we could use.  OTOH, it's possible that
we could borrow some ideas from them.  In particular, they are based
mostly on the Common Locale Data Repository,
http://www.unicode.org/cldr/


However, this thread on their list, which is about the license they will
choose, hints that rewriting the whole CLDR handling from scratch would
be very painful:

http://mail.gnome.org/archives/locale-list/2005-August/msg00004.html

This is precisely the reason they are using LGPL: they do not want to
have to rewrite it all, which they would were they to choose a license
like BSD.  (Personally I think this is folly -- someone else will have
to rewrite it again with a BSD license sometime, and then the value of
their work would be decreased.)

-- 
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"A wizard is never late, Frodo Baggins, nor is he early. He arrives precisely when he means to."  (Gandalf, in LoTR FoTR)


Re: Win32 unicode vs ICU

From
Palle Girgensohn
Date:
--On Saturday, August 20, 2005 12.17.47 -0400 Tom Lane <tgl@sss.pgh.pa.us>
wrote:

> [ moving to -hackers for wider discussion ]
>
> "Magnus Hagander" <mha@sollentuna.net> wrote in
> http://archives.postgresql.org/pgsql-patches/2005-08/msg00039.php
>
>>> I've been working with Palle's ICU patch to make it work on
>>> win32, and I believe I have it done. While doing it I noticed
>>> that ICU basically converts to UTF16 and back - I previously
>>> thought it worked on UTF8 strings. Based on this I also tried
>>> out an implementation for the win32-unicode problem that does
>>> *not* require ICU. It uses the win32 native functions to map
>>> to utf16 and back, and then to process the text there. And I
>>> got through with much less code than the ICU version, while
>>> doing the same thing.
>>>
>>> I am unsure of how to proceed. As I see it there are three paths:
>>> 1) Use native win32 functionality only on win32
>>> 2) Use ICU functionality only on win32
>>> 3) Allow both ICU and native functionality, compile time
>>>    switch --with-icu (same as unix with the ICU patch)
>
> We need to figure out what we're going to do about this.  Given where
> we are in the release cycle, I am pretty strongly tempted to just apply
> the smaller patch (just map utf8/utf16 using Windows native functions)
> for PG 8.1.
>
> I think that ICU would be interesting as the base for a much larger
> patch that gets us away from depending on libc's locale support at all
> (in particular, getting rid of the "one locale per database" problem).
> But it seems like a heck of a big dependency to incur for any lesser goal.
>
> I feel it makes sense to apply the smaller patch in any case, so that
> there's a Win32 solution not requiring ICU (ie, I can't see an argument
> for doing (2) rather than (3)).
>
> Comments?

I don't mind either way, but while Win32 will work with Magnus' patch,
FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the FreeBSD
port where I already have the patch as an ("experimental") option. Not
every FreeBSD user uses the ports system, though.

So, it is a question whether FreeBSD's unicode support is important or not,
I guess? Win32 will work both ways.

/Palle



Re: Win32 unicode vs ICU

From
Bruce Momjian
Date:
Palle Girgensohn wrote:
> > I feel it makes sense to apply the smaller patch in any case, so that
> > there's a Win32 solution not requiring ICU (ie, I can't see an argument
> > for doing (2) rather than (3)).
> >
> > Comments?
> 
> I don't mind either way, but while Win32 will work with Magnus' patch, 
> FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the FreeBSD 
> port where I already have the patch as an ("experimental") option. Not 
> every FreeBSD user uses the ports system, though.
> 
> So, it is a question whether FreeBSD's unicode support is important or not, 
> I guess? Win32 will work both ways.

How is FreeBSD's Unicode support broken?  I was not aware of that.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
 


Re: Win32 unicode vs ICU

From
Palle Girgensohn
Date:
--On Monday, August 22, 2005 09.19.58 -0400 Bruce Momjian
<pgman@candle.pha.pa.us> wrote:

> Palle Girgensohn wrote:
>> > I feel it makes sense to apply the smaller patch in any case, so that
>> > there's a Win32 solution not requiring ICU (ie, I can't see an argument
>> > for doing (2) rather than (3)).
>> >
>> > Comments?
>>
>> I don't mind either way, but while Win32 will work with Magnus' patch,
>> FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the
>> FreeBSD port where I already have the patch as an ("experimental")
>> option. Not every FreeBSD user uses the ports system, though.
>>
>> So, it is a question whether FreeBSD's unicode support is important or
>> not, I guess? Win32 will work both ways.
>
> How is FreeBSD's Unicode support broken?  I was not aware of that.

FreeBSD has no unicode collation support. Hence the need for ICU.

/Palle



Re: Win32 unicode vs ICU

From
Tom Lane
Date:
Palle Girgensohn <girgen@pingpong.net> writes:
> <pgman@candle.pha.pa.us> wrote:
>> How is FreeBSD's Unicode support broken?  I was not aware of that.

> FreeBSD has no unicode collation support. Hence the need for ICU.

Well, this obviously doesn't bother anyone who uses FreeBSD, so it need
not bother us either.  I do not feel a need to take on ICU in order to
implement features that are not present anywhere else on the platform.
        regards, tom lane


Re: Win32 unicode vs ICU

From
Palle Girgensohn
Date:

--On Monday, August 22, 2005 10.12.11 -0400 Tom Lane <tgl@sss.pgh.pa.us>
wrote:

> Palle Girgensohn <girgen@pingpong.net> writes:
>> <pgman@candle.pha.pa.us> wrote:
>>> How is FreeBSD's Unicode support broken?  I was not aware of that.
>
>> FreeBSD has no unicode collation support. Hence the need for ICU.
>
> Well, this obviously doesn't bother anyone who uses FreeBSD, so it need
> not bother us either.  I do not feel a need to take on ICU in order to
> implement features that are not present anywhere else on the platform.

It bothered me enough to patch postgresql. :)  And I use it with Java,
which has working unicode support, so... Oh well, I can live with that -
I'll maintain my patch locally for the time being, if that's what's
required.

/Palle