Discussion: Re: Win32 unicode vs ICU
[ moving to -hackers for wider discussion ]

"Magnus Hagander" <mha@sollentuna.net> wrote in
http://archives.postgresql.org/pgsql-patches/2005-08/msg00039.php

>> I've been working with Palle's ICU patch to make it work on
>> win32, and I believe I have it done. While doing it I noticed
>> that ICU basically converts to UTF16 and back - I previously
>> thought it worked on UTF8 strings. Based on this I also tried
>> out an implementation for the win32-unicode problem that does
>> *not* require ICU. It uses the win32 native functions to map
>> to utf16 and back, and then to process the text there. And I
>> got through with much less code than the ICU version, while
>> doing the same thing.
>>
>> I am unsure of how to proceed. As I see it there are three paths:
>> 1) Use native win32 functionality only on win32
>> 2) Use ICU functionality only on win32
>> 3) Allow both ICU and native functionality, compile time
>> switch --with-icu (same as unix with the ICU patch)

We need to figure out what we're going to do about this. Given where we are in the release cycle, I am pretty strongly tempted to just apply the smaller patch (just map utf8/utf16 using Windows native functions) for PG 8.1.

I think that ICU would be interesting as the base for a much larger patch that gets us away from depending on libc's locale support at all (in particular, getting rid of the "one locale per database" problem). But it seems like a heck of a big dependency to incur for any lesser goal.

I feel it makes sense to apply the smaller patch in any case, so that there's a Win32 solution not requiring ICU (ie, I can't see an argument for doing (2) rather than (3)).

Comments?

Also,

> And another question - my native patch touches the same
> functions as the ICU patch. Can somebody who knows the
> internals confirm or deny that these are all the required
> locations, or do we need to modify more?

There is a strxfrm() call in src/backend/utils/adt/selfuncs.c, which probably needs to be looked at too.
regards, tom lane
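The "map to UTF-16 and compare there" idea discussed above can be sketched roughly as follows. This is not the patch under discussion, only a minimal illustration under stated assumptions: the names `to_wide` and `wide_coll` are hypothetical, the Windows branch assumes the input is UTF-8 and uses `MultiByteToWideChar` with `CP_UTF8`, and the non-Windows branch is a stand-in using `mbstowcs` so the sketch also compiles elsewhere.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: convert a NUL-terminated string to a malloc'd
 * wide string, or return NULL if the conversion fails. */
static wchar_t *
to_wide(const char *s)
{
#ifdef _WIN32
    /* Input is assumed to be UTF-8; wchar_t holds UTF-16 on Windows. */
    int      n = MultiByteToWideChar(CP_UTF8, 0, s, -1, NULL, 0);
    wchar_t *w;

    if (n <= 0)
        return NULL;
    w = malloc((size_t) n * sizeof(wchar_t));
    if (w != NULL && MultiByteToWideChar(CP_UTF8, 0, s, -1, w, n) == 0)
    {
        free(w);
        w = NULL;
    }
    return w;
#else
    /* Portable stand-in: decode using the current LC_CTYPE locale. */
    size_t   n = mbstowcs(NULL, s, 0);
    wchar_t *w;

    if (n == (size_t) -1)
        return NULL;
    w = malloc((n + 1) * sizeof(wchar_t));
    if (w != NULL)
        mbstowcs(w, s, n + 1);
    return w;
#endif
}

/* Hypothetical comparison: collate two strings in wide-character form,
 * falling back to plain byte order if either conversion fails. */
int
wide_coll(const char *a, const char *b)
{
    wchar_t *wa;
    wchar_t *wb;
    int      result;

    assert(a != NULL && b != NULL);
    wa = to_wide(a);
    wb = to_wide(b);
    if (wa != NULL && wb != NULL)
        result = wcscoll(wa, wb);   /* locale-aware wide comparison */
    else
        result = strcmp(a, b);      /* conversion failed: byte order */
    free(wa);
    free(wb);
    return result;
}
```

A caller would use `wide_coll(a, b)` wherever it would otherwise call `strcoll(a, b)`. The same conversion step would presumably be needed ahead of any `strxfrm()`-style call, with `wcsxfrm()` as the wide-character counterpart; that part is not shown here.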
On Sat, Aug 20, 2005 at 12:17:47PM -0400, Tom Lane wrote:

> I think that ICU would be interesting as the base for a much larger
> patch that gets us away from depending on libc's locale support at all
> (in particular, getting rid of the "one locale per database" problem).
> But it seems like a heck of a big dependency to incur for any lesser goal.

There is a locale project from the Gnome guys, with an eye towards a wider audience. The announcement, which states the goals of the project, is here:

http://mail.gnome.org/archives/locale-list/2005-August/msg00000.html

The project website is at http://live.gnome.org/LocaleProject

The big problem with this is that the license is likely to be LGPL, so there's probably not much code we could use. OTOH, it's possible that we could borrow some ideas from them. In particular, they are based mostly on the Common Locale Data Repository, http://www.unicode.org/cldr/

However, this thread on their list, which is about the license they will choose, hints that rewriting the whole CLDR handling from scratch would be very painful:

http://mail.gnome.org/archives/locale-list/2005-August/msg00004.html

This is precisely the reason they are using LGPL: they do not want to have to rewrite it all, which they would were they to choose a license like BSD. (Personally I think this is folly -- someone else will have to rewrite it again with a BSD license sometime, and then the value of their work would be decreased.)

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"A wizard is never late, Frodo Baggins, nor is he early. He arrives precisely when he means to." (Gandalf, in LoTR FoTR)
On Saturday, August 20, 2005 12:17:47 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> [ moving to -hackers for wider discussion ]
>
> "Magnus Hagander" <mha@sollentuna.net> wrote in
> http://archives.postgresql.org/pgsql-patches/2005-08/msg00039.php
>
>>> I've been working with Palle's ICU patch to make it work on
>>> win32, and I believe I have it done. While doing it I noticed
>>> that ICU basically converts to UTF16 and back - I previously
>>> thought it worked on UTF8 strings. Based on this I also tried
>>> out an implementation for the win32-unicode problem that does
>>> *not* require ICU. It uses the win32 native functions to map
>>> to utf16 and back, and then to process the text there. And I
>>> got through with much less code than the ICU version, while
>>> doing the same thing.
>>>
>>> I am unsure of how to proceed. As I see it there are three paths:
>>> 1) Use native win32 functionality only on win32
>>> 2) Use ICU functionality only on win32
>>> 3) Allow both ICU and native functionality, compile time
>>> switch --with-icu (same as unix with the ICU patch)
>
> We need to figure out what we're going to do about this. Given where
> we are in the release cycle, I am pretty strongly tempted to just apply
> the smaller patch (just map utf8/utf16 using Windows native functions)
> for PG 8.1.
>
> I think that ICU would be interesting as the base for a much larger
> patch that gets us away from depending on libc's locale support at all
> (in particular, getting rid of the "one locale per database" problem).
> But it seems like a heck of a big dependency to incur for any lesser goal.
>
> I feel it makes sense to apply the smaller patch in any case, so that
> there's a Win32 solution not requiring ICU (ie, I can't see an argument
> for doing (2) rather than (3)).
>
> Comments?

I don't mind either way, but while Win32 will work with Magnus' patch, FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the FreeBSD port where I already have the patch as an ("experimental") option. Not every FreeBSD user uses the ports system, though.

So, it is a question whether FreeBSD's unicode support is important or not, I guess? Win32 will work both ways.

/Palle
Palle Girgensohn wrote:

> > I feel it makes sense to apply the smaller patch in any case, so that
> > there's a Win32 solution not requiring ICU (ie, I can't see an argument
> > for doing (2) rather than (3)).
> >
> > Comments?
>
> I don't mind either way, but while Win32 will work with Magnus' patch,
> FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the FreeBSD
> port where I already have the patch as an ("experimental") option. Not
> every FreeBSD user uses the ports system, though.
>
> So, it is a question whether FreeBSD's unicode support is important or not,
> I guess? Win32 will work both ways.

How is FreeBSD's Unicode support broken? I was not aware of that.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Monday, August 22, 2005 09:19:58 -0400, Bruce Momjian <pgman@candle.pha.pa.us> wrote:

> Palle Girgensohn wrote:
>> > I feel it makes sense to apply the smaller patch in any case, so that
>> > there's a Win32 solution not requiring ICU (ie, I can't see an argument
>> > for doing (2) rather than (3)).
>> >
>> > Comments?
>>
>> I don't mind either way, but while Win32 will work with Magnus' patch,
>> FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the
>> FreeBSD port where I already have the patch as an ("experimental")
>> option. Not every FreeBSD user uses the ports system, though.
>>
>> So, it is a question whether FreeBSD's unicode support is important or
>> not, I guess? Win32 will work both ways.
>
> How is FreeBSD's Unicode support broken? I was not aware of that.

FreeBSD has no unicode collation support. Hence the need for ICU.

/Palle
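One rough way to see whether a platform's libc actually collates in a given locale, rather than falling back to byte order, is to compare `strcoll()` against `strcmp()` for a pair of strings. The sketch below is only a diagnostic under stated assumptions, not anything from the patches in this thread; the function name `collation_differs` is hypothetical, and any locale name passed to it may or may not be installed on a given system.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Reduce a comparison result to -1, 0 or 1. */
static int
sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/* Hypothetical probe: returns 1 if strcoll() under `locale` orders a and b
 * differently from strcmp(), 0 if both agree, and -1 if the locale cannot
 * be selected. The previous LC_COLLATE setting is restored afterwards. */
int
collation_differs(const char *locale, const char *a, const char *b)
{
    const char *cur;
    char       *saved;
    int         result;

    assert(locale != NULL && a != NULL && b != NULL);

    /* Copy the current setting; later setlocale() calls may overwrite it. */
    cur = setlocale(LC_COLLATE, NULL);
    if (cur == NULL)
        return -1;
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_COLLATE, locale) == NULL)
        result = -1;
    else
        result = sign_of(strcoll(a, b)) != sign_of(strcmp(a, b));

    setlocale(LC_COLLATE, saved);
    free(saved);
    return result;
}
```

A single pair proves little either way. The probe is more telling with pairs where dictionary order and byte order are expected to disagree, such as `"a"` and `"B"`, tried under a UTF-8 locale like `sv_SE.UTF-8` (assumed to be installed). If every such pair returns 0, the platform is probably comparing bytes.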
Palle Girgensohn <girgen@pingpong.net> writes:
> <pgman@candle.pha.pa.us> wrote:
>> How is FreeBSD's Unicode support broken? I was not aware of that.

> FreeBSD has no unicode collation support. Hence the need for ICU.

Well, this obviously doesn't bother anyone who uses FreeBSD, so it need not bother us either. I do not feel a need to take on ICU in order to implement features that are not present anywhere else on the platform.

regards, tom lane
On Monday, August 22, 2005 10:12:11 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Palle Girgensohn <girgen@pingpong.net> writes:
>> <pgman@candle.pha.pa.us> wrote:
>>> How is FreeBSD's Unicode support broken? I was not aware of that.
>
>> FreeBSD has no unicode collation support. Hence the need for ICU.
>
> Well, this obviously doesn't bother anyone who uses FreeBSD, so it need
> not bother us either. I do not feel a need to take on ICU in order to
> implement features that are not present anywhere else on the platform.

It bothered me enough to patch postgresql. :) And I use it with Java, which has working unicode support, so...

Oh well, I can live with that - I'll maintain my patch locally for the time being, if that's what's required.

/Palle