Re: Unicode combining characters
From | Tatsuo Ishii |
---|---|
Subject | Re: Unicode combining characters |
Date | |
Msg-id | 20011002101416E.t-ishii@sra.co.jp |
In reply to | Re: Unicode combining characters (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: Unicode combining characters (Bruce Momjian <pgman@candle.pha.pa.us>) |
List | pgsql-hackers |
> Can someone give me TODO items for this discussion?

What about:

Improve Unicode combined character handling
--
Tatsuo Ishii

> > > So, this shows two problems:
> > >
> > > - length() on the server side doesn't handle Unicode correctly [I have
> > > the same result with char_length()], and returns the number of chars
> > > (as it is, however, advertised to do) rather than the length of the
> > > string.
> >
> > This is a known limitation.
> >
> > > - the psql frontend makes the same mistake.
> > >
> > > I am using version 7.1.3 (Debian sid), so it may have been corrected
> > > in the meantime (in this case, I apologise, but I have only recently
> > > started using PostgreSQL again and I haven't followed -hackers long
> > > enough).
> > >
> > >
> > > => I think fixing psql shouldn't be too complicated, as glibc
> > > should be providing the locale and returning the right values (is this
> > > the case? And what happens with combined Latin + Chinese characters,
> > > for example? I'll have to try that later). If it's not fixed already,
> > > do you want me to look at this? [It will take some time, as I haven't
> > > set up any development environment for Postgres yet, and I'm away for
> > > one week from Thursday.]
> >
> > Sounds great.
> >
> > > I was wondering if some people have already thought about this, or
> > > already done something, or if some of you are interested in this. If
> > > nobody does anything, I'll do something eventually, probably before
> > > Christmas (I don't have much time for this, and I don't need the
> > > functionality right now), but if there is interest, I could team up
> > > with others and develop it faster :)
> >
> > I'm very interested in your point. I will start studying [1][2] after
> > the beta freeze.
> >
> > > Anyway, I'm open to suggestions:
> > >
> > > - implement it in C, in the core,
> > >
> > > - implement it in C, as contributed custom functions,
> >
> > This may be a good starting point.
> >
> > > I can't really accept a solution which would rely on the underlying
> > > libc, as it may not provide the necessary locales (or maybe, then,
> >
> > I totally agree here.
> >
> > > The main functions I foresee are:
> > >
> > > - provide a normalisation function for all 4 forms,
> > >
> > > - provide a collation_key(text, language) function; as the calculation
> > > of the key may be expensive, some may want to index on the result (I
> > > would :) ),
> > >
> > > - provide a collation algorithm, using the two previous facilities,
> > > which can do primary to tertiary collation (cf. TR#10 for a detailed
> > > explanation).
> > >
> > > I haven't looked at PostgreSQL code yet (shame!), so I may be
> > > completely off-track, in which case I'll retract myself and won't
> > > bother you again (on that subject, that is ;) )...
> > >
> > > Comments?
> > --
> > Tatsuo Ishii
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: you can get off all lists at once with the unregister command
> > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> >
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman@candle.pha.pa.us | (610) 853-3000
> + If your life is a hard drive, | 830 Blythe Avenue
> + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
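The distinction the thread turns on, between a length() that counts code points and the number of characters a reader actually sees once combining marks are involved, can be illustrated with Python's standard unicodedata module. This is a minimal sketch, not PostgreSQL code. The helpers visible_length and rough_primary_key are hypothetical. Counting only code points with canonical combining class 0 merely approximates user-perceived characters (full grapheme segmentation is specified in UAX #29), and the "primary key" only ignores accents and case; it does not implement the weight tables or language tailoring of the Unicode Collation Algorithm (TR#10).

```python
import unicodedata


def codepoint_length(s: str) -> int:
    # What a code-point-based length() reports: every code point counts,
    # including combining marks such as U+0301 COMBINING ACUTE ACCENT.
    return len(s)


def visible_length(s: str) -> int:
    # Hypothetical helper: approximate user-perceived characters by skipping
    # code points whose canonical combining class is nonzero. This is NOT
    # full grapheme-cluster segmentation (UAX #29).
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


def rough_primary_key(s: str) -> str:
    # Hypothetical, crude stand-in for a primary-strength collation key:
    # decompose (NFD), drop combining marks, then case-fold. The real
    # Unicode Collation Algorithm uses weight tables and language
    # tailoring; this only makes comparison accent- and case-insensitive.
    decomposed = unicodedata.normalize("NFD", s)
    base = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    return base.casefold()


if __name__ == "__main__":
    decomposed = "e\u0301"                               # 'e' + combining acute
    composed = unicodedata.normalize("NFC", decomposed)  # single U+00E9

    print(codepoint_length(decomposed), visible_length(decomposed))  # 2 1
    print(codepoint_length(composed), visible_length(composed))      # 1 1

    # The four normalization forms mentioned in the thread.
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        cps = [f"U+{ord(c):04X}" for c in unicodedata.normalize(form, decomposed)]
        print(form, cps)

    # Accent- and case-insensitive comparison under the crude key.
    print(rough_primary_key("R\u00e9sum\u00e9") == rough_primary_key("resume"))  # True
```

Python's unicodedata module ships its own copy of the Unicode Character Database rather than querying the C library's locales, which is the kind of libc independence asked for in the thread.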