Discussion: PostgreSQL
Hello, all!
I have a good question for the PostgreSQL FAQ.
How do I use string functions (like UPPER()/LOWER()) on non-Latin strings?
Why doesn't the UPPER() function work with my UNICODE PostgreSQL database, which contains non-Latin characters (like Cyrillic)?
How do I do a case-insensitive search on a text field that contains non-Latin characters?
Thanks for your answers!
Best regards
Eugeny
Not sure. I thought it would work.
--
Bruce Momjian | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073
I confirm this behaviour: Cyrillic words are not changed by the lower()/upper()
functions, nor matched by ILIKE.
I am using:
=> SELECT version();
                            version
---------------------------------------------------------------
 PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.95.2
(1 row)
Nothing special was done during database creation (no encoding selected).
On Mon, 11 Aug 2003, Bruce Momjian wrote:
> Not sure. I thought it would work.

No, it doesn't work. Several people have already complained about poor Unicode support. I recall Tatsuo commenting on some piece of code. I have a small page, http://www.sai.msu.su/~megera/postgres/utf8.html, about my experience with UTF8 and Cyrillic.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Well, I have no mention of this problem in the TODO list, so I would like to get a good description of why it isn't working.

Looking at the code, I see upper() is defined in oracle_compat.c (you would think it would be more standard), and it calls toupper(), so it probably works on single-byte encodings but not on multi-byte ones. Is this correct? Is there a way to do a multi-byte toupper? Perhaps converting to wide characters and calling towupper()?
--
Bruce Momjian
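To make the toupper() versus towupper() question concrete, here is a minimal sketch in standard C. It is not PostgreSQL code: upper_bytewise(), upper_wide() and use_utf8_locale() are hypothetical helpers written for illustration, and the locale names are assumptions about what is installed on the machine. It also assumes the common behaviour that mbstowcs()/wcstombs() return the required length when passed a NULL destination.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Try to select a UTF-8 LC_CTYPE locale. The two names are assumptions
 * about what the system provides. Returns 1 on success, 0 otherwise. */
int use_utf8_locale(void)
{
    return setlocale(LC_CTYPE, "C.UTF-8") != NULL
        || setlocale(LC_CTYPE, "en_US.UTF-8") != NULL;
}

/* Byte-wise upper-casing, i.e. a plain toupper() loop. It handles ASCII,
 * but each byte of a multi-byte character is looked at on its own, so a
 * multi-byte letter is never seen as a letter. */
void upper_bytewise(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char) toupper((unsigned char) *s);
}

/* Wide-character upper-casing: decode to wchar_t, apply towupper() per
 * character, encode back. Returns a malloc'ed string that the caller
 * frees, or NULL if the input is invalid in the current LC_CTYPE
 * encoding or memory runs out. */
char *upper_wide(const char *s)
{
    assert(s != NULL);

    size_t nchars = mbstowcs(NULL, s, 0);      /* characters, without NUL */
    if (nchars == (size_t) -1)
        return NULL;

    wchar_t *w = malloc((nchars + 1) * sizeof(wchar_t));
    if (w == NULL)
        return NULL;
    mbstowcs(w, s, nchars + 1);

    for (size_t i = 0; i < nchars; i++)
        w[i] = (wchar_t) towupper((wint_t) w[i]);

    size_t nbytes = wcstombs(NULL, w, 0);      /* bytes, without NUL */
    char *out = NULL;
    if (nbytes != (size_t) -1 && (out = malloc(nbytes + 1)) != NULL)
        wcstombs(out, w, nbytes + 1);

    free(w);
    return out;
}
```

With a UTF-8 locale selected through use_utf8_locale(), upper_wide("привет") should return "ПРИВЕТ". The result still depends on the C library's locale data, so this only illustrates the idea and is not a portable fix.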
I think if Postgres were completely UTF8 compatible, and that were the default configuration, we'd do a lot better against 'the others' and take more of Oracle's market.
Added to TODO:
* Fix upper()/lower() to work for multibyte encodings
---------------------------------------------------------------------------
Alexander Litvinov wrote:
> I confirm this behaviour: Cyrillic words are not changed by the lower()/upper()
> functions, nor matched by ILIKE.
--
Bruce Momjian