Обсуждение: Different result when using '=' and 'like' in unicode mode
I used PostgreSQL 7.2.1 on redhat 7.3, installed using the rpm from redhat,.
When I createdb -E UNICODE, and import data using \copy command, after
converted all the imported text files to UTF-8 encoding. It looks well, I
can get
the data from several platforms in different locale. But, a strange problem,
when i using '=' in where clause, to match the varchar value, the result is
not correct,
for example, the table structure is
Table "tbl_test"
Column | Type | Modifiers
-------------+-----------------------+-----------
id | integer |
name | character varying(40) |
the sql is
select * from tbl_test where name = 'ç³ç°';
but the result contains other data, such as ç³çº, æ³çº, æ³ç°
and I changed the sql to
select * from tbl_test where name like 'ç³ç°';
select * from tbl_test where name ilike 'ç³ç°';
the results both correct.
I don't know the reason of this problem, and in my application it not
allowed
to use 'like' replaced '=', the data amount is too large.
Is this a bug of PostgreSQL? or only a problem in redhat environment? and
how to fix it?
Thx,
Jiang Sheng
Jiang Sheng writes: > the sql is > select * from tbl_test where name = 'ç³ç°'; > > but the result contains other data, such as ç³çº, æ³çº, æ³ç° > > and I changed the sql to > select * from tbl_test where name like 'ç³ç°'; > select * from tbl_test where name ilike 'ç³ç°'; > > the results both correct. The LIKE operator uses a character-by-character comparison (which could quite possibly behave oddly if your strings contain combining characters, but that is a different issue). The = operator uses the host operating system's locale facilities to do a locale-aware comparison. So in general LIKE and = are not interchangeable. If the result you get for = is wrong even under some locale, then you probably don't have the right locale set on your server. Recent glibc systems require you to set your locale to xx_YY.utf8 (rather than just xx_YY) if you want to use Unicode. -- Peter Eisentraut peter_e@gmx.net
Thanks a lot! In your opinion, should I re-compile the PostgreSQL after set the $LANG to ja_JP.utf8? or I need to set other variables' value.. I changed the $LANG value, but the result still not correct, and not found anything about it in PostgrSQL's document. Thx Jiang Sheng ----- Original Message ----- éä¿¡è : "Peter Eisentraut" <peter_e@gmx.net> å®å : "Jiang Sheng" <jiangsheng@sis.sh.cn> Cc: <pgsql-bugs@postgresql.org> éä¿¡æ¥æ : 2002å¹´11æ18æ¥ 6:21 ä»¶å : Re: [BUGS] Different result when using '=' and 'like' in unicode mode > Jiang Sheng writes: > > > the sql is > > select * from tbl_test where name = 'ç³ç°'; > > > > but the result contains other data, such as ç³çº, æÂ³çº, æÂ³ç° > > > > and I changed the sql to > > select * from tbl_test where name like 'ç³ç°'; > > select * from tbl_test where name ilike 'ç³ç°'; > > > > the results both correct. > > The LIKE operator uses a character-by-character comparison (which could > quite possibly behave oddly if your strings contain combining characters, > but that is a different issue). The = operator uses the host operating > system's locale facilities to do a locale-aware comparison. So in general > LIKE and = are not interchangeable. > > If the result you get for = is wrong even under some locale, then you > probably don't have the right locale set on your server. Recent glibc > systems require you to set your locale to xx_YY.utf8 (rather than just > xx_YY) if you want to use Unicode. > > -- > Peter Eisentraut peter_e@gmx.net >
Jiang Sheng writes: > In your opinion, should I re-compile the PostgreSQL after set the $LANG > to ja_JP.utf8? or I need to set other variables' value.. Recompiling has nothing to do with that. Please see the Administrator's Guide chapter on localization for all the information. -- Peter Eisentraut peter_e@gmx.net