Обсуждение: hu_HU.UTF8 case insensitive search fail to return values
Hi
I have a problem related to the encoding. Maybe this is not the right place to ask for help, in this case please tell me where to ask for help.
database version: PostgreSQL 8.1.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
The problem is the following:
I have created the database with UTF8 encoding (createdb -O postgres -E UTF8 dbname). The database uses the following encodings/locales:
lc_collate | hu_HU.UTF8 | Shows the collation order locale.
lc_ctype | hu_HU.UTF8 | Shows the character classification and case conversion locale.
lc_messages | C | Sets the language in which messages are displayed.
lc_monetary | C | Sets the locale for formatting monetary amounts.
lc_numeric | C | Sets the locale for formatting numbers.
lc_time | C | Sets the locale for formatting date and time values.
Lats say we have a table named some_table with the following datas:
create table some_table(id integer, str varchar(100));
insert into some_table values(1, 'alma a fa alatt');
insert into some_table values(2, 'nyári piros alma');
insert into some_table values(3, 'ez egy állomány');
insert into some_table values(4, 'egy másik ÁLLOMÁNY');
(note that I used a acute and A acute intentionally; these are some Hungarian, not too meaningful words)
When I execute the following query, the following results are returned:
SELECT * FROM some_table WHERE str ILIKE '%ál%"
for the following:
SELECT * FROM some_table WHERE str ILIKE '%Ál%"
i get
instead of
for
select * from some_table where str ilike '%á%'
i get
but for
select * from some_table where str ilike '%Á%'
i get:
UPPER, LOWER and LIKE is working perfectly. The problem appears to be just with the ILIKE operator.
NOTE: the following bug report, that had not get any answer is similar to my problem: http://svr5.postgresql.org/pgsql-bugs/2005-07/msg00157.php
I have a problem related to the encoding. Maybe this is not the right place to ask for help, in this case please tell me where to ask for help.
database version: PostgreSQL 8.1.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
The problem is the following:
I have created the database with UTF8 encoding (createdb -O postgres -E UTF8 dbname). The database uses the following encodings/locales:
lc_collate | hu_HU.UTF8 | Shows the collation order locale.
lc_ctype | hu_HU.UTF8 | Shows the character classification and case conversion locale.
lc_messages | C | Sets the language in which messages are displayed.
lc_monetary | C | Sets the locale for formatting monetary amounts.
lc_numeric | C | Sets the locale for formatting numbers.
lc_time | C | Sets the locale for formatting date and time values.
Lats say we have a table named some_table with the following datas:
create table some_table(id integer, str varchar(100));
insert into some_table values(1, 'alma a fa alatt');
insert into some_table values(2, 'nyári piros alma');
insert into some_table values(3, 'ez egy állomány');
insert into some_table values(4, 'egy másik ÁLLOMÁNY');
(note that I used a acute and A acute intentionally; these are some Hungarian, not too meaningful words)
When I execute the following query, the following results are returned:
SELECT * FROM some_table WHERE str ILIKE '%ál%"
| id | str |
|---|---|
3 | ez egy állomány |
for the following:
SELECT * FROM some_table WHERE str ILIKE '%Ál%"
i get
| id | str |
|---|---|
4 | egy másik ÁLLOMÁNY |
instead of
| id | str |
|---|---|
3 | ez egy állomány |
4 | egy másik ÁLLOMÁNY |
for
select * from some_table where str ilike '%á%'
i get
| id | str |
|---|---|
2 | nyári piros alma |
3 | ez egy állomány |
4 | egy másik ÁLLOMÁNY |
but for
select * from some_table where str ilike '%Á%'
i get:
| id | str |
|---|---|
4 | egy másik ÁLLOMÁNY |
UPPER, LOWER and LIKE is working perfectly. The problem appears to be just with the ILIKE operator.
NOTE: the following bug report, that had not get any answer is similar to my problem: http://svr5.postgresql.org/pgsql-bugs/2005-07/msg00157.php
--
tel: +40-(0)-365-410-540 | László-Róbert, Albert tel: +40-(0)-742-874-854 |
| e-mail: albertlaszlorobert@gmail.com e-mail: albert_laszlorobert@yahoo.com |
"László-Róbert, Albert" <albertlaszlorobert@gmail.com> writes:
> [ ILIKE fails to match case-insensitively in 8.1.2 ]
The ILIKE code doesn't work well at all in multibyte encodings in 8.1
and before. You should get reasonable results in one of the LATINn
encodings (and a matching locale of course). If you need to use UTF8,
I recommend updating to 8.2 which handles multibyte characters better.
regards, tom lane
Tom Lane wrote: > "László-Róbert, Albert" <albertlaszlorobert@gmail.com> writes: > >> [ ILIKE fails to match case-insensitively in 8.1.2 ] >> > > The ILIKE code doesn't work well at all in multibyte encodings in 8.1 > and before. You should get reasonable results in one of the LATINn > encodings (and a matching locale of course). If you need to use UTF8, > I recommend updating to 8.2 which handles multibyte characters better. > > regards, tom lane > > the administrator finally configured properly the postgres 8.2, and the reported problem looks to be fixed. i ran successfully the ILIKE test on hungarian language. thanks for the tip. regards, laszlo-robert, albert