Обсуждение: hu_HU.UTF8 case insensitive search fail to return values

Поиск
Список
Период
Сортировка

hu_HU.UTF8 case insensitive search fail to return values

От
"László-Róbert, Albert"
Дата:
Hi
I have a problem related to the encoding. Maybe this is not the right place to ask for help, in this case please tell me where to ask for help.

database version: PostgreSQL 8.1.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)

The problem is the following:
I have created the database with UTF8 encoding (createdb -O postgres -E UTF8 dbname). The database uses the following encodings/locales:
lc_collate | hu_HU.UTF8 | Shows the collation order locale.
lc_ctype | hu_HU.UTF8 | Shows the character classification and case conversion locale.
lc_messages | C | Sets the language in which messages are displayed.
lc_monetary | C | Sets the locale for formatting monetary amounts.
lc_numeric | C | Sets the locale for formatting numbers.
lc_time | C | Sets the locale for formatting date and time values.

Lats say we have a table named some_table with the following datas:
create table some_table(id integer, str varchar(100));

insert into some_table values(1, 'alma a fa alatt');
insert into some_table values(2, 'nyári piros alma');
insert into some_table values(3, 'ez egy állomány');
insert into some_table values(4, 'egy másik ÁLLOMÁNY');

(note that I used a acute and A acute intentionally; these are some Hungarian, not too meaningful words)

When I execute the following query, the following results are returned:

SELECT * FROM some_table WHERE str ILIKE '%ál%"

idstr
3
ez egy állomány

for the following:
SELECT * FROM some_table WHERE str ILIKE '%Ál%"
i get
idstr
4
egy másik ÁLLOMÁNY

instead of

idstr
3
ez egy állomány
4
egy másik ÁLLOMÁNY

for
select * from some_table where str ilike '%á%'
i get
idstr
2
nyári piros alma
3
ez egy állomány
4
egy másik ÁLLOMÁNY

but for
select * from some_table where str ilike '%Á%'
i get:
idstr
4
egy másik ÁLLOMÁNY


UPPER, LOWER and LIKE is working perfectly. The problem appears to be just with the ILIKE operator.

NOTE: the following bug report, that had not get any answer is similar to my problem: http://svr5.postgresql.org/pgsql-bugs/2005-07/msg00157.php
--
RSoft Solutions Ltd
tel: +40-(0)-365-410-540
László-Róbert, Albert
tel: +40-(0)-742-874-854
e-mail: albertlaszlorobert@gmail.com
e-mail: albert_laszlorobert@yahoo.com

Re: hu_HU.UTF8 case insensitive search fail to return values

От
Tom Lane
Дата:
"László-Róbert, Albert" <albertlaszlorobert@gmail.com> writes:
> [ ILIKE fails to match case-insensitively in 8.1.2 ]

The ILIKE code doesn't work well at all in multibyte encodings in 8.1
and before.  You should get reasonable results in one of the LATINn
encodings (and a matching locale of course).  If you need to use UTF8,
I recommend updating to 8.2 which handles multibyte characters better.
        regards, tom lane


Re: hu_HU.UTF8 case insensitive search fail to return values

От
Albert László-Róbert
Дата:
Tom Lane wrote:
> "László-Róbert, Albert" <albertlaszlorobert@gmail.com> writes:
>
>> [ ILIKE fails to match case-insensitively in 8.1.2 ]
>>
>
> The ILIKE code doesn't work well at all in multibyte encodings in 8.1
> and before.  You should get reasonable results in one of the LATINn
> encodings (and a matching locale of course).  If you need to use UTF8,
> I recommend updating to 8.2 which handles multibyte characters better.
>
>             regards, tom lane
>
>
the administrator finally configured properly the postgres 8.2, and the
reported problem looks to be fixed. i ran successfully the ILIKE test on
hungarian language.
thanks for the tip.
regards, laszlo-robert, albert