Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13

From: Alexander Farber
Subject: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13
Msg-id: CAADeyWjZUQU-mwN30rxZs_2A_HvBzrtWwd9kX+c+hmo5kfmN+w@mail.gmail.com
Replies: Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13
         Re: Matching uppercased russian words (\x0410-\x042F) in UTF8 database 8.4.13
List: pgsql-general
Hello,

I have prepared an SQL fiddle for my question:
http://sqlfiddle.com/#!11/8a494/4

I have also described it in more detail on Stack Overflow:
http://stackoverflow.com/questions/15500270/string-matching-in-insert-trigger-how-to-use-in-conditionals-to-return-null

Does anybody know how to check for the Unicode range
\x0410-\x042F (uppercase Cyrillic А to Я) in my code below?

I've tried both
new.word !~ '^[\x0410-\x042F]{2,}$'
(fails with a syntax error) and
new.word !~ '^[\u0410-\u042F]{2,}$'
(raises the exception even for valid words):


create table good_words (
        word varchar(64) primary key
);

create or replace function keep_clean() returns trigger as $body$
        begin
                new.word := upper(new.word);

                /* next line does not compile? */
                IF new.word !~ '^[\x0410-\x042F]{2,}$' THEN
                    RAISE EXCEPTION 'Not an uppercased Russian word in UTF8';
                END IF;

                IF new.word ~ '^[ЪЫЬ]' OR new.word ~ 'Ъ$' THEN
                    return NULL;
                END IF;

                /* does not return NULL for 'ошибббка'? */
                IF new.word ~ '(.)\1\1' AND new.word NOT LIKE '%ШЕЕЕ%'
                   AND new.word NOT LIKE '%ЗМЕЕЕ%' THEN
                    return NULL;
                END IF;

                return new;
        end;
$body$ language plpgsql;
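
For reference, the intended first check can be modelled outside the database. The Python snippet below is a minimal sketch, assuming Python 3's `re` module (which accepts \uXXXX escapes in patterns); the name `is_upper_russian` is hypothetical and nothing here is PostgreSQL code. It also makes one detail visible: the range \x0410-\x042F covers А to Я but not Ё (U+0401), so a word such as 'ЁЖ' is rejected. On the PostgreSQL side, a plausible but unverified explanation for the second attempt is that with standard_conforming_strings off (the default before 9.1) the ordinary literal '\u0410' loses its backslash before the regex engine sees it; PostgreSQL's regex syntax documents \uwxyz as a character-entry escape, so E'^[\\u0410-\\u042F]{2,}$' may be worth trying.

```python
import re

# Hypothetical model of the trigger's first check: the whole word must be
# two or more characters, each in U+0410..U+042F (uppercase А..Я).
# Ё (U+0401) lies outside this range, so words containing it are rejected.
UPPER_RUSSIAN = re.compile(r'[\u0410-\u042F]{2,}')


def is_upper_russian(word: str) -> bool:
    # fullmatch stands in for ^...$ because Python's $ also matches
    # just before a trailing newline.
    return UPPER_RUSSIAN.fullmatch(word) is not None


if __name__ == "__main__":
    for w in ["КОТ", "Я", "CAT", "КОТ1", "ЁЖ"]:
        print(w, is_upper_russian(w))
```

Only "КОТ" is accepted: "Я" is too short, "CAT" uses Latin letters, "КОТ1" contains a digit, and "ЁЖ" starts with Ё.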


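The triple-letter check can be modelled the same way. The Python sketch below (the names `has_forbidden_triple`, `TRIPLE`, `BROKEN_TRIPLE` and `ALLOWED` are hypothetical) shows how string-literal escaping can silently break a backreference: without the r prefix, '\1' becomes the control character U+0001 before the regex engine sees it, so 'ОШИБББКА' never matches. One plausible explanation for the 'ошибббка' problem, not verified here on 8.4.13, is the same effect in PostgreSQL: with standard_conforming_strings off, the ordinary literal '(.)\1\1' has its \1 processed as an octal escape, whereas E'(.)\\1\\1' would pass the backreference through intact.

```python
import re

# Pitfall: in an ordinary (non-raw) Python string, '\1' is an octal escape
# for U+0001, so this pattern means "any character followed by two U+0001".
BROKEN_TRIPLE = re.compile('(.)\1\1')

# A raw string hands the backreference \1 to the regex engine intact,
# so this pattern means "any character repeated three times in a row".
TRIPLE = re.compile(r'(.)\1\1')

# Fragments that legitimately contain a tripled letter (from the trigger).
ALLOWED = ('ШЕЕЕ', 'ЗМЕЕЕ')


def has_forbidden_triple(word: str, pattern=TRIPLE) -> bool:
    """Model of the trigger's third check: True means 'return NULL'."""
    word = word.upper()
    return bool(pattern.search(word)) and not any(a in word for a in ALLOWED)


if __name__ == "__main__":
    print(has_forbidden_triple('ошибббка'))                 # True
    print(has_forbidden_triple('ошибббка', BROKEN_TRIPLE))  # False
    print(has_forbidden_triple('длинношеее'))               # False
```
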
Thank you
Alex

