UTF8 regexp and char classes still does not work

From Sergey Burladyan
Subject UTF8 regexp and char classes still does not work
Date
Msg-id 877hi5a6wr.fsf@home.progtech.ru
Responses Re: UTF8 regexp and char classes still does not work  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I see this in the 9.0 release notes:
- Support locale-specific regular expression processing with UTF-8 server encoding (Tom Lane)
  Locale-specific regular expression functionality includes case-insensitive matching and
  locale-specific character classes.

But character classes still do not work. Example (git REL9_0_STABLE c767c3bd):
select version();
                                                       version
------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 9.0.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.4.4-8) 4.4.5 20100728 (prerelease), 64-bit

--- CYRILLIC SMALL LETTER ZHE ~* CYRILLIC CAPITAL LETTER ZHE
select E'\320\266' ~* E'\320\226', E'\320\266' ~ '[[:alpha:]]+', 'g' ~ '[[:alpha:]]+';
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | f        | t

All three results should be true, as they are in a KOI8-R database:

create database koi8 template template0 encoding 'koi8r' lc_collate 'ru_RU.KOI8-R' lc_ctype 'ru_RU.KOI8-R';
\c koi8
set client_encoding TO utf8;
select E'\326' ~* E'\366', E'\326' ~ '[[:alpha:]]+', 'g' ~ '[[:alpha:]]+';
 ?column? | ?column? | ?column?
----------+----------+----------
 t        | t        | t

As far as I can see, Tom's patch 0d323425 changed only functions such as pg_wc_isalpha, but
pg_wc_isalpha is called from the function

static struct cvec *
cclass(struct vars * v,        /* context */
       const chr *startp,      /* where the name starts */
       const chr *endp,        /* just past the end of the name */
       int cases)              /* case-independent? */

This function has the comment "For the moment, assume that only char codes < 256 can be in these classes",
and it calls pg_wc_isalpha like this:
 
for (i = 0; i <= UCHAR_MAX; i++)
{
    if (pg_wc_isalpha((chr) i))
        addchr(cv, (chr) i);
}
UCHAR_MAX is 255, so character codes above 255 are never passed to pg_wc_isalpha here.

I do not fully understand this regular expression algorithm, but I think the cclass function also needs a fix.

-- 
Sergey Burladyan

