Re: UTF8 regexp and char classes still does not work

From	Sergey Burladyan
Subject	Re: UTF8 regexp and char classes still does not work
Date	28 September 2010, 19:37:47
Msg-id	8739sta40g.fsf@home.progtech.ru
In reply to	Re: UTF8 regexp and char classes still does not work (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Hmm, you're right.  I only tested that on Latin1 characters, for which
> it does work because those have Unicode points below 256.  I'm not
> sure of a reasonable solution for the general case --- we certainly
> don't want this function iterating up to 2^21 or thereabouts.

Yes, I understand this problem. How does Perl do this? Maybe this Unicode table
could be precomputed, or linked into the postgres binary from an external source?
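
To make the "precomputed table" idea concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not how Perl or PostgreSQL actually does it: the helper names (build_alpha_ranges, is_alpha) are made up, and Python's str.isalpha() stands in for whatever Unicode data source would really be used. The point is that the full scan happens once, ahead of time, and each lookup afterwards is a binary search over a short list of ranges.

```python
import bisect

def build_alpha_ranges(limit=0x110000):
    """Collapse all alphabetic code points below `limit` into (start, end) ranges.

    This is the expensive one-time pass; the result could be stored
    as a static table instead of being recomputed.
    """
    ranges = []
    start = None
    for cp in range(limit):
        if chr(cp).isalpha():
            if start is None:
                start = cp          # open a new range
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current range
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges

# Hypothetical precomputed table and its index of range starts.
ALPHA_RANGES = build_alpha_ranges()
RANGE_STARTS = [lo for lo, _ in ALPHA_RANGES]

def is_alpha(cp):
    """Binary-search the range table instead of scanning every code point."""
    i = bisect.bisect_right(RANGE_STARTS, cp) - 1
    return i >= 0 and cp <= ALPHA_RANGES[i][1]

print(len(ALPHA_RANGES), "ranges")
print(is_alpha(0x0436), is_alpha(ord("g")), is_alpha(ord("1")))
```

The number of ranges is far smaller than the number of code points, so a table like this is cheap to keep in memory and to search.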

> Your test case seems to be using KOI8 encoding, though, which doesn't
> have anything to do with UTF8 behavior.

It's just an example of the expected result. See the first test; it is UTF8, two bytes per character:
> > --- CYRILLIC SMALL LETTER ZHE ~* CYRILLIC CAPITAL LETTER ZHE
> > select E'\320\266' ~* E'\320\226', E'\320\266' ~ '[[:alpha:]]+', 'g' ~ '[[:alpha:]]+';
> >  ?column? | ?column? | ?column? 
> > ----------+----------+----------
> >  t        | f        | t
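
As a cross-check outside of postgres, the same two byte sequences can be decoded in Python. This is only a sketch to show which code points are involved; it says nothing about the regex code in postgres itself. Both letters are above code point 255, which is why the Latin1-only test did not catch the problem, and both are alphabetic according to Unicode.

```python
import unicodedata

# Same octal escapes as in the SQL test: the UTF-8 bytes of the two letters.
small_zhe = b"\320\266".decode("utf-8")
capital_zhe = b"\320\226".decode("utf-8")

for ch in (small_zhe, capital_zhe):
    # Code point, official Unicode name, and whether it counts as a letter.
    print(hex(ord(ch)), unicodedata.name(ch), ch.isalpha())

# Both code points lie above the Latin1 range (0..255).
print(ord(small_zhe) > 255, ord(capital_zhe) > 255)

# The case-insensitive comparison from the first column of the test.
print(small_zhe.upper() == capital_zhe)
```

So the expected result for the second column would be t, the same as for 'g'.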


-- 
Sergey Burladyan
