BUG #7999: Regexp with utf8

From somloieater@gmail.com
Subject BUG #7999: Regexp with utf8
Date
Msg-id E1UKnf7-0005Sa-L4@wrigleys.postgresql.org
Responses Re: BUG #7999: Regexp with utf8  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      7999
Logged by:          david
Email address:      somloieater@gmail.com
PostgreSQL version: 9.1.8
Operating system:   linux
Description:

\y and \Y do not behave correctly next to
multibyte UTF-8 characters - they seem to invert their senses:

Proper behaviour with ASCII e
'es'~$$\y[eɛ]s$$  => t

Inverted behaviour with epsilon
'ɛs'~$$\y[eɛ]s$$  => f
'ɛs'~$$[eɛ]\ys$$  => t
'ɛs'~$$[eɛ]\Ys$$  => f

This seems to be a case of utf8 characters not being recognised as
word-forming:

'ɛ'~$$\w'$$ => f

I've checked a few other characters that take more than one byte in UTF-8.
U+00F0 counts as \w, but nothing I've tried above U+00FF matches. I wonder
if it's something to do with code points above 255?
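
To make the observation above concrete, here is a small Python sketch (Python is used only to inspect the encodings, it is not part of the report) that prints the code point and UTF-8 bytes of the three characters mentioned. The helper name `describe` is made up for illustration. Both U+00F0 and U+025B take two bytes in UTF-8, so byte length alone does not separate the working case from the failing one; the code point value does, which is at least consistent with the guess about values above U+00FF.

```python
def describe(ch):
    """Return (code point label, UTF-8 byte count, hex bytes, above U+00FF?)."""
    encoded = ch.encode("utf-8")
    hex_bytes = " ".join(f"{b:02X}" for b in encoded)
    return (f"U+{ord(ch):04X}", len(encoded), hex_bytes, ord(ch) > 0xFF)

# e, eth (U+00F0) and epsilon (U+025B), the characters from the report
for ch in ("e", "\u00f0", "\u025b"):
    print(describe(ch))
```

This only covers the handful of characters named in the report, so it supports the hypothesis without confirming it.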


In case anyone else hits this bug, replacing \y with
  (^|$|\s|[[:punct:]]) seems to work for me, although it's ugly.
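
One caveat with this kind of workaround is that, unlike \y, the alternation consumes the delimiter character it matches. The sketch below shows that effect using Python's `re` module, which is a different engine and is not claimed to share this bug. POSIX `[[:punct:]]` is approximated here with ASCII punctuation, and the names `DELIM`, `PATTERN` and `find` are hypothetical.

```python
import re
import string

# Assumption: ASCII punctuation stands in for POSIX [[:punct:]],
# which Python's re module does not support.
PUNCT = re.escape(string.punctuation)

# Explicit delimiter alternation used in place of a word-boundary escape.
DELIM = rf"(?:^|$|\s|[{PUNCT}])"
PATTERN = re.compile(DELIM + "[e\u025b]s")

def find(text):
    """Return the matched substring, or None if there is no match."""
    m = PATTERN.search(text)
    return m.group(0) if m else None

print(repr(find("\u025bs")))    # match at start of string
print(repr(find("a \u025bs")))  # the match includes the leading space
print(repr(find("a\u025bs")))   # no delimiter before the epsilon
```

For a plain match test this makes no difference, but it matters if the matched text is extracted or replaced, since the delimiter becomes part of the match.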
