Discussion: BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char

BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char

From
"Grzegorz Daniluk"
Date:
The following bug has been logged online:

Bug reference:      5766
Logged by:          Grzegorz Daniluk
Email address:      gdaniluk@gmail.com
PostgreSQL version: 9.0.1
Operating system:   Windows 7 64-bit
Description:        regexp \y doesn't work properly when a word starts or
ends with a UTF-8 char
Details:

select regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '');

Above query doesn't replace the word 'Pasaż'. It returns full 'Foo Pasaż
Bar' string, when the correct behavior is to return 'Foo  Bar'.

When the 'ż' is replaced with normal ASCII character like 'z',
regexp_replace works as expected.

My db details:
ENCODING = 'UTF8'
LC_COLLATE = 'Polish_Poland.1250'
LC_CTYPE = 'Polish_Poland.1250'

Re: BUG #5766: regexp \y doesn't work properly when a word starts or ends with a UTF-8 char

From
Tom Lane
Date:
"Grzegorz Daniluk" <gdaniluk@gmail.com> writes:
> select regexp_replace('Foo Pasaż Bar', E'\\yPasaż\\y', '');

> Above query doesn't replace the word 'Pasaż'. It returns full 'Foo Pasaż
> Bar' string, when the correct behavior is to return 'Foo  Bar'.

Is this problem limited to \y, or do other regex operations that depend
on locale-specific character classification also not work for you?

            regards, tom lane
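
The question above could be checked with a few probe queries along the following lines. This is only a sketch and is not taken from the thread: the column aliases are made up for illustration, the E'' string syntax simply follows the original report, and the results are expected to vary with PostgreSQL version, platform, encoding and LC_CTYPE, so no expected output is shown.

```sql
-- Hypothetical diagnostic queries (not from the original report).
-- Each one exercises a regex feature that depends on locale-specific
-- character classification of the non-ASCII letter 'ż'.

-- Word-character escape: is 'ż' treated as a word character?
SELECT 'ż' ~ E'^\\w$' AS w_matches;

-- POSIX character class, which depends on LC_CTYPE.
SELECT 'ż' ~ '^[[:alpha:]]$' AS alpha_matches;

-- Start-of-word (\m) and end-of-word (\M) constraints,
-- i.e. the two halves of \y tested separately.
SELECT 'Foo Pasaż Bar' ~ E'\\mPasaż' AS word_start_matches,
       'Foo Pasaż Bar' ~ E'Pasaż\\M' AS word_end_matches;

-- Case-insensitive matching also relies on character classification.
SELECT 'PASAŻ' ~* 'pasaż' AS icase_matches;
```

If only the \M test comes back false while the same test with 'Pasaz' succeeds, that would suggest the problem lies in how the character after the word boundary is classified rather than in \y specifically.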