Re: BUG #7999: Regexp with utf8

From: Tom Lane
Subject: Re: BUG #7999: Regexp with utf8
Msg-id: 13363.1364401197@sss.pgh.pa.us
In reply to: BUG #7999: Regexp with utf8 (somloieater@gmail.com)
List: pgsql-bugs

somloieater@gmail.com writes:
> PostgreSQL version: 9.1.8

> I've checked with a few other characters which are >1 byte in utf8. U+00F0
> counts as \w, but nothing I've tried > FF matches. I wonder if it's
> something to do with >256?

Yup.  This is partially resolved in PG 9.2, but will never be fixed in
older branches.  From the commit log:

    Also, remove the hard-wired limitation to not consider wctype.h results for
    character codes above 255.  It turns out that we can't push the limit as
    far up as I'd originally hoped, because the regex colormap code is not
    efficient enough to cope very well with character classes containing many
    thousand letters, which a Unicode locale is entirely capable of producing.
    Still, we can push it up to U+7FF (which I chose as the limit of 2-byte
    UTF8 characters), which will at least make Eastern Europeans happy pending
    a better solution.  Thus, this commit resolves the specific complaint in
    bug #6457, but not the more general issue that letters of non-western
    alphabets are mostly not recognized as matching [[:alpha:]].
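
As a rough illustration of the U+7FF cutoff mentioned in the commit log (not PostgreSQL code), the Python sketch below shows that U+07FF is the last code point that encodes to two bytes in UTF-8. The helper names `utf8_len` and `within_two_byte_limit` are hypothetical, chosen only for this example. Falling under the cutoff only means the locale's wctype classification can be consulted in 9.2; whether a given character then matches \w still depends on the locale.

```python
# Sketch only: shows where the 2-byte UTF-8 boundary (U+07FF) falls.
# utf8_len and within_two_byte_limit are hypothetical helper names.

TWO_BYTE_LIMIT = 0x7FF  # highest code point that encodes to 2 bytes in UTF-8


def utf8_len(code_point: int) -> int:
    """Return the number of bytes used to encode code_point in UTF-8."""
    return len(chr(code_point).encode("utf-8"))


def within_two_byte_limit(ch: str) -> bool:
    """True if ch is at or below the U+07FF cutoff described in the commit log."""
    return ord(ch) <= TWO_BYTE_LIMIT


if __name__ == "__main__":
    # U+00F0 and U+0151 are Latin letters, U+0436 is Cyrillic, U+4E2D is CJK.
    for cp in (0x7F, 0xF0, 0x151, 0x436, 0x7FF, 0x800, 0x4E2D):
        print(f"U+{cp:04X}: {utf8_len(cp)} byte(s), "
              f"under cutoff: {within_two_byte_limit(chr(cp))}")
```

Code points above 0xFF such as U+0151 or U+0436 sit under the cutoff, while anything from U+0800 upward (three bytes or more in UTF-8) does not.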

            regards, tom lane
