Re: Notes about fixing regexes and UTF-8 (yet again)

From Tom Lane
Subject Re: Notes about fixing regexes and UTF-8 (yet again)
Date
Msg-id 28618.1329585314@sss.pgh.pa.us
In response to Re: Notes about fixing regexes and UTF-8 (yet again)  (NISHIYAMA Tomoaki <tomoakin@staff.kanazawa-u.ac.jp>)
Responses Re: Notes about fixing regexes and UTF-8 (yet again)  (Dimitri Fontaine <dimitri@2ndQuadrant.fr>)
List pgsql-hackers
NISHIYAMA Tomoaki <tomoakin@staff.kanazawa-u.ac.jp> writes:
> I don't believe it is valid to ignore CJK characters above U+20000.
> If it is used for names, it will be stored in the database.
> If the behaviour is different from characters below U+FFFF, you will
> get a bug report in meanwhile.

I am skeptical that there is enough usage of such things to justify
slowing regexp operations down for everybody.  Note that it's not only
the initial probe of libc behavior that's at stake here --- the more
character codes are treated as letters, the larger the DFA transition
maps get and the more time it takes to build them.  So I'm unexcited
about just cranking up the loop limit in pg_ctype_get_cache.
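
To make the cost argument concrete, here is a rough sketch of what a libc probe over a range of character codes looks like. It is not the code in pg_ctype_get_cache; the names probe_alpha and alpha_probe_result are made up for illustration, and it assumes wint_t is wide enough to hold the codes being probed. Both the probe time and the number of letter ranges that would have to be carried into the DFA maps grow with the limit.

```c
#include <stddef.h>
#include <wctype.h>

/* Hypothetical summary of one probe pass (not a PostgreSQL structure). */
typedef struct
{
    size_t nchars;   /* codes that libc classifies as letters */
    size_t nranges;  /* contiguous runs of such codes */
} alpha_probe_result;

/*
 * Ask libc about every code in [0, limit).  The loop is linear in
 * "limit", so raising the limit raises the probe cost directly, and
 * every extra letter or range found is more data for the maps.
 */
static alpha_probe_result
probe_alpha(unsigned long limit)
{
    alpha_probe_result r = {0, 0};
    int prev = 0;

    for (unsigned long c = 0; c < limit; c++)
    {
        int cur = (iswalpha((wint_t) c) != 0);

        if (cur)
        {
            r.nchars++;
            if (!prev)
                r.nranges++;    /* a new run of letters starts here */
        }
        prev = cur;
    }
    return r;
}
```

The result depends on the locale in effect when probe_alpha() is called, so the counts for a UTF-8 locale will be much larger than for the default "C" locale.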

> On the other hand, it is ok if processing of characters above U+10000
> is very slow, as far as properly processed, because it is considered
> rare.

Yeah, it's conceivable that we could implement something whereby
characters with codes above some cutoff point are handled via runtime
calls to iswalpha() and friends, rather than being included in the
statically-constructed DFA maps.  The cutoff point could likely be a lot
less than U+FFFF, too, thereby saving storage and map build time all
round.
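
A minimal sketch of the cutoff idea, to show the shape of it rather than a worked-out design. Everything here is an assumption: the name char_is_alpha, the table, and the cutoff value ALPHA_CUTOFF are hypothetical, and nothing below is taken from the PostgreSQL regex code. Codes below the cutoff are answered from a table filled in once; codes at or above it go to iswalpha() at run time.

```c
#include <stdbool.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical cutoff: codes below it are answered from a table. */
#define ALPHA_CUTOFF 0x800

static bool alpha_table[ALPHA_CUTOFF];
static bool alpha_table_ready = false;

/* Probe libc once for every code below the cutoff. */
static void
build_alpha_table(void)
{
    for (size_t c = 0; c < ALPHA_CUTOFF; c++)
        alpha_table[c] = (iswalpha((wint_t) c) != 0);
    alpha_table_ready = true;
}

/*
 * Table lookup below the cutoff, runtime libc call at or above it.
 * Assumes wint_t can represent "c"; that is not true for codes above
 * U+FFFF on platforms with a 16-bit wchar_t.
 */
static bool
char_is_alpha(unsigned long c)
{
    if (c < ALPHA_CUTOFF)
    {
        if (!alpha_table_ready)
            build_alpha_table();
        return alpha_table[c];
    }
    return iswalpha((wint_t) c) != 0;
}
```

The table captures whatever locale is active when it is first built, so a real implementation would need one table per locale (or per collation). Lowering ALPHA_CUTOFF shrinks both the table and the one-time build loop, at the price of more runtime calls for the characters above it.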

However, that "we" above is the editorial "we".  *I* am not going to
do this.  Somebody who actually has a need for it should step up.
        regards, tom lane

