Re: Character classes

From Tom Lane
Subject Re: Character classes
Date
Msg-id 24386.1558375597@sss.pgh.pa.us
In response to Character classes  (PG Doc comments form <noreply@postgresql.org>)
Responses Re: Character classes  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-docs
PG Doc comments form <noreply@postgresql.org> writes:
> On https://www.postgresql.org/docs/11/functions-matching.html paragraph
> 9.7.3.2. Bracket Expressions says "Standard character class names are:
> alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper,
> xdigit". The class "ascii" exists, but is not mentioned (probably a
> combination of some of the other classes). Are there any other classes?

Hm, fair question.  I think the text means to say that these are the
character class names required by the POSIX regexp spec, which is
accurate.  A look into our src/backend/regex/regc_locale.c will show
you that we also implement "ascii", and no others.  That probably ought
to be documented.
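
For illustration only, here is a minimal C sketch of looking up a class by
name. The twelve POSIX names happen to be the property names that the
standard wctype() function accepts, so they can be passed straight through.
The helper name in_class is hypothetical, and the treatment of "ascii" as
code points 0 through 0x7F is an assumption made for this sketch; it is not
copied from regc_locale.c.

```c
#include <assert.h>
#include <string.h>
#include <wctype.h>

/*
 * Hypothetical helper: test whether ch belongs to the named class.
 * Returns 1 if it does, 0 if it does not, and -1 if the class name
 * is not known in the current LC_CTYPE locale.
 */
int in_class(const char *name, wint_t ch)
{
    assert(name != NULL);

    /* Assumption for this sketch: "ascii" means code points 0..0x7F. */
    if (strcmp(name, "ascii") == 0)
        return ch <= 0x7F;

    /* The POSIX class names are valid wctype() property names. */
    wctype_t cls = wctype(name);
    if (cls == 0)
        return -1;              /* unknown class name */

    return iswctype(ch, cls) != 0;
}
```

Calling in_class("digit", L'7') in the default "C" locale reports a match,
while a made-up name falls through to the -1 branch.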

> Do they work only for ASCII characters (e.g. '\u00A0' is not picked up
> by '[:blank:]')?

The POSIX ones are implemented by calling the C library, so it's whatever
the ctype.h and wctype.h functions think is appropriate for your LC_CTYPE
setting.
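
A rough way to see that dependence outside the server is a small standalone
C helper that sets LC_CTYPE and then asks the C library. This is a sketch
under stated assumptions, not how the backend invokes these functions: the
name classify_in_locale is hypothetical, and which locale names are
installed varies from system to system.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wctype.h>

/*
 * Hypothetical helper: ask the C library whether ch is in class_name
 * under the given LC_CTYPE locale.  Returns 1 or 0 for the answer, or
 * -1 if the locale is not available or the class name is unknown.
 * Note that setlocale() changes process-wide state.
 */
int classify_in_locale(const char *locale_name, const char *class_name,
                       wint_t ch)
{
    assert(locale_name != NULL && class_name != NULL);

    /* On failure setlocale() returns NULL and leaves the locale as it was. */
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;

    wctype_t cls = wctype(class_name);
    if (cls == 0)
        return -1;

    return iswctype(ch, cls) != 0;
}
```

To check the '\u00A0' case from the question, one could call
classify_in_locale("en_US.UTF-8", "blank", 0x00A0), assuming that locale is
installed; the result is whatever the platform's C library decides.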

The 20-year-old reference in our text to ctype(3) seems rather unhelpful
today; in the first place, there's no such man page on my Linux systems,
and in the second place, wctype(3) is more important if it exists, and
in the third place what a reader actually wants to know is that this
is controlled by the LC_CTYPE server parameter.  It'd likely be better
to dump the man-page reference altogether and instead point readers to
our "Locale Support" chapter.

            regards, tom lane
