Thread: Character classes

Character classes

From
PG Doc comments form
Date:
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/11/functions-matching.html
Description:

On https://www.postgresql.org/docs/11/functions-matching.html paragraph
9.7.3.2. Bracket Expressions says "Standard character class names are:
alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper,
xdigit". The class "ascii" exists, but is not mentioned (probably a
combination of some of the other classes). Are there any other classes? Do
they work only for ASCII characters (e.g. '\u00A0' is not picked up by
'[:blank:]')?
best regards
geert

Re: Character classes

From
Tom Lane
Date:
PG Doc comments form <noreply@postgresql.org> writes:
> On https://www.postgresql.org/docs/11/functions-matching.html paragraph
> 9.7.3.2. Bracket Expressions says "Standard character class names are:
> alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper,
> xdigit". The class "ascii" exists, but is not mentioned (probably a
> combination of some of the other classes). Are there any other classes?

Hm, fair question.  I think the text means to say that these are the
character class names required by the POSIX regexp spec, which is
accurate.  A look into our src/backend/regex/regc_locale.c will show
you that we also implement "ascii", and no others.  That probably ought
to be documented.
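
As a rough illustration of the list above (the twelve POSIX names plus "ascii"), a name lookup could be sketched in C as follows. The table and the helper is_known_class are hypothetical, written only for this example; they are not copied from regc_locale.c.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * The twelve class names required by POSIX, as listed in the docs,
 * followed by "ascii", the one extra name mentioned in this thread.
 */
const char *const class_names[] = {
    "alnum", "alpha", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "xdigit",
    "ascii"
};

/* Hypothetical helper: true if name is one of the thirteen names above. */
bool
is_known_class(const char *name)
{
    size_t n = sizeof(class_names) / sizeof(class_names[0]);

    for (size_t i = 0; i < n; i++)
    {
        if (strcmp(name, class_names[i]) == 0)
            return true;
    }
    return false;
}
```

With this sketch, is_known_class("ascii") is true, while a name outside the list, such as "word", is rejected.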

> Do they work only for ASCII characters (e.g. '\u00A0' is not picked up
> by '[:blank:]')?

The POSIX ones are implemented by calling the C library, so it's whatever
the ctype.h and wctype.h functions think is appropriate for your LC_CTYPE
setting.
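
To see what the C library itself reports under a given LC_CTYPE, a small check along these lines may help. It is only a sketch, not PostgreSQL code: the helper name blank_in_locale is made up for the example, and it assumes wchar_t holds Unicode code points (as on glibc). The answer for U+00A0 depends on the platform's locale data, so no particular result is claimed here.

```c
#include <locale.h>
#include <stdbool.h>
#include <wctype.h>

/*
 * Hypothetical helper: switch LC_CTYPE to locale_name and report whether
 * the C library classifies wc as "blank" in that locale.  Returns false
 * if the locale cannot be selected (for example, it is not installed).
 * Note that setlocale() changes process-wide state.
 */
bool
blank_in_locale(const char *locale_name, wint_t wc)
{
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return false;
    return iswblank(wc) != 0;
}
```

For instance, blank_in_locale("en_US.UTF-8", 0x00A0) asks about the no-break space from the original question; the locale name is just an example and must exist on the system being tested.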

The 20-year-old reference in our text to ctype(3) seems rather unhelpful
today; in the first place, there's no such man page on my Linux systems,
and in the second place, wctype(3) is more important if it exists, and
in the third place what a reader actually wants to know is that this
is controlled by the LC_CTYPE server parameter.  It'd likely be better
to dump the man-page reference altogether and instead point readers to
our "Locale Support" chapter.

            regards, tom lane



Re: Character classes

From
Thomas Munro
Date:
On Tue, May 21, 2019 at 6:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The 20-year-old reference in our text to ctype(3) seems rather unhelpful
> today; in the first place, there's no such man page on my Linux systems,
> and in the second place, wctype(3) is more important if it exists, and
> in the third place what a reader actually wants to know is that this
> is controlled by the LC_CTYPE server parameter.  It'd likely be better
> to dump the man-page reference altogether and instead point readers to
> our "Locale Support" chapter.

No opinion on the reference, but out of curiosity I hunted down the
equivalent man page on a RHEL system.  There it goes by ctype.h(0P),
which makes some kind of sense: there isn't a ctype function, so it
has no business in section 3, while wctype is a function so there is a
wctype(3) along with a header page wctype.h(0P).  0P seems to be for
POSIX headers, or something like that.  BSDen don't seem to bother
with this distinction and just provide ctype(3).

-- 
Thomas Munro
https://enterprisedb.com