Discussion: Character classes
The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/11/functions-matching.html
Description:

On https://www.postgresql.org/docs/11/functions-matching.html paragraph
9.7.3.2. Bracket Expressions says "Standard character class names are:
alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper,
xdigit". The class "ascii" exists, but is not mentioned (probably a
combination of some of the other classes). Are there any other classes?
Do they work only for ASCII characters (e.g. '\u00A0' is not picked up
by '[:blank:]')?

best regards
geert
PG Doc comments form <noreply@postgresql.org> writes:
> On https://www.postgresql.org/docs/11/functions-matching.html paragraph
> 9.7.3.2. Bracket Expressions says "Standard character class names are:
> alnum, alpha, blank, cntrl, digit, graph, lower, print, punct, space, upper,
> xdigit". The class "ascii" exists, but is not mentioned (probably a
> combination of some of the other classes). Are there any other classes?

Hm, fair question. I think the text means to say that these are the
character class names required by the POSIX regexp spec, which is
accurate. A look into our src/backend/regex/regc_locale.c will show
you that we also implement "ascii", and no others. That probably
ought to be documented.

> Do they work only for ASCII characters (e.g. '\u00A0' is not picked up
> by '[:blank:]')?

The POSIX ones are implemented by calling the C library, so it's whatever
the ctype.h and wctype.h functions think is appropriate for your LC_CTYPE
setting.

The 20-year-old reference in our text to ctype(3) seems rather unhelpful
today; in the first place, there's no such man page on my Linux systems,
and in the second place, wctype(3) is more important if it exists, and
in the third place what a reader actually wants to know is that this
is controlled by the LC_CTYPE server parameter. It'd likely be better
to dump the man-page reference altogether and instead point readers to
our "Locale Support" chapter.

regards, tom lane
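To make the point about the C library concrete, here is a minimal standalone sketch (not PostgreSQL's actual code) of how a character's membership in a POSIX class can be probed through wctype.h under a given LC_CTYPE locale. The helper names in_class and report are hypothetical, and any locale name other than "C" that you pass in is an assumption about what is installed on your system.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <wchar.h>
#include <wctype.h>

/* The twelve class names quoted from the documentation in this thread. */
static const char *const posix_classes[] = {
    "alnum", "alpha", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "xdigit"
};

/*
 * Hypothetical helper: returns 1 if ch belongs to class_name under the
 * LC_CTYPE locale locale_name, 0 if it does not, and -1 if the locale
 * cannot be selected or the C library does not know the class name.
 * Note that setlocale() changes process-wide state.
 */
int in_class(const char *locale_name, const char *class_name, wint_t ch)
{
    assert(locale_name != NULL && class_name != NULL);

    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -1;                      /* locale not available */

    wctype_t cls = wctype(class_name);  /* look the class up by name */
    if (cls == (wctype_t) 0)
        return -1;                      /* not a class in this locale */

    return iswctype(ch, cls) != 0;
}

/* Hypothetical helper: print the verdict for every class listed above. */
void report(const char *locale_name, wint_t ch)
{
    size_t n = sizeof posix_classes / sizeof posix_classes[0];

    for (size_t i = 0; i < n; i++)
        printf("%-6s U+%04lX -> %d\n", posix_classes[i],
               (unsigned long) ch,
               in_class(locale_name, posix_classes[i], ch));
}
```

Calling, say, report("en_US.UTF-8", 0x00A0) from a small main() would show how the local C library classifies the no-break space from the original question. The answer can differ between platforms and locales, which is exactly the dependence on LC_CTYPE described above.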
On Tue, May 21, 2019 at 6:06 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The 20-year-old reference in our text to ctype(3) seems rather unhelpful
> today; in the first place, there's no such man page on my Linux systems,
> and in the second place, wctype(3) is more important if it exists, and
> in the third place what a reader actually wants to know is that this
> is controlled by the LC_CTYPE server parameter. It'd likely be better
> to dump the man-page reference altogether and instead point readers to
> our "Locale Support" chapter.

No opinion on the reference, but out of curiosity I hunted down the
equivalent man page on a RHEL system. There it goes by ctype.h(0P),
which makes some kind of sense: there isn't a ctype function, so it
has no business in section 3, while wctype is a function so there is a
wctype(3) along with a header page wctype.h(0P). 0P seems to be for
POSIX headers, or something like that. BSDen don't seem to bother
with this distinction and just provide ctype(3).

--
Thomas Munro
https://enterprisedb.com