Re: ts_locale.c: why no t_isalnum() test?

From: Corey Huinker
Subject: Re: ts_locale.c: why no t_isalnum() test?
Msg-id: CADkLM=fgm4_A7b9_pXE=QPCB+JpxD4sTRue4SXKk9TvkB0LWig@mail.gmail.com
In response to: ts_locale.c: why no t_isalnum() test?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: ts_locale.c: why no t_isalnum() test?
List: pgsql-hackers

On Wed, Oct 5, 2022 at 3:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I happened to wonder why various places are testing things like
>
> #define ISWORDCHR(c)    (t_isalpha(c) || t_isdigit(c))
>
> rather than using an isalnum-equivalent test.  The direct answer
> is that ts_locale.c/.h provides no such test function, which
> apparently is because there's not a lot of potential callers in
> the core code.  However, both pg_trgm and ltree could benefit
> from adding one.
>
> There's no semantic hazard here: the documentation I consulted
> is all pretty explicit that is[w]alnum is true exactly when
> either is[w]alpha or is[w]digit are.  For example, POSIX saith
>
>     The iswalpha() and iswalpha_l() functions shall test whether wc is a
>     wide-character code representing a character of class alpha in the
>     current locale, or in the locale represented by locale, respectively;
>     see XBD Locale.
>
>     The iswdigit() and iswdigit_l() functions shall test whether wc is a
>     wide-character code representing a character of class digit in the
>     current locale, or in the locale represented by locale, respectively;
>     see XBD Locale.
>
>     The iswalnum() and iswalnum_l() functions shall test whether wc is a
>     wide-character code representing a character of class alpha or digit
>     in the current locale, or in the locale represented by locale,
>     respectively; see XBD Locale.
>
> While I didn't try to actually measure it, these functions don't
> look remarkably cheap.  Doing char2wchar() twice when we only need
> to do it once seems silly, and the libc functions themselves are
> probably none too cheap for multibyte characters either.
>
> Hence, I propose the attached.  I got rid of some places that were
> unnecessarily checking pg_mblen before applying t_iseq(), too.
>
>                         regards, tom lane
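
For readers without the patch at hand, here is a rough sketch of the point about converting once instead of twice. It is not the committed code: it uses the standard C mbtowc() in place of PostgreSQL's char2wchar(), and the helper names (mb_isalpha, mb_isdigit, is_wordchr_two_calls, is_wordchr_one_call) are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical stand-in for t_isalpha(): converts the character, then tests it. */
int mb_isalpha(const char *s)
{
    wchar_t wc;

    assert(s != NULL);
    if (mbtowc(&wc, s, MB_CUR_MAX) <= 0)
        return 0;               /* empty or invalid multibyte sequence */
    return iswalpha((wint_t) wc) != 0;
}

/* Hypothetical stand-in for t_isdigit(): same conversion, different class test. */
int mb_isdigit(const char *s)
{
    wchar_t wc;

    assert(s != NULL);
    if (mbtowc(&wc, s, MB_CUR_MAX) <= 0)
        return 0;
    return iswdigit((wint_t) wc) != 0;
}

/* Old ISWORDCHR pattern: up to two multibyte-to-wide conversions per character. */
int is_wordchr_two_calls(const char *s)
{
    return mb_isalpha(s) || mb_isdigit(s);
}

/* isalnum-style test: one conversion, one libc classification call. */
int is_wordchr_one_call(const char *s)
{
    wchar_t wc;

    assert(s != NULL);
    if (mbtowc(&wc, s, MB_CUR_MAX) <= 0)
        return 0;
    return iswalnum((wint_t) wc) != 0;
}
```

Per the POSIX text quoted above, both variants should agree for every character; the second simply skips the redundant conversion whenever the character is not alphabetic.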


I see this is already committed, but I'm curious: why do t_isalpha and t_isdigit have the pair of /* TODO */ comments? That unfinished business isn't explained anywhere in the file.


 
