[HACKERS] ICU locales and text/char(n) SortSupport on Windows

From: Peter Geoghegan
Subject: [HACKERS] ICU locales and text/char(n) SortSupport on Windows
Msg-id: CAH2-WznnOrK=u-Ui2+vVk+-exMvAk9=nLbyaYVSmWCpAJ5en+A@mail.gmail.com
Replies: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+  (Noah Misch <noah@leadboat.com>)
Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows  (Noah Misch <noah@leadboat.com>)
Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List: pgsql-hackers

varstr_sortsupport() does not allow Windows to use SortSupport with a
non-C locale when the server encoding happens to be UTF-8 (which I
assume is the common case). This is because we (quite reasonably)
don't want to have to duplicate the ugly UTF-8 to UTF-16 conversion
hack from varstr_cmp() for the SortSupport authoritative comparator
(varstrfastcmp_locale() shouldn't get its own copy of this kludge,
because it's supposed to be "fast"). This broad restriction made sense
when Windows + UTF-8 + non-C-locale necessarily required the
aforementioned UTF-16 conversion kludge. However, if an ICU locale is
in use on Windows (or any other platform), then we can always use
SortSupport, regardless of anything else. (We should not have the core
code install a fmgr comparison shim that just calls varstr_cmp() in
that case, though we still do.) With ICU, we don't actually need the
UTF-16 kludge at all, so we can use SortSupport without any special
care.

The current state of affairs doesn't make any sense, AFAICT, and so
the restriction should be removed on general principle: we *already*
expect ICU to have no restrictions that are peculiar to Windows, as we
see in varstr_cmp() and str_tolower(). It's just arbitrary to hold on
to this restriction. This restriction also seems worth fixing because
Windows users are generally more likely to want to use ICU locales;
most of them would otherwise end up actually paying the overhead for
the UTF-16 kludge. (Presumably the UTF-16 conversion makes text
sorting *even slower* than it would be if we merely didn't do
SortSupport, which is to say: very slow indeed.)

In summary, we're currently attaching the use of SortSupport to the
wrong thing. We're treating this UTF-16 business as something that
implies a broad OS/platform restriction, when in fact it should be
treated as implying a restriction for one particular collation
provider only (a collation provider that happens to be built into
Windows, but isn't really special to us).

Attached patch shows what I'm getting at. This is untested, since I
don't use Windows. Proceed with caution.

On a related note, am I the only one who finds it questionable that
str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself
contains an "#ifdef USE_ICU" block? It seems like those two things
might get conflated on some platforms. We want lower() to always use
the ICU infrastructure when an ICU collation is used, and yet it's not
obvious that this is guaranteed. I understand that the code in
regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER
facilities, and that that was always accepted as legacy that ICU had
to live with. Maybe a static assertion is all that we need here (ICU
builds must also be USE_WIDE_UPPER_LOWER builds).
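
A rough sketch of what such a check could look like follows. It is a
simplified model of the nesting described above, not the real
str_tolower(): LowerPath and lower_path_for() are hypothetical names,
the two macros are defined locally only so the example compiles on its
own (a real build would get them from configure), and a preprocessor
#error stands in for the static assertion.

```c
#include <assert.h>

/*
 * Assumption: defined here only so the sketch is standalone. In a real
 * build these would come from the generated configuration header.
 */
#define USE_ICU 1
#define USE_WIDE_UPPER_LOWER 1

/* Compile-time guard: refuse to build ICU without the wide-char path. */
#if defined(USE_ICU) && !defined(USE_WIDE_UPPER_LOWER)
#error "ICU builds must also be USE_WIDE_UPPER_LOWER builds"
#endif

/* Hypothetical labels for the code path lower() would end up taking. */
typedef enum
{
    LOWER_VIA_ICU,
    LOWER_VIA_WIDE_CHAR,
    LOWER_VIA_SINGLE_BYTE
} LowerPath;

/*
 * Mirrors the nesting described above: the ICU branch only exists
 * inside the USE_WIDE_UPPER_LOWER block, so without the guard an ICU
 * collation could silently fall through to the single-byte path.
 */
LowerPath
lower_path_for(int collation_is_icu)
{
    assert(collation_is_icu == 0 || collation_is_icu == 1);
#ifdef USE_WIDE_UPPER_LOWER
#ifdef USE_ICU
    if (collation_is_icu)
        return LOWER_VIA_ICU;
#endif
    return LOWER_VIA_WIDE_CHAR;
#else
    (void) collation_is_icu;
    return LOWER_VIA_SINGLE_BYTE;
#endif
}
```

With both macros defined, an ICU collation takes the ICU path. Removing
the USE_WIDE_UPPER_LOWER definition makes the build fail at the #error
line instead of quietly routing ICU collations elsewhere.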

-- 
Peter Geoghegan
