[HACKERS] ICU locales and text/char(n) SortSupport on Windows

Поиск

Список

Период

Сортировка

От	Peter Geoghegan
Тема	[HACKERS] ICU locales and text/char(n) SortSupport on Windows
Дата	17 сентября 2017 г. 01:33:53
Msg-id	CAH2-WznnOrK=u-Ui2+vVk+-exMvAk9=nLbyaYVSmWCpAJ5en+A@mail.gmail.com обсуждение исходный текст
Ответы	[HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+ Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows
Список	pgsql-hackers

Дерево обсуждения

varstr_sortsupport() only allows Windows to use SortSupport with a
non-C-locale (when the server encoding happens to be UTF-8, which I
assume is the common case). This is because we (quite reasonably)
don't want to have to duplicate the ugly UTF-8 to UTF-16 conversion
hack from varstr_cmp() for the SortSupport authoritative comparator
(varstrfastcmp_locale() shouldn't get its own copy of this kludge,
because it's supposed to be "fast"). This broad restriction made sense
when Windows + UTF-8 + non-C-locale necessarily required the
aforementioned UTF-16 conversion kludge. However, iff an ICU locale is
in use on Windows (or any other platform), then we can always use
SortSupport, regardless of anything else (we should not have the core
code install a fmgr comparison shim that just calls varstr_cmp(),
though we still do). We don't actually need the UTF-16 kludge at all,
so we can use SortSupport without any special care.

The current state of affairs doesn't make any sense, AFAICT, and so
the restriction should be removed on general principle: we *already*
expect ICU to have no restrictions that are peculiar to Windows, as we
see in varstr_cmp() and str_tolower(). It's just arbitrary to hold on
to this restriction. This restriction also seems worth fixing because
Windows users are generally more likely to want to use ICU locales;
most of them would otherwise end up actually paying the overhead for
the UTF-16 kludge. (Presumably the UTF-16 conversion makes text
sorting *even slower* than it would be if we merely didn't do
SortSupport, which is to say: very slow indeed.)

In summary, we're currently attaching the use of SortSupport to the
wrong thing. We're treating this UTF-16 business as something that
implies a broad OS/platform restriction, when in fact it should be
treated as implying a restriction for one particular collation
provider only (a collation provider that happens to be built into
Windows, but isn't really special to us).

Attached patch shows what I'm getting at. This is untested, since I
don't use Windows. Proceed with caution.

On a related note, am I the only one that finds it questionable that
str_tolower() has an "#ifdef USE_WIDE_UPPER_LOWER" block that itself
contains an "#ifdef USE_ICU" block? It seems like those two things
might get conflated on some platforms. We don't want lower() to ever
not use the ICU infrastructure when an ICU collation is used, and yet
it's not obvious that that's impossible. I understand that the code in
regc_pg_locale.c kind of insists on using USE_WIDE_UPPER_LOWER
facilities, and that that was always accepted as legacy that ICU had
to live with. Maybe a static assertion is all that we need here (ICU
builds must also be USE_WIDE_UPPER_LOWER builds).

-- 
Peter Geoghegan

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

0001-Allow-ICU-to-use-SortSupport-on-Windows-with-UTF-8.patch

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

[HACKERS] ICU locales and text/char(n) SortSupport on Windows

Вложения