Re: Built-in CTYPE provider

From Jeff Davis
Subject Re: Built-in CTYPE provider
Date
Msg-id 90c32479a1f486e5ecad89cb6fe5508d1ae4cfd5.camel@j-davis.com
In reply to Re: Built-in CTYPE provider  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Built-in CTYPE provider  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, 2023-12-20 at 14:24 -0500, Robert Haas wrote:
> This makes sense to me, too, but it feels like it might work out
> better for speakers of English than for speakers of other languages.

There's very little in the way of locale-specific tailoring for ctype
behaviors in ICU or glibc -- only for the 'az', 'el', 'lt', and 'tr'
locales. While English speakers like us may benefit from being aligned
with the default ctype behaviors, those behaviors are not at all
specific to 'en' locales in ICU or glibc.
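
To make "locale-specific tailoring" concrete, here is a minimal Python sketch contrasting the default (untailored) Unicode uppercase mapping with a Turkish-style mapping of 'i'. This is an illustration only: it is not how ICU, glibc, or PostgreSQL implement case mapping, and the function names upper_default and upper_turkish are hypothetical.

```python
def upper_default(s: str) -> str:
    # Python's str.upper() applies the default, locale-independent
    # Unicode case mapping, so 'i' maps to 'I'.
    return s.upper()


def upper_turkish(s: str) -> str:
    # Hypothetical, simplified stand-in for 'tr'/'az' tailoring:
    # dotted lowercase 'i' maps to U+0130 (capital I with dot above).
    # Dotless U+0131 already maps to 'I' under the default rules.
    return s.replace("i", "\u0130").upper()


print(upper_default("istanbul"))
print(upper_turkish("istanbul"))
```

Only input containing the affected letters differs between the two functions; text without 'i' uppercases identically under both.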

Collation varies a lot more between locales. I wouldn't call memcmp
ideal for English ('Zebra' comes before 'apple', which seems wrong to
me). If memcmp sorting does favor any particular group, I would say it
favors programmers more than English speakers. But that could just be
my perspective and I certainly understand the point that memcmp
ordering is more tolerable for some languages than others.
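
The 'Zebra' versus 'apple' case can be reproduced with a short Python sketch. It assumes UTF-8 encoded strings and uses a plain case-folding sort as a rough stand-in for a linguistic collation, which is much simpler than what ICU or glibc actually do; memcmp_sort and caseless_sort are hypothetical helper names.

```python
words = ["apple", "Zebra", "banana", "Apple"]


def memcmp_sort(items):
    # Compare the raw UTF-8 bytes, as memcmp would. Uppercase ASCII
    # letters (0x41-0x5A) sort before lowercase ones (0x61-0x7A).
    return sorted(items, key=lambda s: s.encode("utf-8"))


def caseless_sort(items):
    # Rough approximation of a dictionary-like order: compare the
    # case-folded text first, then break ties on the original string.
    return sorted(items, key=lambda s: (s.casefold(), s))


print(memcmp_sort(words))    # ['Apple', 'Zebra', 'apple', 'banana']
print(caseless_sort(words))  # ['Apple', 'apple', 'banana', 'Zebra']
```

Byte order is cheap and stable because it depends only on the encoding, which is the property that makes it attractive to programmers even when the result looks odd to readers.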

> Right now, I tend to get databases that default to en_US.utf8, and if
> the default changed to C.utf8, then the case-comparison behavior might
> be different

en_US.UTF-8 and C.UTF-8 have the same ctype behavior.

> For someone who is currently defaulting to es_ES.utf8 or fr_FR.utf8, a
> change to C.utf8 would be a much bigger problem, I would think.

Those locales all have the same ctype behavior.

It turns out that en_US.UTF-8 and fr_FR.UTF-8 also have the same
collation order -- no tailoring beyond root collation according to CLDR
files for 'en' and 'fr' (though note that 'fr_CA' does have tailoring).
That doesn't mean the experience of switching to memcmp order is
exactly the same for a French speaker and an English speaker, but I
think it's interesting.

> That might be OK if they don't care about ordering for any purpose
> other than equality lookups, but otherwise it's going to force them
> to change the default, where today they don't have to do that.

To be clear, I haven't proposed changing the initdb default. This
thread is about adding a builtin provider with builtin ctype, which I
believe a lot of users would like.

It also might be the best chance we have to get to a reasonable default
behavior at some point in the future. It would be always available,
fast, and stable, with better semantics than "C" for many locales, and
we can document it. In any case, we don't need to decide that now. If the
builtin provider is useful, we should do it.

Regards,
    Jeff Davis



