Re: Order changes in PG16 since ICU introduction

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Order changes in PG16 since ICU introduction
Дата
Msg-id CA+TgmobxgvonJT8vO_v0k9_rgDmn0CKvTe48Z=w95rrBqC95KA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Order changes in PG16 since ICU introduction  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: Order changes in PG16 since ICU introduction  (Jeff Davis <pgsql@j-davis.com>)
Re: Order changes in PG16 since ICU introduction  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Fri, Apr 21, 2023 at 3:25 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I am also having second thoughts about accepting "C" or "POSIX" as an
> ICU locale and transforming it to "en-US-u-va-posix" in v16. It's not
> terribly useful (why not just use memcmp()?), it's not fast in my
> measurements (en-US is faster), so maybe it's better to just throw an
> error and tell the user to use C (or provider=none as I suggest
> above)?

I mean, to renew a complaint I've made previously, how the heck is
anyone supposed to understand what's going on here?

We have no meaningful documentation of how to select an ICU locale
that works for you. We have a couple of examples and a suggestion that
you should use BCP 47. But when I asked before for documentation
references, the ones you provided were not clear, basically
incomprehensible. In follow-up discussion, you admitted you'd had to
consult the source code to figure certain things out.

And the fact that "C" or "POSIX" gets transformed into
"en-US-u-va-posix" is also completely documented. That string appears
twice in the code, but zero times in the documentation. There's code
to do it, but users shouldn't have to read code, and it wouldn't help
much if they did, because the code comments don't really explain the
rationale behind this choice either.

I find the fact that people are having trouble here completely
predictable. Of course if people ask for "C" and the system tells them
that it's using "en-US-u-va-posix" instead they're going to be
confused and ask questions, exactly as is happening here. glibc
collations aren't particularly well-documented either, but people have
some experience with, and they can get a list of values that have a
chance of working from /usr/share/locale, and they know what "C"
means. Nobody knows what "en-US-u-va-posix" is. It's not even
Googleable, really, whereas "C locale" is.

My opinion is that the switch to using ICU by default is ill-advised
and should be reverted. The compatibility break isn't worth whatever
advantages ICU may have, the documentation to allow people to
transition to ICU with reasonable effort doesn't exist, and the fact
that within weeks of feature freeze people who know a lot about
PostgreSQL are struggling to get the behavior they want is a really
bad sign.

--
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nathan Bossart
Дата:
Сообщение: Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: optimize several list functions with SIMD intrinsics