Re: Order changes in PG16 since ICU introduction

Поиск
Список
Период
Сортировка
От Jeff Davis
Тема Re: Order changes in PG16 since ICU introduction
Дата
Msg-id d321c19850a236e76850f00437a34b8f8f1abb90.camel@j-davis.com
обсуждение исходный текст
Ответ на Re: Order changes in PG16 since ICU introduction  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Order changes in PG16 since ICU introduction  ("Daniel Verite" <daniel@manitou-mail.org>)
Список pgsql-hackers
On Fri, 2023-04-21 at 16:00 -0400, Tom Lane wrote:
> I think I might like this idea, except for one thing: you're
> imagining
> that the locale doesn't control anything except string comparisons.
> What about to_upper/to_lower, character classifications in regexes,
> etc?

If provider='libc' and LC_CTYPE='C', str_toupper/str_tolower are
handled with asc_tolower/asc_toupper. The regex character
classification is done with pg_char_properties. In these cases neither
ICU nor libc is used; it's just code in postgres.

libc is special in that you can set LC_COLLATE and LC_CTYPE separately,
so that different locales are used for sorting and character
classification. That's potentially useful to set LC_COLLATE to C for
performance reasons, while setting LC_CTYPE to a useful locale. We
don't allow ICU to set collation and ctype separately (it would be
possible to allow it, but I don't think there's a huge demand and it's
arguably inconsistent to set them differently).

> (I'm not sure whether those operations can get redirected to ICU
> today
> or whether they still always go to libc, but we'll surely want to fix
> it eventually if the latter is still true.)

Those operations do get redirected to ICU today. There are extensions
that call locale-sensitive libc functions directly, and obviously those
won't use ICU.


> Aside from the user-surprise issues discussed up to now, pg_dump
> scripts
> emitted by pre-v15 pg_dump are not going to contain LOCALE_PROVIDER
> clauses in CREATE DATABASE, and people are going to be very unhappy
> if that means they suddenly get totally different locale semantics
> after restoring into a new DB.

Agreed.

>   I think we need some plan for mapping
> libc-style locale specs into ICU locales so that we can make that
> more nearly transparent.

ICU does a reasonable job mapping libc-like locale names to ICU
locales, e.g. en_US to en-US, etc. The ordering semantics aren't
guaranteed to be the same, of course (because the libc-locales are
platform-dependent), but it's at least conceptually the same locale.

>
> Maybe this means we are not ready to do ICU-by-default in v16.
> It certainly feels like there might be more here than we want to
> start designing post-feature-freeze.

This thread is already on the Open Items list. As long as it's not too
disruptive to others I'll leave it as-is for now to see how this sorts
out. Right now it's not clear to me how much of this is a v15 issue vs
a v16 issue.

Regards,
    Jeff Davis




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Add two missing tests in 035_standby_logical_decoding.pl
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Add two missing tests in 035_standby_logical_decoding.pl