Re: pg_collation.collversion for C.UTF-8

From: Jeff Davis
Subject: Re: pg_collation.collversion for C.UTF-8
Msg-id 56ef55fc2212334e1f72b3d8128106e9ab37fe5a.camel@j-davis.com
In response to: Re: pg_collation.collversion for C.UTF-8  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: pg_collation.collversion for C.UTF-8  (Jeff Davis <pgsql@j-davis.com>)
List: pgsql-hackers
On Thu, 2023-05-25 at 14:48 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > What should we do with locales like C.UTF-8 in both libc and ICU?
>
> I vote for passing those to the existing C-specific code paths,

Great, this would be a big step toward solving the ICU usability issues
in this thread:

https://postgr.es/m/000b01d97465%24c34bbd60%2449e33820%24%40pcorp.us

> Probably "C", or "C.anything", or "POSIX", or "POSIX.anything".
> Case-independent might be good, but we haven't accepted such in
> the past, so I don't feel strongly about it.  (Arguably, passing
> lower case "c" to the provider would provide an "out" to anybody
> who dislikes our choices here.)

Patch attached with your suggestions. It's based on the first patch in
the series I posted here:

https://postgr.es/m/a4388fa3acabf7794ac39fdb471ad97eebdfbe11.camel@j-davis.com
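
For readers without the patch at hand, the name matching quoted above ("C", "C.anything", "POSIX", "POSIX.anything", case-sensitive) can be sketched roughly as follows. This is only an illustration, not the code from the attached patch: the function names locale_name_is_c_like and matches_base_or_dot_suffix are hypothetical, and treating a bare trailing dot ("C.") as a match is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: true if "name" is exactly "base", or "base"
 * followed by a '.' and any suffix (for example "C.UTF-8").
 */
static bool
matches_base_or_dot_suffix(const char *name, const char *base)
{
    size_t      len = strlen(base);

    if (strncmp(name, base, len) != 0)
        return false;

    /* Either the name ends here, or a ".anything" suffix follows. */
    return name[len] == '\0' || name[len] == '.';
}

/*
 * Hypothetical check: should this locale name be routed to the
 * C-specific code paths?  The comparison is case-sensitive, so a
 * lower-case "c" is not matched and would be passed to the provider.
 */
static bool
locale_name_is_c_like(const char *name)
{
    if (name == NULL)
        return false;

    return matches_base_or_dot_suffix(name, "C") ||
           matches_base_or_dot_suffix(name, "POSIX");
}
```

With this sketch, "C", "C.UTF-8", "POSIX" and "POSIX.UTF-8" are treated as C-like, while "c", "Cx" and "en_US.UTF-8" are not.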

We still need to consider backwards compatibility. If someone has a
collation with locale name C.UTF-8 in an earlier version, any change to
the interpretation of that locale name after an upgrade carries a
corruption risk. The risks are different in ICU vs libc:

  For ICU: iculocale=C in an earlier version was a mistake that must
have been explicitly requested by the user. However, if such a mistake
was made, the indexes would have been created using the ICU root
locale, which is very different from the C locale. So reinterpreting
iculocale=C as memcmp() would be likely to result in index corruption.
Patch 0002 (also based on a patch from the series linked above) solves
this with a pg_upgrade check for iculocale=C in versions 15 and
earlier. The upgrade check is not likely to affect many users, and
those it does affect have a mis-defined collation and would benefit
from the check.
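
To make the corruption risk concrete, here is a minimal sketch of a memcmp()-style comparison. It is not PostgreSQL's actual comparison code; bytewise_compare is a hypothetical name, and the shorter-string-sorts-first tie-break is an assumption of this sketch. Under byte order, "B" (0x42) sorts before "a" (0x61), and UTF-8 "é" (0xC3 0xA9) sorts after "z", whereas the ICU root collation places "a" before "B" and "é" next to "e". An index built under one ordering is therefore not correctly ordered under the other.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical byte-order comparison: compare the common prefix with
 * memcmp(), then let the shorter string sort first on a tie.
 * Returns <0, 0, or >0 like memcmp().
 */
static int
bytewise_compare(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      n = (alen < blen) ? alen : blen;
    int         r = memcmp(a, b, n);

    if (r != 0)
        return r;

    /* Common prefix is equal: the shorter string sorts first. */
    if (alen < blen)
        return -1;
    if (alen > blen)
        return 1;
    return 0;
}
```

For example, bytewise_compare("B", 1, "a", 1) is negative, the opposite of what a linguistic collation would give for the same pair.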

  For libc: this change may affect any user who happened to have
LANG=C.UTF-8 in their environment at initdb time, which is probably a
lot of users, and some buildfarm members. However, the average risk
seems to be much lower, because we've gone a long time with the
assumption that C.UTF-8 has the same behavior as C, and this only
recently came up. Also, I'm not sure how obscure the cases are even if
there is a difference; perhaps they don't often occur in practice? It's
not clear to me how we mitigate this risk further, though.

Regards,
    Jeff Davis

