Re: [18] Policy on IMMUTABLE functions and Unicode updates

From Jeremy Schneider
Subject Re: [18] Policy on IMMUTABLE functions and Unicode updates
Date
Msg-id CA+fnDAbmn2d5tzZsj-4wmD0jApHTsg_zGWUpteb=OMSsX5rdAg@mail.gmail.com
In response to Re: [18] Policy on IMMUTABLE functions and Unicode updates  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: [18] Policy on IMMUTABLE functions and Unicode updates
Re: [18] Policy on IMMUTABLE functions and Unicode updates
List pgsql-hackers
On Tue, Jul 23, 2024 at 1:11 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Mon, 2024-07-22 at 13:55 -0400, Robert Haas wrote:
> > On Mon, Jul 22, 2024 at 1:18 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > > I understand the difficulty (madness) of discussing every Unicode
> > > change.  If that's unworkable, my preference would be to stick with some
> > > Unicode version and never modify it, ever.
> >
> > I think that's a completely non-viable way forward. Even if everyone
> > here voted in favor of that, five years from now there will be someone
> > who shows up to say "I can't use your crappy software because the
> > Unicode tables haven't been updated in five years, here's a patch!".
> > And, like, what are we going to do? Still keeping shipping the 2024
> > version of Unicode four hundred years from now, assuming humanity and
> > civilization and PostgreSQL are still around then? Holding something
> > still "forever" is just never going to work.
>
> I hear you.  It would be interesting to know what other RDBMS do here.

As far as I know, other RDBMSs are very careful not to corrupt databases, including function-based indexes, by changing Unicode versions. I’m not aware of any other RDBMS that updates Unicode versions in place; instead they support multiple Unicode versions and do not drop the old ones.

See also:

I know Jeff mentioned that the Unicode tables copied into Postgres for normalization have been updated a few times. Did anyone ever actually discuss the fact that things like function-based indexes can be corrupted by this, and weigh the reasoning? Are there past mailing list threads that touch on the corruption problem and argue why updating anyway is the right thing to do? I always assumed that nobody had really dug deeply into this before the last few years.
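
To make the failure mode concrete, here is a minimal sketch of how an index built on the output of a supposedly immutable function can silently miss rows once the underlying table changes. Everything in it is a hypothetical stand-in: the two tables, the toy normalize function, and the dict-based "index" are invented for illustration and are not how Postgres implements normalization or indexes.

```python
# Hypothetical stand-ins for two versions of a Unicode normalization table.
# The code point and its mapping are invented for illustration only.
TABLE_V1 = {}                      # older version: U+E000 has no mapping yet
TABLE_V2 = {"\ue000": "e\u0301"}   # newer version: same code point now decomposes


def normalize(text, table):
    """Toy 'IMMUTABLE' function: its result depends on the table version."""
    return "".join(table.get(ch, ch) for ch in text)


def build_index(rows, table):
    """Toy expression index: maps normalize(value) -> row id."""
    return {normalize(value, table): row_id for row_id, value in rows.items()}


def index_lookup(index, value, table):
    """Probe the index with a key computed using the given table."""
    return index.get(normalize(value, table))


rows = {1: "caf\ue000", 2: "tea"}

# The index is built while the server ships the old table.
index = build_index(rows, TABLE_V1)

# Before the update, the probe key matches the stored key.
found_before = index_lookup(index, "caf\ue000", TABLE_V1)

# After an in-place update, the probe key is computed with the new table,
# but the stored key was computed with the old one, so the row is missed.
found_after = index_lookup(index, "caf\ue000", TABLE_V2)

# Rebuilding the index (the analogue of a REINDEX) restores consistency.
reindexed = build_index(rows, TABLE_V2)
found_reindexed = index_lookup(reindexed, "caf\ue000", TABLE_V2)

print(found_before, found_after, found_reindexed)
```

The row is still in the table after the update; only the stored index keys are stale, which is why this kind of corruption tends to go unnoticed until a lookup or uniqueness check gives a wrong answer.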

I do agree it isn’t as broad a problem as linguistic collation itself, which causes far more widespread corruption when it changes (as we’ve seen with glibc 2.28, and in older hackers mailing list threads about smaller changes in earlier glibc versions corrupting databases). For now, Postgres only has code-point collation and the other Unicode functions mentioned in this thread.

-Jeremy
