Re: Collation versioning

From: Thomas Munro
Subject: Re: Collation versioning
Msg-id: CAEepm=3BE0W92MohEtHQL8nyfJs2bYgoHr_c_pxdg3XVp5xOhw@mail.gmail.com
In reply to: Re: Collation versioning  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Collation versioning  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Mon, Sep 17, 2018 at 9:02 AM Stephen Frost <sfrost@snowman.net> wrote:
> * Thomas Munro (thomas.munro@enterprisedb.com) wrote:
> > Once you get into downstream effects of changes (whether they are
> > recorded in the database or elsewhere), I think it's basically beyond
> > our event horizon.  Why and when did the collation definition change
> > (bug fix in CLDR, decree by the Académie Française taking effect on 1
> > January 2019, ...)?  We could all use bitemporal databases and
> > multi-version ICU, but at some point it all starts to look like an
> > episode of Dr Who.  I think we should make a clear distinction between
> > things that invalidate the correct working of the database, and more
> > nebulous effects that we can't possibly track in general.
>
> I tend to agree in general, but I don't think it's beyond us to consider
> multi-version ICU and being able to perform online reindexing (such that
> a given system could be migrated from one collation to another over a
> time while the system is still online, instead of having to take a
> potentially long downtime hit to rebuild indexes after an upgrade, or
> having to rebuild the entire system using some kind of logical
> replication...).

It's a very interesting idea with a high nerd-sniping factor[1].
Practically speaking, I wonder if you can actually do that with
typical Linux distributions where the ICU data is in a shared library
(e.g. libicudata.so.57), and may also depend on the ICU code
version (?) -- do you run into problems linking to several of them at
the same time?  Maybe you have to ship your own ICU collations in
"data" form to pull that off.  But someone mentioned that
distributions don't like you to do that (likewise for tzinfo and other
such things that no one wants 42 copies of on their system).
Actually, if I had infinite resources I'd really like to go and make
libc support multiple collation versions with a standard interface
(technically easy, bureaucratically hard); I don't really like leaving
libc behind.  But I digress.

I'd like to propose the 3 more humble goals I mentioned a few messages
back as earlier steps.  OS collation changes aren't really like Monty
Python's Spanish Inquisition: they usually hit you when you're doing
major operating system upgrades or setting up a streaming replica to a
different OS version IIUC.  That is, they probably happen during
maintenance windows when REINDEX would hopefully be plausible, and
presumably critical systems get tested on the new OS version before
production is upgraded.  It'd be kind to our users to make the problem
non-silent at that time so they can plan for it (and of course also
alert them if it happens when nobody expects it, because knowing you
have a problem is better than not knowing).
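
To make "non-silent" concrete, here is a minimal sketch, assuming each index records the collation version it was built with and that some lookup can report the version the OS or ICU provides right now. The names (IndexInfo, check_collation_versions) and the version strings are hypothetical illustrations, not PostgreSQL's actual catalog layout or API.

```python
from dataclasses import dataclass
import warnings


@dataclass
class IndexInfo:
    # Hypothetical record of an index and the collation version it was built with.
    name: str
    collation: str
    recorded_version: str


def check_collation_versions(indexes, actual_version):
    """Return the names of indexes whose recorded collation version differs
    from what actual_version(collation) reports now, warning for each one.
    A lookup that returns None (unknown collation) also counts as a mismatch."""
    stale = []
    for idx in indexes:
        current = actual_version(idx.collation)
        if current != idx.recorded_version:
            warnings.warn(
                f"index {idx.name!r} was built with collation {idx.collation!r} "
                f"version {idx.recorded_version}, but the current version is "
                f"{current}; REINDEX is recommended"
            )
            stale.append(idx.name)
    return stale


# Hypothetical versions as reported after an OS upgrade.
current_versions = {"en_US": "2.28", "C": "1"}

indexes = [
    IndexInfo("users_name_idx", "en_US", "2.27"),  # built before the upgrade
    IndexInfo("codes_idx", "C", "1"),              # unaffected collation
]

print(check_collation_versions(indexes, current_versions.get))
```

Running it emits a warning for users_name_idx and prints ['users_name_idx']; the index on the unchanged "C" collation is not flagged. Whether a mismatch should merely warn, be logged, or block use of the index is a policy choice this sketch leaves open.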

[1] https://xkcd.com/356/

--
Thomas Munro
http://www.enterprisedb.com

