Re: Collation versioning

Поиск
Список
Период
Сортировка
От Douglas Doole
Тема Re: Collation versioning
Дата
Msg-id CADE5jYKn6FA0a35P+_HJuSb_abq-D5Z-nE6rO9brjpu8jBTMgA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Collation versioning  (Thomas Munro <thomas.munro@enterprisedb.com>)
Ответы Re: Collation versioning  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Re: Collation versioning  (Thomas Munro <thomas.munro@enterprisedb.com>)
Re: Collation versioning  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Список pgsql-hackers
On Sun, Sep 16, 2018 at 1:20 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
3.  Fix the tracking of when reindexes need to be rebuilt, so that you
can't get it wrong (as you're alluding to above).

I've mentioned this in the past, but didn't seem to get any traction, so I'll try it again ;-)

The focus on indexes when a collation changes is, in my opinion, the least of the problems. You definitely have to worry about indexes, but they can be easily rebuilt. What about other places where collation is hardened into the system, such as constraints?

For example, in ICU 4.6 the handling of accents changed for French. Previously accents were considered right-to-left but ICU 4.6 reversed this. So consider a constraint like CHECK COL < 'coté' (last letter is U+00E9, small letter e with acute). Prior to ICU 4.6 the value 'côte' (second letter is U+00F4, small letter o with circumflex) would have passed this constraint. With 4.6 or later it would be rejected because of the accent ordering change. As soon as the collation changes, this table becomes inconsistent and a reindex isn't going to help it. This becomes a data cleansing problem at this point (which sucks for the user because their data was clean immediately prior to the "upgrade").

There have similarly been cases where ICU changes have caused equal characters to become unequal and vice versa. (Unfortunately all my ICU notes with examples are at my previous employer.) Consider the effect on RI constraints. The primary key can be fixed with a reindex (although dealing with two existing values becoming equal is a pain). But then the user also has to deal with the foreign keys since they may now have foreign keys which have no match in the primary key.

And constraints problems are even easier than triggers. Consider a database with complex BI rules that are implemented through triggers that fire when values are/are not equal. If the equality of strings change, there could be bad data throughout the tables. (At least with constraints the inter-column dependencies are explicit in the catalogs. With triggers anything goes.)

All this collation stuff is great, and I know users want it, but it feels like were pushing them out of an airplane with a ripped parachute every time the collation libraries change. Maybe they'll land safely or maybe things will get very messy.

- Doug
Salesforce

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: XMLNAMESPACES (was Re: Clarification of nodeToString() use cases)
Следующее
От: Andrew Gierth
Дата:
Сообщение: Re: Collation versioning