Re: Collation versioning

From: Douglas Doole
Subject: Re: Collation versioning
Date:
Msg-id CADE5jYKEa81RmX5RTvDnD+TQsdSVKks1QazD4aoiz+4SNV1=5A@mail.gmail.com
In reply to: Re: Collation versioning  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: Collation versioning  (Greg Stark <stark@mit.edu>)
List: pgsql-hackers
On Sun, Sep 16, 2018 at 1:14 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
We have to start somewhere and indexes are the first thing that people
notice, and are much more likely to actually be a problem (personally I've
encountered many cases of index corruption due to collation changes in
the wild, but never a constraint corruption ...)

Problems that people notice are good - it's the ones that they don't notice that cause real grief ;-)

Given that PostgreSQL requires equal values to be binary identical, I think you'll avoid a lot of the problems that caused me so much trouble in DB2. Even if someone creates a range constraint or partitioned table, the boundary values aren't typically going to be impacted (my French example was a little contrived). That said, there have been a few changes that would have had much more obvious impacts. One language stopped treating 'ch' as a single collating element that sorted somewhere after 'c'+'z' and treated it as 'c'+'h' instead. And one of the North Germanic languages moved an accented character from after 'z' to before 'z' (or maybe it was vice versa - I miss my library of Unicode presentations).
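To make the 'ch' change concrete, here's a toy Python sketch. The two-character element and its weight are invented for illustration (real tailorings come from CLDR data); the point is just how a single collating element changes sort order:

```python
def sort_key(word, ch_single=True):
    """Toy collation key: optionally treat 'ch' as one collating
    element that sorts after every other 'c'+letter pair."""
    out = []
    i = 0
    while i < len(word):
        if ch_single and word[i:i + 2] == 'ch':
            out.append('c{')  # '{' sorts just after 'z' in ASCII
            i += 2
        else:
            out.append(word[i])
            i += 1
    return ''.join(out)

# With 'ch' as a single element, 'chico' sorts after 'czar';
# with plain 'c'+'h', it sorts before.
old_order = sorted(['czar', 'chico'], key=sort_key)
new_order = sorted(['czar', 'chico'], key=lambda w: sort_key(w, ch_single=False))
```

An index built under the old rule is silently out of order the moment the rule changes - exactly the corruption being discussed.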

Once you get into downstream effects of changes (whether they are
recorded in the database or elsewhere), I think it's basically beyond
our event horizon.
... 
I think we should make a clear distinction between
things that invalidate the correct working of the database, and more
nebulous effects that we can't possibly track in general.

I agree that PostgreSQL can't resolve the downstream changes, but that's a very subtle distinction. As a user, if an upgrade caused my data to no longer comply with my carefully architected, database-enforced BI rules, I would definitely argue that the correct working of the database has been invalidated. (You can make technical arguments about the OS upgrading the library, but the fundamental issue is that my PostgreSQL database is broken.)

You can probably argue that PostgreSQL and DB2 users look at the world differently, but that's why DB2 ended up shipping its own copies of the ICU library. Once a user created an object using ICU vX, we made sure that version of the library was always available to avoid these problems. (The libraries were on a private path with non-standard names so there was no collision with the OS library. Fortunately, I'd moved on before anyone really started asking why five copies of ICU were being installed when they weren't even using Unicode.)
 
[1] https://www.postgresql.org/message-id/CAEepm%3D30SQpEUjau%3DdScuNeVZaK2kJ6QQDCHF75u5W%3DCz%3D3Scw%40mail.gmail.com

I'd missed this post. In it you asked:

I wonder how it manages to deal with fr_CA's reversed secondary weighting
rule which requires you to consider diacritics in reverse order --
apparently abandoned in France but still used in Canada -- using a fixed
size space for state between calls. 
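As an aside, the reversed secondary rule can be modelled with a toy sort key. The weights below are invented for illustration (real UCA/ICU weights differ); the sketch only shows why reversing the accent level reorders words:

```python
import unicodedata

# Toy secondary (accent) weights: none < acute < circumflex.
# Invented for illustration; real collation weights differ.
SECONDARY = {'\u0301': 1, '\u0302': 2}

def sort_key(word, backwards=False):
    """Two-level toy key: base letters first, accents second.
    backwards=True applies the fr_CA-style reversed secondary level."""
    primaries, secondaries = [], []
    for ch in unicodedata.normalize('NFD', word):
        if unicodedata.combining(ch):
            secondaries[-1] = SECONDARY.get(ch, 9)
        else:
            primaries.append(ch)
            secondaries.append(0)
    if backwards:
        secondaries.reverse()  # compare diacritics right-to-left
    return (tuple(primaries), tuple(secondaries))

words = ['côte', 'coté', 'cote', 'côté']
forward = sorted(words, key=sort_key)
backward = sorted(words, key=lambda w: sort_key(w, backwards=True))
```

With forward secondaries the order is cote, coté, côte, côté; with reversed secondaries, coté and côte swap.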

ucol_nextSortKeyPart() only keeps track of the current position in the generated sort key as its state. So, if you call it multiple times to generate the sort key piecemeal, it recomputes the entire sort key until it has enough bytes to satisfy your request. (That is, if you're doing 4 bytes at a time, on the first call it generates and returns bytes 1-4. On the second call it generates 1-8 and returns 5-8. Next it generates 1-12 and returns 9-12.) Needless to say, this gets very expensive very fast.
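A minimal Python model of that cost pattern (the function names and dummy key bytes are invented for illustration - this sketches the recomputation behaviour described above, not ICU's actual implementation):

```python
def full_sort_key(word):
    # Stand-in for a complete ICU sort key; any deterministic bytes will do.
    return word.encode('utf-8') + b'\x00'

def next_sort_key_part(word, state, count):
    """Like the behaviour described above: the only state kept is the
    offset into the key, so each call regenerates the key from the
    beginning, up to state + count bytes."""
    prefix = full_sort_key(word)[:state + count]
    part = prefix[state:]
    return part, state + len(part), len(prefix)  # len(prefix) = work done

state = total_work = 0
parts = []
while True:
    part, state, cost = next_sort_key_part('hello world', state, 4)
    if not part:
        break
    parts.append(part)
    total_work += cost

# The 12-byte key comes back in 4-byte pieces, but generating it costs
# 4 + 8 + 12 = 24 bytes of key computation instead of 12: O(n^2 / count).
```

Doubling the key length roughly quadruples the total work, which is why piecemeal generation gets expensive so fast.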
