Re: Collation version tracking for macOS
From | Thomas Munro |
---|---|
Subject | Re: Collation version tracking for macOS |
Date | |
Msg-id | CA+hUKGJtmxV43_zjRdJxxEzpAZoQ5BUhzM2N9_Njh85oTt564g@mail.gmail.com |
In response to | Re: Collation version tracking for macOS (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Collation version tracking for macOS
|
List | pgsql-hackers |
On Wed, Nov 30, 2022 at 1:32 PM Jeff Davis <pgsql@j-davis.com> wrote:
> On Wed, 2022-11-30 at 10:29 +1300, Thomas Munro wrote:
> > On Wed, Nov 30, 2022 at 9:59 AM Jeff Davis <pgsql@j-davis.com> wrote:
> > > Here's what I found for the 'ar' locale (firstminor/lastminor are
> > > the icu library versions, firstcollversion/lastcollversion are
> > > their respective collation versions for the given locale):
> > >
> > >  firstminor | lastminor | firstcollversion | lastcollversion
> > > ------------+-----------+------------------+-----------------
> > >  60.1       | 60.3      | 153.80.32        | 153.80.32.1
> > >  64.1       | 64.2      | 153.96.35        | 153.97.35.8
> > >  68.1       | 68.2      | 153.14.38        | 153.14.38.8
> > > (3 rows)
> >
> > Right, this fits with what I said earlier: the third component is
> > CLDR major, fourth component is CLDR minor except from ICU 61 on the
> > CLDR minor is << 3'd (X.X.38.8 means CLDR 38.1).
>
> What about 64.1 -> 64.2? That changed the *second* component from 96 ->
> 97. Are we agreed that collations can materially change in minor ICU
> releases?

That means that the Unicode/UCA version switched from 12 to 12.1, so
that's a confirmed sighting of a UCA minor version bump within one ICU
major version. Let's see what the purpose of that Unicode minor release
was[1]:

  "Unicode 12.1 adds exactly one character, for a total of 137,929
  characters.

  The new character added to Version 12.1 is:

  U+32FF SQUARE ERA NAME REIWA

  Version 12.1 adds that single character to enable software to be
  rapidly updated to support the new Japanese era name in calendrical
  systems and date formatting. The new Japanese era name was officially
  announced on April 1, 2019, and is effective as of May 1, 2019."

Wow! Wikipedia says[2] "the "rei" character 令 has never appeared
before".

The sort order of characters that didn't previously exist is a special
topic. In theory they can't hurt you because you shouldn't have been
using them, but PostgreSQL doesn't enforce that (other systems do), so
you could be exposed to a change from whatever default ordering the
non-existent codepoint had for random implementation reasons to some
deliberate ordering which may or may not be the same.

Are all Unicode/UCA minor versions of that type? I dunno. Something to
research, but [3] is far too vague and [4] is about other problems.

[1] https://unicode.org/versions/Unicode12.1.0/
[2] https://en.wikipedia.org/wiki/Reiwa
[3] https://www.unicode.org/versions/#major_minor
[4] https://www.unicode.org/policies/stability_policy.html
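
For illustration only, the component rule quoted above (third component is CLDR major; fourth component is CLDR minor, shifted left by 3 from ICU 61 on) can be sketched in a few lines of Python. The function name `cldr_version` and its `icu_major` parameter are hypothetical, not part of ICU or PostgreSQL, and treating a missing fourth component as minor 0 is an assumption based on the table rows above.

```python
from __future__ import annotations


def cldr_version(collversion: str, icu_major: int) -> tuple[int, int]:
    """Decode (CLDR major, CLDR minor) from an ICU collation version string.

    Hypothetical helper that follows the rule described in this thread:
    the third dotted component is the CLDR major version, and the fourth
    is the CLDR minor version, stored shifted left by 3 from ICU 61 on.
    """
    parts = [int(p) for p in collversion.split(".")]
    cldr_major = parts[2]
    # Assumption: an absent fourth component means CLDR minor 0.
    raw_minor = parts[3] if len(parts) > 3 else 0
    # From ICU 61 on the minor is packed as (minor << 3); before that it is stored as-is.
    cldr_minor = raw_minor >> 3 if icu_major >= 61 else raw_minor
    return cldr_major, cldr_minor


# Rows taken from the table quoted in this message.
print(cldr_version("153.80.32.1", 60))  # ICU 60.3
print(cldr_version("153.97.35.8", 64))  # ICU 64.2
print(cldr_version("153.14.38.8", 68))  # ICU 68.2
```

With the rows from the table, this yields CLDR 32.1 for ICU 60.3, 35.1 for ICU 64.2, and 38.1 for ICU 68.2, the last matching the "X.X.38.8 means CLDR 38.1" example. It deliberately does not decode the second (UCA) component, since the thread only gives the 96 -> 97 case for that.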