Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values
From | Peter Geoghegan |
---|---|
Subject | Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values |
Date | |
Msg-id | CAH2-WzkD2D0pT5NX4+MitH420O6Lfn5aiD5zWqVKuVq3qOSDBQ@mail.gmail.com |
In response to | Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-bugs |
On Mon, Aug 14, 2017 at 12:54 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I might have used the wrong terminology on this "locales vs.
> collations" thing. Perhaps what we actually need is pg_collation
> entries at initdb time that are an enumeration of all locales *and
> their regions*, as opposed to all locales. I'm researching this now.

I've figured this out. It's not a matter of "locales vs. collations"; it's a matter of not skipping some ICU collations during initdb, which we seem to do right now.

CollationCreate() is being passed "if_not_exists = true" for ICU collations during initdb, within pg_import_system_collations(). We're actually not creating new entries for some collations that get their own distinct listing/ucol_countAvailable() iteration. This happens because the ICU-wise name isn't fully spelled out (country wasn't included for entries that had one), and so we incorrectly ignore some collation entries as "duplicates". (I don't know why some regional variants, like de_AT, are still included in master's pg_collation, but most regional variants are not.)

The attached self-contained program should give you some idea of what I'm talking about.
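To make the failure mode concrete, here is a toy model of it. This is not the attached program and not PostgreSQL's actual code: `language_only_name()` and `import_collations()` are hypothetical helpers, the dictionary stands in for pg_collation, and the "-x-icu" suffix is only there to make the names look like ICU collation names. The locale IDs are taken from the Spanish listing in this message.

```python
# Illustration only: a toy catalog, not PostgreSQL's implementation.
# Locale IDs are a subset of the Spanish rows shown in this message.
AVAILABLE = ["es", "es_419", "es_AR", "es_ES", "es_MX", "es_VE"]


def language_only_name(locale_id):
    """Hypothetical naming rule that drops everything after the language."""
    return locale_id.split("_")[0] + "-x-icu"


def import_collations(locale_ids, make_name):
    """Mimic creation with if_not_exists = true: name clashes are skipped."""
    catalog = {}
    skipped = []
    for locale_id in locale_ids:
        name = make_name(locale_id)
        if name in catalog:
            skipped.append(locale_id)  # silently treated as a "duplicate"
            continue
        catalog[name] = locale_id
    return catalog, skipped


catalog, skipped = import_collations(AVAILABLE, language_only_name)
print(sorted(catalog))  # ['es-x-icu']
print(skipped)          # ['es_419', 'es_AR', 'es_ES', 'es_MX', 'es_VE']
```

With a name that omits the country, only the first Spanish entry survives and the rest are dropped without any error, which is the behavior described above.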
Sample output from the attached program:

$ ./icu-coll-versions | grep Spanish
Spanish | es | es | | es | es
Spanish (Latin America) | es_419 | es | 419 | es | es
Spanish (Argentina) | es_AR | es | AR | es | es
Spanish (Bolivia) | es_BO | es | BO | es | es
Spanish (Chile) | es_CL | es | CL | es | es
Spanish (Colombia) | es_CO | es | CO | es | es
Spanish (Costa Rica) | es_CR | es | CR | es | es
Spanish (Cuba) | es_CU | es | CU | es | es
Spanish (Dominican Republic) | es_DO | es | DO | es | es
Spanish (Ceuta & Melilla) | es_EA | es | EA | es | es
Spanish (Ecuador) | es_EC | es | EC | es | es
Spanish (Spain) | es_ES | es | ES | es | es
Spanish (Equatorial Guinea) | es_GQ | es | GQ | es | es
Spanish (Guatemala) | es_GT | es | GT | es | es
Spanish (Honduras) | es_HN | es | HN | es | es
Spanish (Canary Islands) | es_IC | es | IC | es | es
Spanish (Mexico) | es_MX | es | MX | es | es
Spanish (Nicaragua) | es_NI | es | NI | es | es
Spanish (Panama) | es_PA | es | PA | es | es
Spanish (Peru) | es_PE | es | PE | es | es
Spanish (Philippines) | es_PH | es | PH | es | es
Spanish (Puerto Rico) | es_PR | es | PR | es | es
Spanish (Paraguay) | es_PY | es | PY | es | es
Spanish (El Salvador) | es_SV | es | SV | es | es
Spanish (United States) | es_US | es | US | es | es
Spanish (Uruguay) | es_UY | es | UY | es | es
Spanish (Venezuela) | es_VE | es | VE | es | es

As an example, "es_VE" is a canonical locale name here. Note that many more collations are listed here than there are Spanish ICU pg_collation entries following initdb (leaving aside the keyword variant stuff, which is really another issue).

In order to get stable pg_collation names, we should probably use the country code within pg_import_system_collations(), for collations where there is a country code. We also need to add the script (using uloc_getScript()), so that those collations with a non-default script (e.g. "Serbian (Latin, Bosnia & Herzegovina)") similarly have their own pg_collation entries (FYI, "script" isn't broken out into its own column by my test program). In short, every row that this test program outputs should have a pg_collation entry after initdb; the number should match exactly.

Do we really need to pass "if_not_exists = true", anyway? Why shouldn't initdb fail if there are apparent duplicate ICU collations?

BTW, I suspect that this is why very old ICU versions appeared not to accept language tags constructed with subtags produced by ucol_getKeywordValuesForLocale(), per eccead9's commit message. The collation's country/script was omitted, so concatenating variants somehow didn't make sense (wrong script?). It certainly makes no sense that earlier ICU versions just didn't get a valid language tag when using the infrastructure that is supposed to be used to build language tags. It may be necessary to revert eccead9, once this understanding of the situation is confirmed.

--
Peter Geoghegan

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs