Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From Peter Geoghegan
Subject Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Msg-id CAH2-WzkD2D0pT5NX4+MitH420O6Lfn5aiD5zWqVKuVq3qOSDBQ@mail.gmail.com
In response to Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On Mon, Aug 14, 2017 at 12:54 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I might have used the wrong terminology on this "locales vs.
> collations" thing. Perhaps what we actually need is pg_collation
> entries at initdb time that are an enumeration of all locales *and
> their regions*, as opposed to all locales. I'm researching this now.

I've figured this out. It's not a matter of "locales vs. collations"
-- it's a matter of avoiding skipping some ICU collations during
initdb, which we seem to do right now.

CollationCreate() is being passed "if_not_exists = true" for ICU
collations during initdb, within
pg_import_system_collations(). We're actually not creating new entries
for some collations that get their own distinct
listing/ucol_countAvailable() iteration. This happens because the
ICU-wise name isn't fully spelled out (country wasn't included for
entries that had one), and so we incorrectly ignore some collation
entries as "duplicates".
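
To make the failure mode concrete, here is a toy simulation in C. It is
only a sketch: ToyCatalog, toy_create_if_not_exists(), language_only()
and toy_import() are hypothetical stand-ins invented for illustration,
not PostgreSQL or ICU functions, and the real naming logic in
pg_import_system_collations() is more involved. It shows how naming
entries by language alone, combined with "if not exists" semantics,
silently drops regional variants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_ENTRIES 64
#define NAME_LEN 32

/* Toy stand-in for pg_collation: just a list of names. */
typedef struct
{
    char names[MAX_ENTRIES][NAME_LEN];
    int  count;
} ToyCatalog;

/*
 * Hypothetical analogue of creating an entry with if_not_exists = true.
 * Returns true if a new entry was added, false if the name was already
 * present (the request is silently skipped) or the toy catalog is full.
 */
bool
toy_create_if_not_exists(ToyCatalog *cat, const char *name)
{
    for (int i = 0; i < cat->count; i++)
        if (strcmp(cat->names[i], name) == 0)
            return false;       /* treated as a "duplicate" */
    if (cat->count >= MAX_ENTRIES)
        return false;
    snprintf(cat->names[cat->count++], NAME_LEN, "%s", name);
    return true;
}

/* Copy only the language part of a locale ID, e.g. "es_VE" -> "es". */
void
language_only(const char *locale, char *out, size_t outlen)
{
    size_t n = strcspn(locale, "_");

    if (n >= outlen)
        n = outlen - 1;
    memcpy(out, locale, n);
    out[n] = '\0';
}

/*
 * Reset the catalog and import every locale, naming each entry either by
 * its full locale ID or by its language alone. Returns the number of
 * entries actually created.
 */
int
toy_import(ToyCatalog *cat, const char *const *locales, int nlocales,
           bool full_name)
{
    int created = 0;

    assert(cat != NULL);
    cat->count = 0;
    for (int i = 0; i < nlocales; i++)
    {
        char name[NAME_LEN];

        if (full_name)
            snprintf(name, sizeof name, "%s", locales[i]);
        else
            language_only(locales[i], name, sizeof name);
        if (toy_create_if_not_exists(cat, name))
            created++;
    }
    return created;
}
```

Importing the six locale IDs "es", "es_419", "es_AR", "es_VE", "de" and
"de_AT" with full_name = false creates two entries; with
full_name = true it creates six.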

(I don't know why some regional variants, like de_AT, are still
included in master's pg_collation, but most regional variants are
not.)

The attached self-contained program should give you some idea what I'm
talking about. Sample output:

$ ./icu-coll-versions | grep Spanish
Spanish                                           | es          | es        |           | es        | es
Spanish (Latin America)                           | es_419      | es        | 419       | es        | es
Spanish (Argentina)                               | es_AR       | es        | AR        | es        | es
Spanish (Bolivia)                                 | es_BO       | es        | BO        | es        | es
Spanish (Chile)                                   | es_CL       | es        | CL        | es        | es
Spanish (Colombia)                                | es_CO       | es        | CO        | es        | es
Spanish (Costa Rica)                              | es_CR       | es        | CR        | es        | es
Spanish (Cuba)                                    | es_CU       | es        | CU        | es        | es
Spanish (Dominican Republic)                      | es_DO       | es        | DO        | es        | es
Spanish (Ceuta & Melilla)                         | es_EA       | es        | EA        | es        | es
Spanish (Ecuador)                                 | es_EC       | es        | EC        | es        | es
Spanish (Spain)                                   | es_ES       | es        | ES        | es        | es
Spanish (Equatorial Guinea)                       | es_GQ       | es        | GQ        | es        | es
Spanish (Guatemala)                               | es_GT       | es        | GT        | es        | es
Spanish (Honduras)                                | es_HN       | es        | HN        | es        | es
Spanish (Canary Islands)                          | es_IC       | es        | IC        | es        | es
Spanish (Mexico)                                  | es_MX       | es        | MX        | es        | es
Spanish (Nicaragua)                               | es_NI       | es        | NI        | es        | es
Spanish (Panama)                                  | es_PA       | es        | PA        | es        | es
Spanish (Peru)                                    | es_PE       | es        | PE        | es        | es
Spanish (Philippines)                             | es_PH       | es        | PH        | es        | es
Spanish (Puerto Rico)                             | es_PR       | es        | PR        | es        | es
Spanish (Paraguay)                                | es_PY       | es        | PY        | es        | es
Spanish (El Salvador)                             | es_SV       | es        | SV        | es        | es
Spanish (United States)                           | es_US       | es        | US        | es        | es
Spanish (Uruguay)                                 | es_UY       | es        | UY        | es        | es
Spanish (Venezuela)                               | es_VE       | es        | VE        | es        | es

As an example, "es_VE" is a canonical locale name here.

Note that there are many more collations listed here than Spanish ICU
pg_collation entries that you'll find following initdb (forgetting
about the keyword variant stuff, which is really another issue).

In order to get stable pg_collation names, we should probably use
country code within pg_import_system_collations(), for collations
where there is a country code. We also need to add script (using
uloc_getScript()), so that those collations with a non-default script
(e.g. "Serbian (Latin, Bosnia & Herzegovina)") similarly have their
own pg_collation entries (FYI, "script" isn't broken out into its own
column by my test program). In short, every row that this test program
outputs should have a pg_collation entry after initdb -- the number
should match exactly.
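
As a rough illustration of the naming idea above, the following C sketch
assembles a name from language, script and country, skipping whichever
parts are empty. build_collation_name() is a hypothetical helper, and the
"-" separator and component order are assumptions made for the example,
not what pg_import_system_collations() does today. In real code the
components would presumably come from ICU's uloc_getLanguage(),
uloc_getScript() and uloc_getCountry() rather than from string literals.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join language, script and country with '-',
 * skipping NULL or empty components. Returns the length of the name
 * written to buf, or -1 if it did not fit in buflen bytes.
 */
int
build_collation_name(char *buf, size_t buflen,
                     const char *language,
                     const char *script,
                     const char *country)
{
    const char *parts[] = {language, script, country};
    size_t      used = 0;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < 3; i++)
    {
        int n;

        if (parts[i] == NULL || parts[i][0] == '\0')
            continue;           /* component absent for this locale */

        /* Prefix a separator for every component after the first. */
        n = snprintf(buf + used, buflen - used, "%s%s",
                     used > 0 ? "-" : "", parts[i]);
        if (n < 0 || (size_t) n >= buflen - used)
            return -1;          /* output would be truncated */
        used += (size_t) n;
    }
    return (int) used;
}
```

With this scheme, ("es", "", "VE") and ("es", "", "AR") yield the
distinct names "es-VE" and "es-AR", and ("sr", "Latn", "BA") yields
"sr-Latn-BA", so none of them would collide with the bare "es" or "sr"
entries.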

Do we really need to pass "if_not_exists = true", anyway? Why
shouldn't initdb fail if there are apparent duplicate ICU collations?

BTW, I suspect that this is why very old ICU versions appeared to not
accept language tags constructed with subtags produced by
ucol_getKeywordValuesForLocale(), per eccead9's commit message. The
collation's country/script was omitted, so concatenating variants
somehow didn't make sense (wrong script?). It certainly makes no sense
that earlier ICU versions just didn't get a valid language tag when
using the infrastructure that is supposed to be used to build language
tags. It may be necessary to revert eccead9, once this understanding
of the situation is confirmed.

-- 
Peter Geoghegan

