Re: [HACKERS] ICU collation variant keywords and pg_collation entries(Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)

From: Peter Geoghegan
Msg-id: CAH2-Wzm22vtxvD-e1oz90DE8Z_M61_8amHsDOZf1PWRKfRmj1g@mail.gmail.com
In reply to: Re: [HACKERS] ICU collation variant keywords and pg_collation entries (Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [HACKERS] ICU collation variant keywords and pg_collation entries(Was: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values)  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
List: pgsql-hackers
On Mon, Aug 7, 2017 at 3:23 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The thing that I'm particularly thinking about is that if someone wants
> an ICU variant collation that we didn't make initdb provide, they'll do
> a CREATE COLLATION and go use it.  At update time, pg_dump or pg_upgrade
> will export/import that via CREATE COLLATION, and the only way it fails
> is if ICU rejects the collation name as garbage.  (Which, as we already
> established upthread, it's quite unlikely to do.)

Actually, it's *impossible* for ICU to reject any string as a locale
within CREATE COLLATION, because CollationCreate() simply doesn't
sanitize ICU locale names. Unlike initdb (within
pg_import_system_collations()), it doesn't call anything like
get_icu_language_tag().

If I add such a check to CollationCreate(), it does a reasonable job of
sanitizing input while preserving the spirit of the BCP 47 language tag
format: it doesn't reject a locale merely because ICU hasn't heard of
it, since the user may have specified a brand-new locale. All four of
the following are accepted by unmodified master; with the check added,
I get:

postgres=# CREATE COLLATION test1 (provider = icu, locale = 'en-x-icu');
CREATE COLLATION
postgres=# CREATE COLLATION test2 (provider = icu, locale = 'foo bar baz');
ERROR:  XX000: could not convert locale name "foo bar baz" to language tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION:  get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test3 (provider = icu, locale = 'en-gb-icu');
ERROR:  XX000: could not convert locale name "en-gb-icu" to language tag: U_ILLEGAL_ARGUMENT_ERROR
LOCATION:  get_icu_language_tag, collationcmds.c:454
postgres=# CREATE COLLATION test4 (provider = icu, locale = 'not-a-country');
CREATE COLLATION

If get_icu_language_tag() must not throw an error when initdb's import
passes it strings like these (which are generated mechanically), why
shouldn't CREATE COLLATION apply the same check? While the choice to
preserve BCP 47's tolerance of missing collations is debatable, not
doing at least this much up-front is a bug IMV.

-- 
Peter Geoghegan
