Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?
Дата
Msg-id CAH2-WznUjv8F0_0D-DMzYOAX8q2CdvEQKZ9PQEhcmnJ8JvHSxQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Ответы Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers
On Mon, Sep 25, 2017 at 11:40 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 9/22/17 16:46, Peter Geoghegan wrote:
>> But you are *already* canonicalizing ICU collation names as BCP 47. My
>> point here is: Why not finish the job off, and *also* canonicalize
>> colcollate in the same way? This won't break ucol_open() if we take
>> appropriate precautions when we go to use the Postgres collation/ICU
>> locale.
>
> Reading over this code again, it is admittedly not quite clear why this
> "canonicalization" is in there right now.  I think it had something to
> do with how we built the keyword variants at one point.  It might not
> make sense.  I'd be glad to take that out and use the result straight
> from uloc_getAvailable() for collcollate.  That is, after all, the
> "canonical" version that ICU chooses to report to us.

But then our users categorically have to know about both formats,
without any practical benefit to make up for it. You will also get
people that don't realize that only one format is supported on some
versions if go this way.

>> One concern that makes me suggest this is: What happens when
>> the user *downgrades* ICU version, from a version where colcollate is
>> BCP 47 to one where it would not have been at initdb time? That will
>> break the downgrade in an unpleasant way, including in installations
>> that never do a CREATE COLLATION themselves. We want to be able to
>> restore a basebackup on a somewhat different OS, and have that still
>> work following REINDEX. At least, that seems like it should be an
>> important goal for us.
>
> This is an interesting point, and my proposal above would fix that.

I've already written a patch to standardize collcollate. If we do the
way you describe above instead, then what happens when ICU finally
removes the already deprecated legacy format?

> However, I think that taking a PostgreSQL data directory and moving or
> copying it to an *older* OS installation is always going to have a
> potential for problems.  So I wouldn't spend a huge amount of effort
> just to fix this specific case.

The downgrade thing is just the simplest, most immediate example of
where failing to standardize collcollate now could cause problems.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: [HACKERS] Built-in plugin for logical decoding output
Следующее
От: Christopher Browne
Дата:
Сообщение: Re: [HACKERS] Built-in plugin for logical decoding output