Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Дата
Msg-id 23949.1506799694@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?  (Noah Misch <noah@leadboat.com>)
Ответы Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?  (Peter Geoghegan <pg@bowt.ie>)
Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Список pgsql-hackers
Noah Misch <noah@leadboat.com> writes:
> On Sat, Sep 30, 2017 at 11:25:43AM -0400, Tom Lane wrote:
>> Sure, but dealing with that is mechanical: reindex the necessary indexes
>> and you're done.

> In the general case, one must revalidate CHECK constraints, re-partition
> tables, revalidate range values, and reindex.

True, but that is what it is: nothing we can do is going to affect the
consequences of a collation behavior change, if there is one.  What's more
useful for our immediate purposes is to ask whether we can reliably detect
a collation behavior change.  False negatives are bad, but so are false
positives, because those would force DBAs to jump through lots of hoops
unnecessarily.

So: are canonicalized locale descriptions any better or worse by that
metric than non-canonicalized descriptions?  In principle I think a
canonicalized description might be more likely to be recognized as
the "same" locale by another ICU version than one that isn't, but
I don't know whether there's any meaningful difference in practice.

Another point here is whether, even if a new ICU version recognizes
a locale description as being "the same" interpretation that an old
ICU version used, will it report the same collation version?  Limited
experimentation suggests that the collversions we're actually getting
out of ICU depend on little other than the libicu version.  "select
distinct collversion from pg_collation where collversion is not null"
produces this on ICU 4.2.1:
49.192.5.4149.192.0.41

and this on 52.1:
58.0.6.5058.0.0.50

and this on 57.1:
153.64.29153.64

This suggests to me that arguing about canonicalization is moot so
far as avoiding reindexing goes: if you change ICU library versions,
you're screwed and will have to jump through all the reindexing hoops,
no matter what we do or don't do.  (Maybe we are looking at the wrong
information to populate collversion?)

Now, it may still be worthwhile to argue about whether canonicalization
will help the other component of the problem, which is whether you can
dump and reload CREATE COLLATION commands into a new ICU version and
expect to get more-or-less-the-same behavior as before.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: [HACKERS] 10RC1 crash testing MultiXact oddity
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?