Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47language tags. Should it?
Дата
Msg-id CAH2-Wz=VbytqGTYfuT_qsN9U+e=rMw4UirOdMGDYt_jtS7kbpg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Sat, Sep 30, 2017 at 12:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This suggests to me that arguing about canonicalization is moot so
> far as avoiding reindexing goes: if you change ICU library versions,
> you're screwed and will have to jump through all the reindexing hoops,
> no matter what we do or don't do.  (Maybe we are looking at the wrong
> information to populate collversion?)

Just to be clear, I was never arguing that canonicalization would
avoid reindexing, and I didn't think anyone else was. I believe that
the version string will change when the behavior of its collator
changes for any reason and in any way. This includes changes to how
binary sort keys are generated.

(We don't currently store binary keys on disk, so a change to that
representation doesn't really necessitate a REINDEX for us. The UCA
spec explicitly decouples compatibility for indexing with binary keys
from changes needed due to human requirements. ICU binary sort keys
are compressed, and this is sometimes improved, independently of the
evolution of how natural language experts say text should be sorted.)

> Now, it may still be worthwhile to argue about whether canonicalization
> will help the other component of the problem, which is whether you can
> dump and reload CREATE COLLATION commands into a new ICU version and
> expect to get more-or-less-the-same behavior as before.

Again, to be clear, that's what I was arguing all along. I think that
users will get very good results with this approach. When the sorting
behavior of a locale is refined by natural language experts, it's
almost certainly a very small change that most users that are affected
won't actually notice. For example, when en-us-x-icu is changed,
that's actually due to a general change in the inherited root collator
that doesn't really affect English speakers. There is no practical
question about how you're supposed to sort English text.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Re: CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: [HACKERS] alter server for foreign table