Re: ICU for global collation

From Peter Eisentraut
Subject Re: ICU for global collation
Msg-id 07878ad1-d94d-5a92-565f-c0dfdea8b61b@enterprisedb.com
In response to Re: ICU for global collation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 15.03.22 18:28, Robert Haas wrote:
> On Tue, Mar 15, 2022 at 12:58 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> On 14.03.22 19:57, Robert Haas wrote:
>>> 1. What will happen if I set the ICU collation to something that
>>> doesn't match the libc collation? How bad are the consequences?
>>
>> These are unrelated, so there are no consequences.
> 
> Can you please elaborate on this?

The code that is aware of ICU generally works like this:

if (locale_provider == ICU)
   result = call ICU code
else
   result = call libc code
return result

However, there is code out there, both within PostgreSQL itself and in
extensions, that does not do that yet.  Ideally, we would eventually
change all of that over, but that is not happening now.  So we ought to
preserve the ability to set the libc locale, to keep that legacy code
working for now.

This legacy code by definition doesn't know about ICU, so it doesn't 
care whether the ICU setting "matches" the libc setting or anything like 
that.  It will just do its thing depending on its own setting.

The only consequence of settings that don't match is that the different
pieces of code behave semantically inconsistently (e.g., some routine
thinks the data is Greek and other code thinks the data is French).  But
it's up to the user to set that correctly.  And the scenarios where
you can actually do anything semantically relevant this way are very
limited.
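
To make the difference between the two kinds of code concrete, here is a
minimal sketch in Python.  It is not PostgreSQL code: the names
(DatabaseLocale, call_icu_code, call_libc_code, and so on) are
hypothetical, and the two "call" functions are stand-ins that only
report which setting they were given instead of doing real collation
work.

```python
from dataclasses import dataclass
from enum import Enum


class Provider(Enum):
    ICU = "icu"
    LIBC = "libc"


@dataclass(frozen=True)
class DatabaseLocale:
    """Hypothetical container for the two independent settings."""
    provider: Provider
    icu_locale: str   # read only by ICU-aware code
    libc_locale: str  # read by libc code paths, including legacy code


def call_icu_code(locale_name: str) -> str:
    # Stand-in for a real ICU call; it only records the setting it saw.
    return f"icu:{locale_name}"


def call_libc_code(locale_name: str) -> str:
    # Stand-in for a real libc call; it only records the setting it saw.
    return f"libc:{locale_name}"


def provider_aware_routine(db: DatabaseLocale) -> str:
    # Code that knows about ICU dispatches on the provider.
    if db.provider is Provider.ICU:
        return call_icu_code(db.icu_locale)
    return call_libc_code(db.libc_locale)


def legacy_routine(db: DatabaseLocale) -> str:
    # Legacy code never looks at the provider or the ICU setting.
    return call_libc_code(db.libc_locale)


# Deliberately mismatched settings: Greek for ICU, French for libc.
db = DatabaseLocale(Provider.ICU, icu_locale="el", libc_locale="fr_FR.UTF-8")
print(provider_aware_routine(db))
print(legacy_routine(db))
```

With the mismatched settings above, the provider-aware routine reports
the ICU setting and the legacy routine reports the libc setting.  Neither
one fails; they simply disagree about what the data is.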

A second point is that the LC_CTYPE setting tells other parts of libc 
what the current encoding is.  This affects gettext for example.  So you 
need to set this to something sensible even if you don't use libc locale 
routines otherwise.

>>> 2. If I want to avoid a mismatch between the two, then I will need a
>>> way to figure out which libc collation corresponds to a given ICU
>>> collation. How do I do that?
>>
>> You can specify the same name for both.
> 
> Hmm. If every name were valid in both systems, I don't think you'd be
> proposing two fields.

Earlier versions of this patch and predecessor patches indeed had common
fields.  However, the two systems accept different values once you
delve into the advanced features.  For basic usage, something like
"en_US" will work for both.


