ICU, locale and collation question

From: Oscar Carlberg
Subject: ICU, locale and collation question
Msg-id: 2a1de395-c69a-3dae-4459-dbdb633df637@fortnox.se
List: pgsql-general

Hello,

We have a number of existing Postgres 10 clusters running on CentOS 7,
which were initialized (initdb) with these encoding and locale options:

-E 'UTF-8'
--lc-collate=sv_SE.UTF-8
--lc-ctype=sv_SE.UTF-8
--lc-monetary=sv_SE.UTF-8
--lc-numeric=sv_SE.UTF-8
--lc-time=sv_SE.UTF-8
--lc-messages=en_US.UTF-8

createdb was given these locale options when setting up the
databases:
--lc-collate=sv_SE.UTF-8
--lc-ctype=sv_SE.UTF-8

\l in psql gives:

   Name   |   Owner   | Encoding |   Collate   |    Ctype    |
----------+-----------+----------+-------------+-------------+
 test-db  | test-user | UTF8     | sv_SE.UTF-8 | sv_SE.UTF-8 |


We're upgrading the servers using logical replication, and would like to
take the opportunity to switch to ICU rather than relying on glibc, to
avoid future problems with index corruption if using physical
replication between servers with different versions of glibc.
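
To illustrate why a changed collation is a problem for indexes: a B-tree
is searched on the assumption that its entries are sorted under the
comparison rules currently in effect. The toy Python sketch below uses a
sorted list in place of an index and two made-up ordering rules (key_old
and key_new are hypothetical and do not reproduce any real glibc
behaviour). An entry stored under the old rule is no longer found once the
rule changes underneath it.

```python
from bisect import bisect_left


def key_old(s):
    # Hypothetical "old library" rule: plain code point order.
    return s


def key_new(s):
    # Hypothetical "new library" rule: hyphens are ignored when comparing.
    return s.replace("-", "")


def lookup(index, value, key):
    # Binary search; only valid if `index` is sorted according to `key`.
    keys = [key(entry) for entry in index]
    pos = bisect_left(keys, key(value))
    return pos < len(index) and index[pos] == value


# The "index" is built (sorted) under the old rule.
index = sorted(["ab", "a-c", "abd"], key=key_old)

found_old = lookup(index, "a-c", key_old)  # same rule as at build time
found_new = lookup(index, "a-c", key_new)  # rule changed after the build

print(index, found_old, found_new)
```

The data itself is unchanged in both lookups; only the comparison rule
differs, which is why a rebuild (REINDEX) is the usual remedy after such a
change.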

We're trying to figure out the most correct way to configure Postgres to
achieve this. Currently we run initdb with:

-E 'UTF-8'
--locale-provider=icu
--icu-locale=sv-SE-x-icu

And createdb is given these locale options:
--lc-collate=C
--lc-ctype=C

\l in psql now gives:

   Name   |   Owner   | Encoding | Collate | Ctype | ICU Locale  | Locale Provider |
----------+-----------+----------+---------+-------+-------------+-----------------+
 test-db  | test-user | UTF8     | C       | C     | sv-SE-x-icu | icu             |

Is this a safe configuration to avoid index corruption, and other
problems, while still being compatible with the previous locale
settings? We have done some testing and it appears ORDER BY does sort
rows according to Swedish localization in the ICU configured test-db.
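
A check of this kind is most telling when the test strings sort
differently under the two regimes. As a rough sketch in Python (the word
list and swedish_key are made up for illustration; the key only knows
lowercase a-z plus å, ä, ö and is not a substitute for ICU): code point
order, which is what a plain C collation gives for UTF-8 text, puts ä
before å, while Swedish alphabetical order is å, ä, ö after z.

```python
# Simplified Swedish alphabet; an assumption for illustration, not ICU.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}


def swedish_key(word):
    # Map each letter to its position in the Swedish alphabet.
    return [RANK[ch] for ch in word]


words = ["öl", "zebra", "ärlig", "apa", "åka"]

codepoint_order = sorted(words)                 # what COLLATE "C" would give
swedish_order = sorted(words, key=swedish_key)  # what Swedish rules expect

print(codepoint_order)
print(swedish_order)
```

Values like these, inserted into a test table, make it easy to see from an
ORDER BY result whether the Swedish rules or plain code point order were
applied.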

We're uncertain because this blog post,
https://peter.eisentraut.org/blog/2022/09/26/icu-features-in-postgresql-15
mentions that some Postgres code still relies on libc locale
facilities. Because of this, should we set lc-collate and lc-ctype to
sv_SE.UTF-8 when creating databases, in addition to the ICU options
provided to initdb? Will we still be safe from glibc-related corruption
as long as --locale-provider=icu --icu-locale=sv-SE-x-icu is set?

Best Regards,

Oscar

