Re: Initcap works differently with different locale providers

From Alexander Korotkov
Subject Re: Initcap works differently with different locale providers
Msg-id CAPpHfdvsYKYG2PnhxEha=aE+Mtj=+peFbRah_K4A60aYsqFbrg@mail.gmail.com
In response to Re: Initcap works differently with different locale providers  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Initcap works differently with different locale providers
List pgsql-docs
On Mon, Jul 28, 2025 at 1:20 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On 25 Sep 2024, at 18:13, Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru> wrote:
>
> Greetings, everyone!
>
> One of our clients has found a difference in behaviour of initcap function when
> using different locale providers, shown below
>
> postgres=# create database test_db_1 locale_provider=icu locale="ru_RU.UTF-8" template=template0;
> NOTICE:  using standard form "ru-RU" for ICU locale "ru_RU.UTF-8"
> CREATE DATABASE
> postgres=# \c test_db_1;
> You are now connected to database "test_db_1" as user "postgres".
> test_db_1=# select initcap('ЧиЮ А.Ю.');
> initcap
> ----------
> Чию А.ю.
> (1 row)
> test_db_1=# select initcap('joHn d.e.');
> initcap
> -----------
> John D.e.
> (1 row)
> postgres=# create database test_db_2 locale_provider=libc locale="ru_RU.UTF-8" template=template0;
> CREATE DATABASE
> postgres=# \c test_db_2
> You are now connected to database "test_db_2" as user "postgres".
> test_db_2=# select initcap('ЧиЮ А.Ю.');
> initcap
> ----------
> Чию А.Ю.
> (1 row)
> test_db_2=# select initcap('joHn d.e.');
> initcap
> -----------
> John D.E.
> (1 row)
>
> And an easier reproduction (should work for REL_12_STABLE and up)
>
> postgres=# SELECT initcap('first.second' COLLATE "en-x-icu");
> initcap
> --------------
> First.second
> (1 row)
> postgres=# SELECT initcap('first.second' COLLATE "en_US");
> initcap
> --------------
> First.Second
> (1 row)
>
> This behaviour is reproducible on REL_12_STABLE and up to master
>
> I don't believe that this is an erroneous behaviour, just a differing one, hence
> just a documentation change proposition
>
> I suggest adding a clarification that this function works differently with libc
> and ICU providers because there is a difference in what a "word" is between them
>
> In libc a word is a sequence of alphanumeric characters, separated by
> non-alphanumeric characters (as it is written in documentation right now)
> In ICU words are divided according to Unicode® Standard Annex #29 [1]
>
> Similar issue was briefly discussed in [2]
>
> The suggested documentation patch is attached (versions for REL_13_STABLE+ and
> for REL_12_STABLE only)
>
> [1]: https://www.unicode.org/reports/tr29/#Word_Boundaries
> [2]: https://www.postgresql.org/message-id/CAEwbS1R8pwhRkwRo3XsPt24ErBNtFWuReAZhVPJwA3oqo148tA%40mail.gmail.com
>
> Oleg Tselebrovskiy, Postgres Professional<v1-0001-string-functions.patch><v1-0002-string-functions-REL_12.patch>
>
>
> I can confirm initcap works with libc and libicu as you stated.  The
> documentation patch looks good to me.  I’ve written a commit message.  The
> REL_12_STABLE branch is not relevant anymore as it’s out of support.  I’m
> going to push this if there are no objections.

I'm sorry for so many messages.  My email client just went crazy.
It should be fixed now.

------
Regards,
Alexander Korotkov
Supabase
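
For readers who want to see the two notions of a "word" from the quoted report side by side, here is a minimal Python sketch. It is not PostgreSQL, libc, or ICU code. The function names initcap_alnum and initcap_dot_joined are hypothetical. The first follows the rule as currently documented (a word is a run of alphanumeric characters). The second is only a rough stand-in for the UAX #29 behaviour seen in the examples: it assumes that a "." placed directly between two letters does not end a word, and ignores every other boundary rule.

```python
def initcap_alnum(text: str) -> str:
    """Title-case text, treating each run of alphanumeric characters as a word."""
    out = []
    in_word = False
    for ch in text:
        if ch.isalnum():
            # First character of a word is upper-cased, the rest lower-cased.
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)
            in_word = False
    return "".join(out)


def initcap_dot_joined(text: str) -> str:
    """Same as above, except a '.' directly between two letters keeps the word going.

    Assumption: this is a simplification for illustration only, not the full
    UAX #29 word-boundary algorithm.
    """
    out = []
    in_word = False
    for i, ch in enumerate(text):
        if ch.isalnum():
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            out.append(ch)
            joins = (
                ch == "."
                and 0 < i < len(text) - 1
                and text[i - 1].isalpha()
                and text[i + 1].isalpha()
            )
            # The word continues only if we were inside one and the dot joins letters.
            in_word = in_word and joins
    return "".join(out)


for sample in ("first.second", "joHn d.e.", "ЧиЮ А.Ю."):
    print(sample, "|", initcap_alnum(sample), "|", initcap_dot_joined(sample))
```

On the three inputs from the report, the first function reproduces the libc results and the second reproduces the ICU results shown above; for other inputs the second function may well differ from what ICU actually does.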


