Re: Windows default locale vs initdb

From Juan José Santamaría Flecha
Subject Re: Windows default locale vs initdb
Date
Msg-id CAC+AXB10p+mnJ6wrAEm6jb51+8=BfYzD=w6ftHRbMjMuSFN3kQ@mail.gmail.com
In response to Re: Windows default locale vs initdb  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Windows default locale vs initdb  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers

On Wed, Jul 20, 2022 at 1:44 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Jul 20, 2022 at 10:27 PM Juan José Santamaría Flecha
> <juanjo.santamaria@gmail.com> wrote:
> > Still, WIN1252 is not the wrong answer for what we are asking. Even if you enable UTF-8 support [1], the system will use the current default Windows ANSI code page (ACP) for the locale and UTF-8 for the code page.
>
> I'm still confused about what that means.  Suppose we decided to
> insist by adding a ".UTF-8" suffix to the name, as that page says we
> can now that we're on Windows 10+, when building the default locale
> name (see experimental 0002 patch, attached).  It initially seemed to
> have the right effect:
>
> The database cluster will be initialized with locale "en-US.UTF-8".
> The default database encoding has accordingly been set to "UTF8".
> The default text search configuration will be set to "english".

Let me try to explain this using the "Beta: Use Unicode UTF-8 for worldwide language support" option [1]. 

- Currently, on a system with the language settings "English_United States" and that option disabled, running initdb gives:

The database cluster will be initialized with locale "English_United States.1252".
The default database encoding has accordingly been set to "WIN1252".
The default text search configuration will be set to "english".

And as a test for psql:

SET lc_time='tr_tr.utf8';
SET
SELECT to_char('2000-2-01'::date, 'tmmonth');
ERROR:  character with byte sequence 0xc5 0x9f in encoding "UTF8" has no equivalent in encoding "WIN1252"

We get this error even if the database encoding is UTF8; it is caused by the tr_tr locales being encoded in WIN1254. We can discuss this in another thread, and I can propose a patch.
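The byte sequence in the error message can also be checked outside PostgreSQL. What follows is only a minimal Python sketch: it assumes Python's cp1252 and cp1254 codecs are acceptable stand-ins for WIN1252 and WIN1254, whereas PostgreSQL uses its own conversion tables, and can_encode is a hypothetical helper.

```python
# The bytes from the error message, 0xc5 0x9f, decoded as UTF-8.
s_cedilla = b"\xc5\x9f".decode("utf-8")  # U+015F, the "ş" in "şubat"


def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Assumption: Python codec names stand in for the PostgreSQL encoding names.
for codec in ("cp1252", "cp1254", "utf-8"):
    print(codec, can_encode("şubat", codec))
```

The month name can be represented in the Turkish code page and in UTF-8, but not in the Western European one, which is the conversion the error message complains about.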

- If we enable the UTF-8 support option, the same test goes as follows:

The database cluster will be initialized with locale "English_United States.utf8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

And for psql:

SET lc_time='tr_tr.utf8';
SET
SELECT to_char('2000-2-01'::date, 'tmmonth');
 to_char
---------
 şubat
(1 row)

In this case the Windows locales are actually UTF8 encoded.
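In the initdb runs quoted in this message, the reported database encoding follows the code page suffix of the locale name. A rough Python sketch of that relationship, built only from those observed pairs: the table and the helper encoding_from_locale are hypothetical and are not initdb's actual lookup code.

```python
# Hypothetical table containing only the pairs seen in the initdb output
# quoted in this message; it is not PostgreSQL's real encoding match list.
SUFFIX_TO_ENCODING = {"1252": "WIN1252", "utf8": "UTF8"}


def encoding_from_locale(locale_name: str):
    """Return the encoding implied by the text after the last dot, or None."""
    _, dot, suffix = locale_name.rpartition(".")
    if not dot:
        return None  # no code page suffix in the locale name
    # Treat "UTF-8" and "utf8" as the same spelling.
    return SUFFIX_TO_ENCODING.get(suffix.lower().replace("-", ""))


for name in ("English_United States.1252",
             "English_United States.utf8",
             "en-US.UTF-8"):
    print(name, "->", encoding_from_locale(name))
```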

TL;DR: What I want to show with this example is that the Windows ACP is not modified by setlocale(); it can only be changed through the Windows registry, and only in recent releases.
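One way to observe this on a Windows machine is to read the ACP before and after a setlocale() call. The sketch below is an illustration only, not part of any patch: the helper name is hypothetical, it calls the Win32 GetACP() function through ctypes, and it returns None on other platforms.

```python
import ctypes
import locale
import sys


def acp_before_and_after_setlocale(locale_name: str):
    """Return (ACP before, ACP after) around setlocale(), or None off Windows."""
    if sys.platform != "win32":
        return None
    get_acp = ctypes.windll.kernel32.GetACP  # Win32 API: current ANSI code page
    before = get_acp()
    saved = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, locale_name)
    except locale.Error:
        pass  # name not accepted by the C runtime; the comparison still holds
    after = get_acp()
    locale.setlocale(locale.LC_ALL, saved)  # restore the original locale
    return before, after


# Assumption: "tr-TR" is a locale name the C runtime accepts on this system.
print(acp_before_and_after_setlocale("tr-TR"))
```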
 
> But then the Turkish i test in contrib/citext/sql/citext_utf8.sql failed[1]:
>
> SELECT 'i'::citext = 'İ'::citext AS t;
>  t
>  ---
> - t
> + f
>  (1 row)

This is the current state of affairs:

- Windows:

SELECT U&'\0131' latin_small_dotless,U&'\0069' latin_small
,U&'\0049' latin_capital, lower(U&'\0049')
,U&'\0130' latin_capital_dotted, lower(U&'\0130');
 latin_small_dotless | latin_small | latin_capital | lower | latin_capital_dotted | lower
---------------------+-------------+---------------+-------+----------------------+-------
 ı                   | i           | I             | i     | İ                    | İ

- Linux:

SELECT U&'\0131' latin_small_dotless,U&'\0069' latin_small
,U&'\0049' latin_capital, lower(U&'\0049')
,U&'\0130' latin_capital_dotted, lower(U&'\0130');
 latin_small_dotless | latin_small | latin_capital | lower | latin_capital_dotted | lower
---------------------+-------------+---------------+-------+----------------------+-------
 ı                   | i           | I             | i     | İ                    | i

latin_capital_dotted does not lowercase to the same value on both platforms: Windows leaves İ (U+0130) unchanged, while Linux returns i.
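Since citext compares values after lowercasing them, the two tables above are enough to account for the citext_utf8 regression diff. A small Python model of that comparison follows; the mapping tables are copied from the query results above, and the helper names are hypothetical.

```python
# lower() results for the two capital letters, as observed in the tables above.
WINDOWS_LOWER = {"I": "i", "\u0130": "\u0130"}  # İ stays İ
LINUX_LOWER = {"I": "i", "\u0130": "i"}         # İ becomes i


def lower_with(table: dict, text: str) -> str:
    """Lowercase text using only the observed per-character mapping."""
    return "".join(table.get(ch, ch) for ch in text)


def citext_equal(table: dict, a: str, b: str) -> bool:
    """Model of a citext comparison: equality of the lowercased values."""
    return lower_with(table, a) == lower_with(table, b)


print("Linux:  ", citext_equal(LINUX_LOWER, "i", "\u0130"))
print("Windows:", citext_equal(WINDOWS_LOWER, "i", "\u0130"))
```

With the Linux mapping the comparison is true, matching the expected "t"; with the Windows mapping it is false, matching the "f" in the diff.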
 

Regards,

Juan José Santamaría Flecha
