Thread: BUG #16068: Collate of 'Norwegian Bokmål' is problematic

BUG #16068: Collate of 'Norwegian Bokmål' is problematic

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16068
Logged by:          Robert Ford
Email address:      robfordww@gmail.com
PostgreSQL version: 12.0
Operating system:   Windows
Description:

Hi,

I want to point to an issue I discovered when installing v12.0 for windows.
The installer sets the Collate in the config file to 'Norwegian Bokmål.1251'
(or something similar, but notice the 'å').  This seems to trigger all kinds
of bugs. For instance, "select * from pg_settings" results in a UTF-8 decode
error.  pgAdmin also returns a lot of UTF-8 decode errors.  The problem
seemed to go away when I changed the collation to "nb_NO" and ran initdb.
This error should only occur in countries where the installer creates
collation names with non-ASCII characters; Norway is one of them.


Re: BUG #16068: Collate of 'Norwegian Bokmål' is problematic

From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> I want to point to an issue I discovered when installing v12.0 for windows.

Um, Windows-what exactly?

> The installer sets the Collate in the config file to 'Norwegian Bokmål.1251'
> (or something similar, but notice the 'å').  This seems to trigger all kinds
> of bugs.

That collation name has given us trouble before, cf

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=db29620d4

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=aa1d2fc5e

I wonder whether Microsoft changed it again :-(

            regards, tom lane



Re: Re: BUG #16068: Collate of 'Norwegian Bokmål' is problematic

From
Tom Lane
Date:
[ please keep the list cc'd ]

Robert Ford <robfordww@gmail.com> writes:
> On Sat, Oct 19, 2019, 22:03 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The installer sets the Collate in the config file to 'Norwegian
>>> Bokmål.1251' (or something similar, but notice the 'å')

>> Um, Windows-what exactly?
>> That collation name has given us trouble before, cf
>> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=db29620d4
>> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=aa1d2fc5e
>> I wonder whether Microsoft changed it again :-(

> Windows server 2012

Hm, that's not very new, and it's certainly a version we've tested.
In fact I'd have guessed the above-mentioned patches were tested
against that.

Anyway, my first thought about this is that the mapping installed by
db29620d4 looks like it will recognize 'Norwegian (Bokmål)' but not
'Norwegian Bokmål'.  Could you be more precise about exactly what
you're seeing in the config file?

            regards, tom lane



Re: Re: BUG #16068: Collate of 'Norwegian Bokmål' is problematic

From
Robert Ford
Date:
Sorry about this, but the version was "Windows 2016 standard".  I let the installer stay on "Default locale" while installing.  This results in a config file with the following values:

# These settings are initialized by initdb, but they can be changed.
lc_messages = 'Norwegian Bokmål_Norway.1252' # locale for system error message
# strings
lc_monetary = 'Norwegian Bokmål_Norway.1252' # locale for monetary formatting
lc_numeric = 'Norwegian Bokmål_Norway.1252' # locale for number formatting
lc_time = 'Norwegian Bokmål_Norway.1252' # locale for time formatting

Then:

C:\Program Files\PostgreSQL\12\bin>psql postgres postgres
Password for user postgres:
psql (12.0)
WARNING: Console code page (850) differs from Windows code page (1252)
         8-bit characters might not work correctly. See psql reference
         page "Notes for Windows users" for details.
Type "help" for help.

postgres=# select * from pg_settings;
ERROR:  invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x5f


On Sun, Oct 20, 2019 at 23:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Re: Re: Re: BUG #16068: Collate of 'Norwegian Bokmål' is problematic

From
Tom Lane
Date:
Robert Ford <robfordww@gmail.com> writes:
> Sorry about this, but the version was "Windows 2016 standard".  I let the
> installer stay on "Default locale" while installing.  This results in a
> config file with the following values:

> # These settings are initialized by initdb, but they can be changed.
> lc_messages = 'Norwegian Bokmål_Norway.1252' # locale for system error
> message

Okay, so we need to translate that string to 'Norwegian_Norway' too.
That's an easy fix, but as far as I can tell from the past discussions
about this, the bugs it'll fix are distinct from what you're complaining
about here:

> WARNING: Console code page (850) differs from Windows code page (1252)
>          8-bit characters might not work correctly. See psql reference
>          page "Notes for Windows users" for details.

We don't have any support for Windows code page 850.  Looking at the
wikipedia page about that doesn't make me much inclined to add it
either: wikipedia says that (a) it's largely been obsoleted by 1252,
and (b) there's confusion about what the code page's contents are,
specifically whether it contains a euro sign.  So my recommendation
here is just to switch your console code page to 1252.

> postgres=# select * from pg_settings;
> ERROR:  invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x5f

That hex sequence looks suspiciously like "ål_" in CP1252, so this is an
encoding confusion problem.  I think it'd go away if you simplified
these postgresql.conf entries to 'Norwegian_Norway.1252' and restarted.
What I don't remember offhand is where the funny locale name spelling
might've propagated besides these entries.

            regards, tom lane
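
The name translation discussed in this thread can be pictured as a prefix lookup. The Python sketch below is a hypothetical illustration only: PostgreSQL's real mapping is written in C and is not reproduced here, and the tables and the function `map_locale_name` are invented for the example. It shows why a table that only knows the 'Norwegian (Bokmål)' spelling leaves 'Norwegian Bokmål_Norway.1252' untouched, and how adding the second spelling would yield the ASCII-only name suggested above.

```python
# Hypothetical tables (not PostgreSQL's actual data):
# problematic prefix -> ASCII-only replacement.
OLD_TABLE = [
    ("Norwegian (Bokmål)_Norway", "Norwegian_Norway"),
]
NEW_TABLE = OLD_TABLE + [
    ("Norwegian Bokmål_Norway", "Norwegian_Norway"),
]


def map_locale_name(name, table):
    """Replace a known problematic prefix; return other names unchanged."""
    for prefix, replacement in table:
        if name.startswith(prefix):
            # Keep whatever follows the prefix, e.g. the ".1252" suffix.
            return replacement + name[len(prefix):]
    return name


reported = "Norwegian Bokmål_Norway.1252"  # value from the reporter's config

# With only the older spelling in the table, nothing matches.
print(map_locale_name(reported, OLD_TABLE))

# With the extra entry, the name becomes ASCII-only.
print(map_locale_name(reported, NEW_TABLE))
```

Matching on a prefix keeps the code page suffix intact, so the result for the reported value is the same 'Norwegian_Norway.1252' string recommended for the postgresql.conf entries.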