Discussion: BUG #16068: Collate of 'Norwegian Bokmål' is problematic
The following bug has been logged on the website:

Bug reference: 16068
Logged by: Robert Ford
Email address: robfordww@gmail.com
PostgreSQL version: 12.0
Operating system: Windows
Description:

Hi, I want to point to an issue I discovered when installing v12.0 for Windows. The installer sets the Collate in the config file to 'Norwegian Bokmål.1251' (or something similar, but notice the 'å'). This seems to trigger all kinds of bugs. For instance, "select * from pg_settings" results in a UTF-8 decode error. PgAdmin also returns a lot of UTF-8 decode errors. The problem seemed to go away when I changed the collation to "nb_NO" and ran initdb. This error should only occur in countries where the installer creates collation names with non-ASCII characters; Norway is one of them.
PG Bug reporting form <noreply@postgresql.org> writes:
> I want to point to an issue I discovered when installing v12.0 for windows.

Um, Windows-what exactly?

> The installer sets the Collate in the config file to 'Norwegian Bokmål.1251'
> (or something similar, but notice the 'å') This seems to trigger all kinds
> of bugs.

That collation name has given us trouble before, cf

https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=db29620d4
https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=aa1d2fc5e

I wonder whether Microsoft changed it again :-(

regards, tom lane
[ please keep the list cc'd ]

Robert Ford <robfordww@gmail.com> writes:
> On Sat, Oct 19, 2019, 22:03 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The installer sets the Collate in the config file to 'Norwegian
>>> Bokmål.1251' (or something similar, but notice the 'å')

>> Um, Windows-what exactly?
>> That collation name has given us trouble before, cf
>> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=db29620d4
>> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=aa1d2fc5e
>> I wonder whether Microsoft changed it again :-(

> Windows server 2012

Hm, that's not very new, and it's certainly a version we've tested.
In fact I'd have guessed the above-mentioned patches were tested
against that.

Anyway, my first thought about this is that the mapping installed by
db29620d4 looks like it will recognize 'Norwegian (Bokmål)' but not
'Norwegian Bokmål'. Could you be more precise about exactly what
you're seeing in the config file?

regards, tom lane
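To illustrate the point about the mapping, here is a minimal Python sketch of a prefix-based locale name translation. It is only an illustration: the table `LOCALE_MAP`, the helper `map_locale`, and the exact key spelling are assumptions made for this example and are not taken from commit db29620d4 or from the PostgreSQL source. It shows how a table keyed on the parenthesized spelling would leave the newer spelling untouched.

```python
# Hypothetical mapping table, for illustration only. It is NOT copied from
# the PostgreSQL source; the key spelling is an assumption.
LOCALE_MAP = {
    "Norwegian (Bokm\u00e5l)_Norway": "Norwegian_Norway",
}


def map_locale(name: str) -> str:
    """Replace a known problematic prefix; return other names unchanged."""
    for bad, good in LOCALE_MAP.items():
        if name.startswith(bad):
            # Keep whatever follows the prefix, e.g. the ".1252" code page.
            return good + name[len(bad):]
    return name


old_style = "Norwegian (Bokm\u00e5l)_Norway.1252"
new_style = "Norwegian Bokm\u00e5l_Norway.1252"

mapped_old = map_locale(old_style)
mapped_new = map_locale(new_style)

print(mapped_old)               # Norwegian_Norway.1252
print(mapped_new == new_style)  # True: the spelling without parentheses is not recognized
```

Under these assumptions, covering the spelling without parentheses would take a second table entry or a looser match.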
Sorry about this, but the version was "Windows 2016 standard". I let the installer stay on "Default locale" while installing. This results in a config file with the following values:
# These settings are initialized by initdb, but they can be changed.
lc_messages = 'Norwegian Bokmål_Norway.1252' # locale for system error message
# strings
lc_monetary = 'Norwegian Bokmål_Norway.1252' # locale for monetary formatting
lc_numeric = 'Norwegian Bokmål_Norway.1252' # locale for number formatting
lc_time = 'Norwegian Bokmål_Norway.1252' # locale for time formatting
Then:
C:\Program Files\PostgreSQL\12\bin>psql postgres postgres
Password for user postgres:
psql (12.0)
WARNING: Console code page (850) differs from Windows code page (1252)
8-bit characters might not work correctly. See psql reference
page "Notes for Windows users" for details.
Type "help" for help.
postgres=# select * from pg_settings;
ERROR: invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x5f
On Sun, 20 Oct 2019 at 23:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ please keep the list cc'd ]
> Robert Ford <robfordww@gmail.com> writes:
>> On Sat, Oct 19, 2019, 22:03 Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> The installer sets the Collate in the config file to 'Norwegian
>>>> Bokmål.1251' (or something similar, but notice the 'å')
>>> Um, Windows-what exactly?
>>> That collation name has given us trouble before, cf
>>> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=db29620d4
>>> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=aa1d2fc5e
>>> I wonder whether Microsoft changed it again :-(
>> Windows server 2012
> Hm, that's not very new, and it's certainly a version we've tested.
> In fact I'd have guessed the above-mentioned patches were tested
> against that.
> Anyway, my first thought about this is that the mapping installed by
> db29620d4 looks like it will recognize 'Norwegian (Bokmål)' but not
> 'Norwegian Bokmål'. Could you be more precise about exactly what
> you're seeing in the config file?
> regards, tom lane
Robert Ford <robfordww@gmail.com> writes:
> Sorry about this, but the version was "Windows 2016 standard". I let the
> installer stay on "Default locale" while installing. This results in a
> config file with the following values:

> # These settings are initialized by initdb, but they can be changed.
> lc_messages = 'Norwegian Bokmål_Norway.1252' # locale for system error
> message

Okay, so we need to translate that string to 'Norwegian_Norway' too.
That's an easy fix, but as far as I can tell from the past discussions
about this, the bugs it'll fix are distinct from what you're complaining
about here:

> WARNING: Console code page (850) differs from Windows code page (1252)
> 8-bit characters might not work correctly. See psql reference
> page "Notes for Windows users" for details.

We don't have any support for Windows code page 850. Looking at the
Wikipedia page about that doesn't make me much inclined to add it
either: Wikipedia says that (a) it's largely been obsoleted by 1252,
and (b) there's confusion about what the code page's contents are,
specifically whether it contains a euro sign. So my recommendation
here is just to switch your console code page to 1252.

> postgres=# select * from pg_settings;
> ERROR: invalid byte sequence for encoding "UTF8": 0xe5 0x6c 0x5f

That hex sequence looks suspiciously like "ål_" in CP1252, so this
is an encoding confusion problem. I think it'd go away if you simplified
these postgresql.conf entries to 'Norwegian_Norway.1252' and restarted.
What I don't remember offhand is where the funny locale name spelling
might've propagated besides these entries.

regards, tom lane
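The reading of the hex sequence can be checked with a short Python sketch. It uses only Python's built-in codecs, and the variable names are made up for this example. It decodes the three bytes from the error message as CP1252, shows that the same bytes are rejected as UTF-8, and, for comparison with the console warning, shows that code page 850 uses a different byte for the same letter.

```python
# The three bytes reported in the error message.
raw = bytes([0xE5, 0x6C, 0x5F])

# Interpreted as Windows code page 1252, they are a fragment of the
# locale name: a-ring, "l", underscore.
as_cp1252 = raw.decode("cp1252")
print(as_cp1252 == "\u00e5l_")  # True

# Interpreted as UTF-8 they are invalid: 0xE5 starts a three-byte sequence,
# but 0x6C and 0x5F are not continuation bytes (those lie in 0x80-0xBF).
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)  # False

# The same letter has a different byte value in code page 850, which is
# why the console and the server can disagree about 8-bit characters.
a_ring_cp1252 = "\u00e5".encode("cp1252")
a_ring_cp850 = "\u00e5".encode("cp850")
print(a_ring_cp1252.hex(), a_ring_cp850.hex())  # e5 86
```

This is consistent with the locale name being stored as CP1252 bytes while the query result is expected to be UTF-8.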