Обсуждение: Importing a Windows database (in en_GB.CP1252) to linux

Поиск

Список

Период

Сортировка

Importing a Windows database (in en_GB.CP1252) to linux

От

Jean-Christophe BOGGIO

Дата:

01 декабря, 16:37:43

Hello,

I have a (custom) backup created on a Windows machine in en_GB.CP1252 encoding. And of course, some characters can't be imported because they don't exist in UTF-8.

So I created a new cluster on PG18 port 5433 initialized in WIN1252 encoding:

$ pg_restore -p 5433 -t csakafl -d imlocal imlocal20251127.backup
pg_restore: error: COPY failed for table "csakafl": ERROR: invalid byte sequence for encoding "UTF8": 0x92
CONTEXT: COPY csakafl, line 298

So pg_restore still thinks I want to use UTF8.

So I created an en_GB.CP1252 locales definitions folder like this:

mkdir ~/locales
cd ~/locales
cp cp /usr/share/i18n/charmaps/CP1252.gz .
gunzip CP1252.gz
localedef -cv -i en_GB -f CP1252 ./en_GB.CP1252

Then I now want to use this to restore my database so I tried setting:

setenv LOCPATH ~/locales
setenv LC_ALL en_GB.CP1252
setenv LANG en_GB.CP1252

And even forcing LC_ALL when launching pg_restore:

$ LC_ALL=en_GB.CP1252 pg_restore -p 5433 -t csakafl -d imlocal imlocal20251127.backup
pg_restore: error: COPY failed for table "csakafl": ERROR: invalid byte sequence for encoding "UTF8": 0x92
CONTEXT: COPY csakafl, line 298

What am I doing wrong?

Thanks for your help,

Re: Importing a Windows database (in en_GB.CP1252) to linux

От

Laurenz Albe

Дата:

01 декабря, 18:07:30

On Mon, 2025-12-01 at 14:37 +0100, Jean-Christophe BOGGIO wrote:
> I have a (custom) backup created on a Windows machine in en_GB.CP1252 encoding.
> And of course, some characters can't be imported because they don't exist in UTF-8.

Hm?  Which character can be encoded in WINDOWS-1252, but not in UTF-8?
I don't think that can be the problem.

> So I created a new cluster on PG18 port 5433 initialized in WIN1252 encoding:
>
> $ \l imlocal
>                                             List of databases
>   Name   | Owner | Encoding | Locale Provider | Collate | Ctype | Locale | ICU Rules | Access privileges  
> ---------+-------+----------+-----------------+---------+-------+--------+-----------+-------------------
>  imlocal | cat   | WIN1252  | libc            | C       | C     | ∅    | ∅       | ∅
>  (1 row)
>
>   I am now trying to import the data in that database but I keep getting this error:
>
> $ pg_restore -p 5433 -t csakafl -d imlocal imlocal20251127.backup
>  pg_restore: error: COPY failed for table "csakafl": ERROR:  invalid byte sequence for encoding "UTF8": 0x92
>  CONTEXT:  COPY csakafl, line 298
>
>  So pg_restore still thinks I want to use UTF8.

That looks like pg_restore sets a wrong client_encoding, which is weird.

What do you get for

  pg_restore -p 5433 -t csakafl -s -f - imlocal20251127.backup | grep client_encoding

If the dump was taken from a WINDOWS-1252 encoded database, that line should
read

  SET client_encoding = 'WIN1252';

and everything should work fine.  But apparently, the client_encoding is set to
UTF-8 in your case.

How did that happen?  How exactly did you take that dump?
Did you do anything (like an encoding conversion) with the dump after you took it?

Yours,
Laurenz Albe

Re: Importing a Windows database (in en_GB.CP1252) to linux

От

Jean-Christophe BOGGIO

Дата:

02 декабря, 12:39:04

Laurenz,

That looks like pg_restore sets a wrong client_encoding, which is weird.

What do you get for
  pg_restore -p 5433 -t csakafl -s -f - imlocal20251127.backup | grep client_encoding

SET client_encoding = 'UTF8';

If the dump was taken from a WINDOWS-1252 encoded database, that line should
read
  SET client_encoding = 'WIN1252';

and everything should work fine.  But apparently, the client_encoding is set to
UTF-8 in your case.

Yes, exactly.

How did that happen?  How exactly did you take that dump?

This backup is a transfer from an iSeries DB2 database. It has been a nightmare to get this working (and took around 10 days to finalize). We set up a FDW Server using odbc_fdw, recreated all the tables (around 2k) and INSERTed the DB2 data to the PG tables.

Then we used PgAdmin that came with PostgreSQL 17 on the Windows machine.

I double-checked with the client: the database is in en_GB.CP1252.

BTW, if I try to import the same backup on a UTF8 database I get the exact same error (invalid byte sequence 0x92)

Did you do anything (like an encoding conversion) with the dump after you took it?

No, the backup is in custom format so I can't touch it (or at least I don't know how I could).

Where can I go from here?

Re: Importing a Windows database (in en_GB.CP1252) to linux

От

Laurenz Albe

Дата:

02 декабря, 14:26:54

On Tue, 2025-12-02 at 10:39 +0100, Jean-Christophe BOGGIO wrote:
> > That looks like pg_restore sets a wrong client_encoding, which is weird.
> >
> > What do you get for
> >
> >   pg_restore -p 5433 -t csakafl -s -f - imlocal20251127.backup | grep client_encoding
>
> SET client_encoding = 'UTF8';
>
>
> > How did that happen?  How exactly did you take that dump?
>
> This backup is a transfer from an iSeries DB2 database. It has been a nightmare to
> get this working (and took around 10 days to finalize). We set up a FDW Server using
> odbc_fdw, recreated all the tables (around 2k) and INSERTed the DB2 data to the PG tables.
>
> Then we used PgAdmin that came with PostgreSQL 17 on the Windows machine.
>
> I double-checked with the client: the database is in en_GB.CP1252.

The DB2 database or the PostgreSQL database?

It must be the DB2 database, because otherwise the dump would contain

  SET client_encoding = 'WIN1252';

That is, unless you created the dump with

  pg_dump --encoding=UTF8

But then, the dump couldn't contain non-UTF-8 characters.

Having used pgAdmin, you probably don't know the pg_dump command line that was used.


My best guess is that odbc_fdw has a bug that does not check if the strings are
properly encoded, and you somehow got corrupted data in your PostgreSQL database.
But I am not sure.

> > Did you do anything (like an encoding conversion) with the dump after you took it?
>
> No, the backup is in custom format so I can't touch it (or at least I don't know how I could).
>
> Where can I go from here?

You can try the following:

- convert the custom format dump into an SQL script with

  pg_restore -f script.sql imlocal20251127.backup

- edit script.sql and change the line to read

  SET client_encoding = 'WIN1252';

- restore that dump with "psql":

  psql -f script.sql -d newdb

That should work if *all* the strings are in WINDOWS-1252 encoding.

Yours,
Laurenz Albe

Re: Importing a Windows database (in en_GB.CP1252) to linux

От

Jean-Christophe BOGGIO

Дата:

02 декабря, 16:15:15

Laurenz,

Thanks a lot for your quick reply.

Le 02/12/2025 à 12:26, Laurenz Albe a écrit :
>> I double-checked with the client: the database is in en_GB.CP1252.
> The DB2 database or the PostgreSQL database?
>
> It must be the DB2 database, because otherwise the dump would contain

No: the PostgreSQL one. I guess the DB2 is encoded (internally) in 
EBCDIC... Yes... At least that's what CSV exports directly made from the 
DB shows.

> My best guess is that odbc_fdw has a bug that does not check if the strings are
> properly encoded, and you somehow got corrupted data in your PostgreSQL database.
> But I am not sure.

I don't know really but I could have the client do some tests on tables 
that we know contain invalid byte sequences. Maybe I could try doing 
manual pg_dumps with different -E parameter to verify that it's not 
something enforced by pgAdmin.

> You can try the following:
>
> - convert the custom format dump into an SQL script with
>
>    pg_restore -f script.sql imlocal20251127.backup
>
> - edit script.sql and change the line to read
>
>    SET client_encoding = 'WIN1252';
>
> - restore that dump with "psql":
>
>    psql -f script.sql -d newdb
>
> That should work if *all* the strings are in WINDOWS-1252 encoding.

Thanks a lot for the proposal. It seems to be doing the job!

It's running right now and so far no "bad encoding" error appeared (it's 
a 311Gb backup so it will take some time). I'll let you know.

Gratefully,

JC

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Importing a Windows database (in en_GB.CP1252) to linux

Importing a Windows database (in en_GB.CP1252) to linux

Re: Importing a Windows database (in en_GB.CP1252) to linux

Re: Importing a Windows database (in en_GB.CP1252) to linux

Re: Importing a Windows database (in en_GB.CP1252) to linux

Re: Importing a Windows database (in en_GB.CP1252) to linux