Обсуждение: Invalid byte sequence errors on DB restore

Поиск
Список
Период
Сортировка

Invalid byte sequence errors on DB restore

От
Samuel Smith
Дата:
My current DB backup routine is just to call pg_dump and pipe to gzip. 
We recently started to get a failure to restore (which is basically just 
using psql -f on the pg_dump file) with the following errors:

invalid byte sequence for encoding "UTF8": 0xa0
  and
invalid byte sequence for encoding "UTF8": 0xd7 0x20


This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Regards,
Samuel Smith



Re: Invalid byte sequence errors on DB restore

От
Laurenz Albe
Дата:
On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote:
> My current DB backup routine is just to call pg_dump and pipe to gzip. 
> We recently started to get a failure to restore (which is basically just 
> using psql -f on the pg_dump file) with the following errors:
> 
> invalid byte sequence for encoding "UTF8": 0xa0
>   and
> invalid byte sequence for encoding "UTF8": 0xd7 0x20
> 
> 
> This is on a pg 9.2.24 instance. Any tips to troubleshoot?

Simple.  Fix the offending string and upgrade.

Failure to enforce correct encoding is a bug in PostgreSQL, and a
number of such bugs have been fixed over the decades, so you might
be happier with a less ancient version.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: Invalid byte sequence errors on DB restore

От
Adrian Klaver
Дата:
On 3/15/20 9:18 PM, Samuel Smith wrote:
> My current DB backup routine is just to call pg_dump and pipe to gzip. 
> We recently started to get a failure to restore (which is basically just 
> using psql -f on the pg_dump file) with the following errors:
> 
> invalid byte sequence for encoding "UTF8": 0xa0
>   and
> invalid byte sequence for encoding "UTF8": 0xd7 0x20
> 
> 
> This is on a pg 9.2.24 instance. Any tips to troubleshoot?

What are the locale and encodings set to for the instance and databases 
in it?

> 
> Regards,
> Samuel Smith
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: Invalid byte sequence errors on DB restore

От
Samuel Smith
Дата:
On 3/16/20 2:49 AM, Laurenz Albe wrote:
> On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote:
>> My current DB backup routine is just to call pg_dump and pipe to gzip.
>> We recently started to get a failure to restore (which is basically just
>> using psql -f on the pg_dump file) with the following errors:
>>
>> invalid byte sequence for encoding "UTF8": 0xa0
>>    and
>> invalid byte sequence for encoding "UTF8": 0xd7 0x20
>>
>>
>> This is on a pg 9.2.24 instance. Any tips to troubleshoot?
> 
> Simple.  Fix the offending string and upgrade.
> 
> Failure to enforce correct encoding is a bug in PostgreSQL, and a
> number of such bugs have been fixed over the decades, so you might
> be happier with a less ancient version.
> 
> Yours,
> Laurenz Albe
> 



Yes I thought I had seen some info on bug fixes in newer versions. I 
will pass this on to the system admins/stakeholders.

Thanks,
Samuel Smith



Re: Invalid byte sequence errors on DB restore

От
Samuel Smith
Дата:
On 3/16/20 9:33 AM, Adrian Klaver wrote:
> On 3/15/20 9:18 PM, Samuel Smith wrote:
>> My current DB backup routine is just to call pg_dump and pipe to gzip. 
>> We recently started to get a failure to restore (which is basically 
>> just using psql -f on the pg_dump file) with the following errors:
>>
>> invalid byte sequence for encoding "UTF8": 0xa0
>>   and
>> invalid byte sequence for encoding "UTF8": 0xd7 0x20
>>
>>
>> This is on a pg 9.2.24 instance. Any tips to troubleshoot?
> 
> What are the locale and encodings set to for the instance and databases 
> in it?
> 
>>
>> Regards,
>> Samuel Smith
>>
>>
> 
> 


The server is in UTF8. The file made with pg_dump used 'SQL_ASCII', but 
setting it to UTF8 (via SET client_encoding ) did not help either. 
Having the pg_dump encoding set to 'latin1' seems to allow the file 
created it by it to be loaded via psql -f and everything seems to work. 
Is there any bad side to setting the encoding on pg_dump to latin1?


For the record, the problem characters are:
https://www.htmlsymbols.xyz/unicode/U+00D7
and
https://www.htmlsymbols.xyz/unicode/U+00A0

But those characters were in many places and not all were issues. They 
only fail depending on the characters that precede it which makes it 
complicated.

Thanks,
Samuel Smith




Re: Invalid byte sequence errors on DB restore

От
Samuel Smith
Дата:
On 3/15/20 11:18 PM, Samuel Smith wrote:
> My current DB backup routine is just to call pg_dump and pipe to gzip. 
> We recently started to get a failure to restore (which is basically just 
> using psql -f on the pg_dump file) with the following errors:
> 
> invalid byte sequence for encoding "UTF8": 0xa0
>   and
> invalid byte sequence for encoding "UTF8": 0xd7 0x20
> 
> 
> This is on a pg 9.2.24 instance. Any tips to troubleshoot?
> 
> Regards,
> Samuel Smith
> 
> 

Our issue actually turned out to be that a couple of our production 
database had the encoding set to SQL_ASCII while all of our development 
servers had UTF-8. This meant in some cases where we would restore a 
production database into development (for testing or bug hunting), there 
would be a failure to parse the backup file. A similar issue to this 
blog post: 
https://www.endpoint.com/blog/2017/07/21/postgres-migrating-sqlascii-to-utf-8

So our fix was to dump the affected production databases using the 
LATIN1 encoding for pg_dump followed by destroying and recreating the 
database and setting its new encoding to UTF-8. Then we could restore 
the data using the pg_dump file with LATIN1 encoding.

Regards,
Samuel Smith