Обсуждение: Invalid byte sequence errors on DB restore
My current DB backup routine is just to call pg_dump and pipe to gzip. We recently started to get a failure to restore (which is basically just using psql -f on the pg_dump file) with the following errors: invalid byte sequence for encoding "UTF8": 0xa0 and invalid byte sequence for encoding "UTF8": 0xd7 0x20 This is on a pg 9.2.24 instance. Any tips to troubleshoot? Regards, Samuel Smith
On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote: > My current DB backup routine is just to call pg_dump and pipe to gzip. > We recently started to get a failure to restore (which is basically just > using psql -f on the pg_dump file) with the following errors: > > invalid byte sequence for encoding "UTF8": 0xa0 > and > invalid byte sequence for encoding "UTF8": 0xd7 0x20 > > > This is on a pg 9.2.24 instance. Any tips to troubleshoot? Simple. Fix the offending string and upgrade. Failure to enforce correct encoding is a bug in PostgreSQL, and a number of such bugs have been fixed over the decades, so you might be happier with a less ancient version. Yours, Laurenz Albe -- Cybertec | https://www.cybertec-postgresql.com
On 3/15/20 9:18 PM, Samuel Smith wrote: > My current DB backup routine is just to call pg_dump and pipe to gzip. > We recently started to get a failure to restore (which is basically just > using psql -f on the pg_dump file) with the following errors: > > invalid byte sequence for encoding "UTF8": 0xa0 > and > invalid byte sequence for encoding "UTF8": 0xd7 0x20 > > > This is on a pg 9.2.24 instance. Any tips to troubleshoot? What are the locale and encodings set to for the instance and databases in it? > > Regards, > Samuel Smith > > -- Adrian Klaver adrian.klaver@aklaver.com
On 3/16/20 2:49 AM, Laurenz Albe wrote: > On Sun, 2020-03-15 at 23:18 -0500, Samuel Smith wrote: >> My current DB backup routine is just to call pg_dump and pipe to gzip. >> We recently started to get a failure to restore (which is basically just >> using psql -f on the pg_dump file) with the following errors: >> >> invalid byte sequence for encoding "UTF8": 0xa0 >> and >> invalid byte sequence for encoding "UTF8": 0xd7 0x20 >> >> >> This is on a pg 9.2.24 instance. Any tips to troubleshoot? > > Simple. Fix the offending string and upgrade. > > Failure to enforce correct encoding is a bug in PostgreSQL, and a > number of such bugs have been fixed over the decades, so you might > be happier with a less ancient version. > > Yours, > Laurenz Albe > Yes I thought I had seen some info on bug fixes in newer versions. I will pass this on to the system admins/stakeholders. Thanks, Samuel Smith
On 3/16/20 9:33 AM, Adrian Klaver wrote: > On 3/15/20 9:18 PM, Samuel Smith wrote: >> My current DB backup routine is just to call pg_dump and pipe to gzip. >> We recently started to get a failure to restore (which is basically >> just using psql -f on the pg_dump file) with the following errors: >> >> invalid byte sequence for encoding "UTF8": 0xa0 >> and >> invalid byte sequence for encoding "UTF8": 0xd7 0x20 >> >> >> This is on a pg 9.2.24 instance. Any tips to troubleshoot? > > What are the locale and encodings set to for the instance and databases > in it? > >> >> Regards, >> Samuel Smith >> >> > > The server is in UTF8. The file made with pg_dump used 'SQL_ASCII', but setting it to UTF8 (via SET client_encoding ) did not help either. Having the pg_dump encoding set to 'latin1' seems to allow the file created it by it to be loaded via psql -f and everything seems to work. Is there any bad side to setting the encoding on pg_dump to latin1? For the record, the problem characters are: https://www.htmlsymbols.xyz/unicode/U+00D7 and https://www.htmlsymbols.xyz/unicode/U+00A0 But those characters were in many places and not all were issues. They only fail depending on the characters that precede it which makes it complicated. Thanks, Samuel Smith
On 3/15/20 11:18 PM, Samuel Smith wrote: > My current DB backup routine is just to call pg_dump and pipe to gzip. > We recently started to get a failure to restore (which is basically just > using psql -f on the pg_dump file) with the following errors: > > invalid byte sequence for encoding "UTF8": 0xa0 > and > invalid byte sequence for encoding "UTF8": 0xd7 0x20 > > > This is on a pg 9.2.24 instance. Any tips to troubleshoot? > > Regards, > Samuel Smith > > Our issue actually turned out to be that a couple of our production database had the encoding set to SQL_ASCII while all of our development servers had UTF-8. This meant in some cases where we would restore a production database into development (for testing or bug hunting), there would be a failure to parse the backup file. A similar issue to this blog post: https://www.endpoint.com/blog/2017/07/21/postgres-migrating-sqlascii-to-utf-8 So our fix was to dump the affected production databases using the LATIN1 encoding for pg_dump followed by destroying and recreating the database and setting its new encoding to UTF-8. Then we could restore the data using the pg_dump file with LATIN1 encoding. Regards, Samuel Smith