Thread: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92

Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92

From: Prasanth Reddy
Hi,

I posted a question about this issue on the JDBC list, thinking it was a driver issue. I was told the error is
generated by the back end itself rather than by the driver, so I am posting the question to the
admin list. See the discussion here: http://www.postgresql.org/list/pgsql-jdbc/since/201508080000/

I am currently running 9.1.9 and trying to upgrade to 9.4. I did a dump and restore, and when I start my Java
application I get the error below. The server uses SQL_ASCII encoding and the
client encoding is UTF8. There are some invalid characters in the database, but this has not caused a problem in the
current version or in 9.3 (I tried a restore in 9.3 and the application works fine).

 ERROR:  invalid byte sequence for encoding "UTF8": 0x92
 STATEMENT:  SELECT * FROM client_data WHERE status_code = 0 ORDER BY client_name, description


org.postgresql.util.PSQLException: ERROR: invalid byte sequence for encoding "UTF8": 0x92
    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2270)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1998)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:570)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:420)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:305)
    at com.sun.rowset.JdbcRowSetImpl.execute(JdbcRowSetImpl.java:567)


Same error with postgresql-9.4-1201.jdbc4.jar & postgresql-9.1-902.jdbc4.jar.

Appreciate your help.

Thanks,
Prasanth



Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92

From: Scott Ribe
On Aug 11, 2015, at 8:59 AM, Prasanth Reddy <dbadmin@nqadmin.com> wrote:
>
> The server uses SQL_ASCII encoding and the
> client encoding is UTF8. There are some invalid characters in the database but this has not caused a problem in the
> current version or 9.3 (tried a restore in 9.3 and the application works fine).

Later versions of PostgreSQL do better checking of UTF-8, and disallow invalid sequences.

You're going to have to straighten out your encoding conflicts.

--
Scott Ribe
scott_ribe@elevated-dev.com
http://www.elevated-dev.com/
https://www.linkedin.com/in/scottribe/
(303) 722-0567 voice
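
To make the complaint about 0x92 concrete, here is a minimal Python sketch of why that byte is rejected. The sample bytes and the helper name is_valid_utf8 are made up for illustration. The guess that the stray byte came from a Windows-1252 source (where 0x92 is the right single quotation mark) is an assumption, not something the thread confirms.

```python
# Hypothetical sample value: ASCII text with a lone 0x92 byte in the middle.
raw = b"client\x92s data"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0x92 has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation
# bytes, so it can never appear on its own.
print(is_valid_utf8(raw))

# Assumption: the text was originally typed on a Windows-1252 system.
# ascii() keeps the output printable on any terminal.
print(ascii(raw.decode("cp1252")))
```

If the assumption holds, the data itself is fine and only its declared encoding is wrong, which is the kind of encoding conflict that has to be straightened out.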







Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92

From: Tom Lane
Prasanth Reddy <dbadmin@nqadmin.com> writes:
> I am currently running 9.1.9 and trying to upgrade to 9.4. I have done a dump and restore, when I start my java
> application I am getting the below error. The server uses SQL_ASCII encoding and the
> client encoding is UTF8. There are some invalid characters in the database but this has not caused a problem in the
> current version or 9.3 (tried a restore in 9.3 and the application works fine).

>  ERROR:  invalid byte sequence for encoding "UTF8": 0x92
>  STATEMENT:  SELECT * FROM client_data WHERE status_code = 0 ORDER BY client_name, description

You need to fix the encoding errors in your data.  9.4 is intentionally
less lax about that than prior versions.

Or, if you really want the database to be totally encoding-ignorant,
use SQL_ASCII as both client and server encoding.  But if you have the
client declared to use UTF8, the server will try not to send anything
that isn't valid UTF8.

I believe the specific change that's biting you is

    Author: Tom Lane <tgl@sss.pgh.pa.us>
    Branch: master Release: REL9_4_BR [49c817eab] 2014-02-23 15:22:50 -0500

    Plug some more holes in encoding conversion.

    Various places assume that pg_do_encoding_conversion() and
    pg_server_to_any() will ensure encoding validity of their results;
    but they failed to do so in the case that the source encoding is SQL_ASCII
    while the destination is not.  We cannot perform any actual "conversion"
    in that scenario, but we should still validate the string according to the
    destination encoding.  Per bug #9210 from Digoal Zhou.

but there were some others of the same ilk in 9.4.

            regards, tom lane
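
The behavior change described in the commit message can be modelled in a few lines of Python. This is only a simplified sketch, not PostgreSQL source code: the function send_to_client, the flag validate_sql_ascii_source, and the codec table are all hypothetical names, and the error text only imitates the server message for the single-byte case.

```python
# Simplified model of the check described in the commit message above.
# All names here are hypothetical; this is not PostgreSQL source code.

PYTHON_CODECS = {"UTF8": "utf-8", "LATIN1": "latin-1"}


def send_to_client(stored: bytes, server_encoding: str, client_encoding: str,
                   validate_sql_ascii_source: bool = True) -> bytes:
    """Return the bytes a client would receive, or raise ValueError."""
    # An encoding-ignorant client, or matching encodings: pass bytes through.
    if client_encoding == "SQL_ASCII" or client_encoding == server_encoding:
        return stored

    target = PYTHON_CODECS[client_encoding]
    if server_encoding == "SQL_ASCII":
        # No real conversion is possible from SQL_ASCII. With the flag on
        # (modelling 9.4), the bytes are still validated against the
        # destination encoding; with it off (modelling older releases),
        # they are passed through unchecked.
        if validate_sql_ascii_source:
            try:
                stored.decode(target)
            except UnicodeDecodeError as exc:
                bad = stored[exc.start]
                raise ValueError(
                    f'invalid byte sequence for encoding "{client_encoding}": 0x{bad:02x}'
                ) from exc
        return stored

    # Both sides declare a real encoding: perform an actual conversion.
    return stored.decode(PYTHON_CODECS[server_encoding]).encode(target)


row = b"it\x92s"
print(send_to_client(row, "SQL_ASCII", "SQL_ASCII"))
print(send_to_client(row, "SQL_ASCII", "UTF8", validate_sql_ascii_source=False))
try:
    send_to_client(row, "SQL_ASCII", "UTF8")
except ValueError as err:
    print(err)
```

In this model the same stored bytes are delivered when both sides use SQL_ASCII, slip through unchecked with the flag off, and are rejected with the flag on, which matches the symptoms reported for 9.1/9.3 versus 9.4.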


Re: Postgresql 9.4.4 - ERROR: invalid byte sequence for encoding "UTF8": 0x92

From: Prasanth Reddy
Thanks for the prompt response. I was playing with it a bit more, and it seems that any character with a value less than
65533 works fine; I am guessing that covers all Unicode characters. Does the server also
reject an insert/update when there are invalid characters? I took a character that is supposed to be invalid (displayed
as a small box in the application using the 9.1 version), pasted it into the
application using the 9.4 version of PostgreSQL, and was able to save it to the database. Should this have failed?
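
One possible explanation, offered as an assumption rather than something confirmed in the thread: 65533 is code point U+FFFD, the Unicode replacement character, which many clients substitute for bytes they cannot decode and often draw as a small box. If the pasted character was U+FFFD, it is a valid character with a valid UTF-8 encoding, so there would be nothing for the server to reject. The short Python sketch below shows the relationship; the sample bytes are made up.

```python
import unicodedata

replacement = chr(65533)

# 65533 is U+FFFD, and it has an ordinary three-byte UTF-8 encoding.
print(unicodedata.name(replacement))
print(replacement.encode("utf-8"))

# A lenient decoder swaps the undecodable byte for U+FFFD instead of failing,
# so the original 0x92 is already gone by the time the text is displayed.
lossy = b"it\x92s".decode("utf-8", errors="replace")
print(ascii(lossy))
```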

If I find and fix all these characters (which would be a huge task), I want to make sure that the database is not going
to take any new invalid characters. Please let me know if there is some setting
I can change in the configuration to do this. Another option I was considering is to change the encoding of the
database itself to UTF8. Previously, pg_restore failed when I tried a
database encoding of UTF8; maybe if I fix the invalid characters and then do a dump, it would work.

Thanks,
Prasanth
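
The find-and-fix task described above could be sketched roughly as follows in Python. This is a minimal illustration, not a tested migration tool: the function names find_invalid and repair_text and the sample rows are hypothetical, and the choice of Windows-1252 as the fallback encoding is an assumption about where the stray bytes came from.

```python
from typing import Iterable, List, Tuple


def find_invalid(rows: Iterable[Tuple[int, bytes]]) -> List[Tuple[int, bytes]]:
    """Return the (key, raw_bytes) pairs whose bytes are not valid UTF-8."""
    bad = []
    for key, raw in rows:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((key, raw))
    return bad


def repair_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode as UTF-8 if possible, otherwise as the assumed legacy encoding."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Bytes with no meaning in the fallback encoding become U+FFFD.
        return raw.decode(fallback, errors="replace")


# Hypothetical sample data standing in for values read as raw bytes.
rows = [(1, b"plain"), (2, b"it\x92s"), (3, "caf\u00e9".encode("utf-8"))]

for key, raw in find_invalid(rows):
    print(key, ascii(repair_text(raw)))
```

The fallback is applied to the whole value, so a value that mixes valid multi-byte UTF-8 with legacy bytes would need finer-grained handling than this sketch provides.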

