Обсуждение: libpq and mysterious "invalid byte sequence for encoding UTF8".

Поиск
Список
Период
Сортировка

libpq and mysterious "invalid byte sequence for encoding UTF8".

От
Jiří Pavlovský
Дата:
I'm stuck trying to find a cause for

> invalid byte sequence for encoding "UTF8".

It is an C program using libpq. I'm using `PQexecParams` to
execute the SQL query.

The offending byte sequence is completely random, sometimes the command
even runs ok. I thought I must have a memory allocation issue somewhere,
but even if I specify all the parameters as static strings, I still
receive the error with a random byte sequence.
What's more, the same query with the same parameters runs ok when I
create a small test program.
So I'm completely stuck. I verified all the possible sources for the
error like client_encoding etc, but could not find the source of the error.
What is confusing me is that the offending byte sequence is random, even
though the query parameters don't change.
Moreover, when I check the postgres log, the query and its parameters
appear to be correct.

I'm trying to update a record in the following table:

    CREATE TABLE public.contacts
    (
        contactid integer NOT NULL DEFAULT
nextval('contacts_contactid_seq'::regclass),
        paperid integer,
        pos character varying(50) COLLATE pg_catalog."default",
        title character varying(10) COLLATE pg_catalog."default",
        firstname character varying(20) COLLATE pg_catalog."default",
        lastname character varying(25) COLLATE pg_catalog."default",
        func character varying(25) COLLATE pg_catalog."default",
        tel1 text COLLATE pg_catalog."default",
        tel2 text COLLATE pg_catalog."default",
        fax1 text COLLATE pg_catalog."default",
        fax2 text COLLATE pg_catalog."default",
        email1 character varying(50) COLLATE pg_catalog."default",
        email2 character varying(50) COLLATE pg_catalog."default",
        maincontact boolean DEFAULT false,
        publdatacontact boolean DEFAULT false,
        invcontact boolean DEFAULT false,
        queries_recipient boolean,
        contact_log text COLLATE pg_catalog."default",
        salesforceid character(18) COLLATE pg_catalog."default",
        fakelastname boolean NOT NULL DEFAULT false,
        CONSTRAINT contacts_pk PRIMARY KEY (contactid),
        CONSTRAINT contacts_paperid_fkey FOREIGN KEY (paperid)
            REFERENCES public.papers (paperid) MATCH SIMPLE
            ON UPDATE CASCADE
            ON DELETE CASCADE
    );

Here is an actual code:

        const char* pparams[16] = {
    NULL,
    NULL,
    "1702",
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    NULL,
    "14340"
        };
   

   
    gchar *query="UPDATE contacts SET

Pos=$1::varchar,Title=$2::varchar,PaperID=$3::int,FirstName=$4::varchar,LastName=$5::varchar,Func=$6::varchar,Tel1=$7::text,Fax1=$8::text,Email1=$9::varchar,Tel2=$10::text,Fax2=$11::text,Email2=$12::varchar,MainContact=$13::boolean,PublDataContact=$14::boolean,InvContact=$15::boolean
WHERE ContactID=$16::int";
   
          result = PQexecParams(conn, query, 16, NULL, pparams, ssizes,
bbinary, 0);

An excerpt from Postgres log:

    Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26
09:40:57.505 CET [11334] jira@project-syndicate LOG:  execute <unnamed>:
   
        UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar,
PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func
= $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar,
Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact =
$13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean
WHERE ContactID = $16::int
        Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2]
2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL: 
parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6
= NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 =
NULL, $13 = NULL, $14 =  NULL, $15 = NULL, $16 = '14340'
        Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1]
2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR: 
invalid byte sequence for encoding "UTF8": 0x80

Any ideas as to what could be a cause of the error?




Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

От
Tom Lane
Дата:
=?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes:
>     Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26
> 09:40:57.505 CET [11334] jira@project-syndicate LOG:  execute <unnamed>:
>    
>         UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar,
> PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func
> = $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar,
> Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact =
> $13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean
> WHERE ContactID = $16::int
>         Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2]
> 2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL: 
> parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6
> = NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 =
> NULL, $13 = NULL, $14 =  NULL, $15 = NULL, $16 = '14340'
>         Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1]
> 2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR: 
> invalid byte sequence for encoding "UTF8": 0x80

This log entry shows that your query text and parameters all made it to
the backend just fine.  So I don't think your issue is on the client side.
I'm wondering if the error could be from (say) triggers doing fancy data
manipulations.  Noting your use of ON UPDATE CASCADE, it'd be worth
checking indirectly-affected tables as well.

            regards, tom lane



Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

От
Jiří Pavlovský
Дата:
On 01.02.2021 15:55, Tom Lane wrote:
> =?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes:
>>      Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26
>> 09:40:57.505 CET [11334] jira@project-syndicate LOG:  execute <unnamed>:
>>     
>>          UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar,
>> PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func
>> = $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar,
>> Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact =
>> $13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean
>> WHERE ContactID = $16::int
>>          Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2]
>> 2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL:
>> parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6
>> = NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 =
>> NULL, $13 = NULL, $14 =  NULL, $15 = NULL, $16 = '14340'
>>          Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1]
>> 2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR:
>> invalid byte sequence for encoding "UTF8": 0x80
> This log entry shows that your query text and parameters all made it to
> the backend just fine.  So I don't think your issue is on the client side.
> I'm wondering if the error could be from (say) triggers doing fancy data
> manipulations.  Noting your use of ON UPDATE CASCADE, it'd be worth
> checking indirectly-affected tables as well.
>
>             regards, tom lane


Thanks for the answer.


I don't think triggers or constraints are the issue.

The exact same code works if I create a small test program with the same 
query and the same parameters.

But when used in the context of the whole application it gives the error.

  I don't know what should I try. I tried all possible combinations of 
parameters.

I have just even tried to disable triggers and remove any constraints 
from the table. Same error.




Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

От
rob stone
Дата:
Hello,

On Mon, 2021-02-01 at 18:03 +0100, Jiří Pavlovský wrote:
> 
> 
> Thanks for the answer.
> 
> 
> I don't think triggers or constraints are the issue.
> 
> The exact same code works if I create a small test program with the
> same 
> query and the same parameters.
> 
> But when used in the context of the whole application it gives the
> error.
> 
>   I don't know what should I try. I tried all possible combinations
> of 
> parameters.
> 
> I have just even tried to disable triggers and remove any constraints
> from the table. Same error.
> 
Columns:- maincontact boolean DEFAULT false,
        publdatacontact boolean DEFAULT false,
        invcontact boolean DEFAULT false,
        queries_recipient boolean,
        fakelastname boolean NOT NULL DEFAULT false,

are defined as booleans and all except one have defaults declared, yet
you are passing NULL's which the insert tries to cast to boolean.
Maybe this is the cause of the error??

My 2 cents worth.







Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

От
Tom Lane
Дата:
rob stone <floriparob@gmail.com> writes:
> Columns:- maincontact boolean DEFAULT false,
>         publdatacontact boolean DEFAULT false,
>         invcontact boolean DEFAULT false,
>         queries_recipient boolean,
>         fakelastname boolean NOT NULL DEFAULT false,

> are defined as booleans and all except one have defaults declared, yet
> you are passing NULL's which the insert tries to cast to boolean.
> Maybe this is the cause of the error??

Oooh ... if we posit that the statement is expected to throw an error,
maybe what this error is complaining about is an inability to translate
a localized error message to the client's encoding.  This could reflect
an incorrectly-encoded .po file, for example.  So what we need to know
next is what lc_messages setting Jiří is using, and also the server
and client encodings.

            regards, tom lane



Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

От
Jiří Pavlovský
Дата:
On 02.02.2021 3:53, Tom Lane wrote:
> rob stone <floriparob@gmail.com> writes:
>> Columns:- maincontact boolean DEFAULT false,
>>         publdatacontact boolean DEFAULT false,
>>         invcontact boolean DEFAULT false,
>>         queries_recipient boolean,
>>         fakelastname boolean NOT NULL DEFAULT false,
>> are defined as booleans and all except one have defaults declared, yet
>> you are passing NULL's which the insert tries to cast to boolean.
>> Maybe this is the cause of the error??
> Oooh ... if we posit that the statement is expected to throw an error,
> maybe what this error is complaining about is an inability to translate
> a localized error message to the client's encoding.  This could reflect
> an incorrectly-encoded .po file, for example.  So what we need to know
> next is what lc_messages setting Jiří is using, and also the server
> and client encodings.


Still, this does not explain why it sometimes work and why the byte
sequence in the error message changes on every run, no?

I'm getting the error even when I changes the boolean fields from NULL
to FALSE.

Client and server encoding is UTF8. Locales are set to C.UTF8.




Re: libpq and mysterious "invalid byte sequence for encoding UTF8".

От
Tom Lane
Дата:
=?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes:
> Client and server encoding is UTF8. Locales are set to C.UTF8.

If lc_messages is C then no translation of error strings would happen,
so that theory goes out the window.  Oh well.

Perhaps you could attach gdb to the backend and get a stack trace
from the call to errfinish?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

            regards, tom lane