Обсуждение: libpq and mysterious "invalid byte sequence for encoding UTF8".
I'm stuck trying to find a cause for > invalid byte sequence for encoding "UTF8". It is an C program using libpq. I'm using `PQexecParams` to execute the SQL query. The offending byte sequence is completely random, sometimes the command even runs ok. I thought I must have a memory allocation issue somewhere, but even if I specify all the parameters as static strings, I still receive the error with a random byte sequence. What's more, the same query with the same parameters runs ok when I create a small test program. So I'm completely stuck. I verified all the possible sources for the error like client_encoding etc, but could not find the source of the error. What is confusing me is that the offending byte sequence is random, even though the query parameters don't change. Moreover, when I check the postgres log, the query and its parameters appear to be correct. I'm trying to update a record in the following table: CREATE TABLE public.contacts ( contactid integer NOT NULL DEFAULT nextval('contacts_contactid_seq'::regclass), paperid integer, pos character varying(50) COLLATE pg_catalog."default", title character varying(10) COLLATE pg_catalog."default", firstname character varying(20) COLLATE pg_catalog."default", lastname character varying(25) COLLATE pg_catalog."default", func character varying(25) COLLATE pg_catalog."default", tel1 text COLLATE pg_catalog."default", tel2 text COLLATE pg_catalog."default", fax1 text COLLATE pg_catalog."default", fax2 text COLLATE pg_catalog."default", email1 character varying(50) COLLATE pg_catalog."default", email2 character varying(50) COLLATE pg_catalog."default", maincontact boolean DEFAULT false, publdatacontact boolean DEFAULT false, invcontact boolean DEFAULT false, queries_recipient boolean, contact_log text COLLATE pg_catalog."default", salesforceid character(18) COLLATE pg_catalog."default", fakelastname boolean NOT NULL DEFAULT false, CONSTRAINT contacts_pk PRIMARY KEY (contactid), CONSTRAINT contacts_paperid_fkey FOREIGN KEY (paperid) REFERENCES public.papers (paperid) MATCH SIMPLE ON UPDATE CASCADE ON DELETE CASCADE ); Here is an actual code: const char* pparams[16] = { NULL, NULL, "1702", NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, "14340" }; gchar *query="UPDATE contacts SET Pos=$1::varchar,Title=$2::varchar,PaperID=$3::int,FirstName=$4::varchar,LastName=$5::varchar,Func=$6::varchar,Tel1=$7::text,Fax1=$8::text,Email1=$9::varchar,Tel2=$10::text,Fax2=$11::text,Email2=$12::varchar,MainContact=$13::boolean,PublDataContact=$14::boolean,InvContact=$15::boolean WHERE ContactID=$16::int"; result = PQexecParams(conn, query, 16, NULL, pparams, ssizes, bbinary, 0); An excerpt from Postgres log: Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate LOG: execute <unnamed>: UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar, PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func = $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar, Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact = $13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean WHERE ContactID = $16::int Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2] 2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL: parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6 = NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 = NULL, $13 = NULL, $14 = NULL, $15 = NULL, $16 = '14340' Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1] 2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR: invalid byte sequence for encoding "UTF8": 0x80 Any ideas as to what could be a cause of the error?
=?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes: > Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26 > 09:40:57.505 CET [11334] jira@project-syndicate LOG: execute <unnamed>: > > UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar, > PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func > = $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar, > Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact = > $13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean > WHERE ContactID = $16::int > Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2] > 2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL: > parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6 > = NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 = > NULL, $13 = NULL, $14 = NULL, $15 = NULL, $16 = '14340' > Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1] > 2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR: > invalid byte sequence for encoding "UTF8": 0x80 This log entry shows that your query text and parameters all made it to the backend just fine. So I don't think your issue is on the client side. I'm wondering if the error could be from (say) triggers doing fancy data manipulations. Noting your use of ON UPDATE CASCADE, it'd be worth checking indirectly-affected tables as well. regards, tom lane
On 01.02.2021 15:55, Tom Lane wrote: > =?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes: >> Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-1] 2021-01-26 >> 09:40:57.505 CET [11334] jira@project-syndicate LOG: execute <unnamed>: >> >> UPDATE contacts SET Pos = $1::varchar, Title = $2::varchar, >> PaperID = $3::int, FirstName = $4::varchar, LastName = $5::varchar, Func >> = $6::varchar, Tel1 = $7::text, Fax1 = $8::text, Email1 = $9::varchar, >> Tel2 = $10::text, Fax2 = $11::text, Email2 = $12::varchar, MainContact = >> $13::boolean, PublDataContact = $14::boolean, InvContact = $15::boolean >> WHERE ContactID = $16::int >> Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [113-2] >> 2021-01-26 09:40:57.505 CET [11334] jira@project-syndicate DETAIL: >> parameters: $1 = NULL, $2 = NULL, $3 = '1702', $4 = NULL, $5 = NULL, $6 >> = NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = NULL, $12 = >> NULL, $13 = NULL, $14 = NULL, $15 = NULL, $16 = '14340' >> Jan 26 08:40:57 ip-172-16-10-94 postgres[11334]: [114-1] >> 2021-01-26 09:40:57.544 CET [11334] jira@project-syndicate ERROR: >> invalid byte sequence for encoding "UTF8": 0x80 > This log entry shows that your query text and parameters all made it to > the backend just fine. So I don't think your issue is on the client side. > I'm wondering if the error could be from (say) triggers doing fancy data > manipulations. Noting your use of ON UPDATE CASCADE, it'd be worth > checking indirectly-affected tables as well. > > regards, tom lane Thanks for the answer. I don't think triggers or constraints are the issue. The exact same code works if I create a small test program with the same query and the same parameters. But when used in the context of the whole application it gives the error. I don't know what should I try. I tried all possible combinations of parameters. I have just even tried to disable triggers and remove any constraints from the table. Same error.
Hello, On Mon, 2021-02-01 at 18:03 +0100, Jiří Pavlovský wrote: > > > Thanks for the answer. > > > I don't think triggers or constraints are the issue. > > The exact same code works if I create a small test program with the > same > query and the same parameters. > > But when used in the context of the whole application it gives the > error. > > I don't know what should I try. I tried all possible combinations > of > parameters. > > I have just even tried to disable triggers and remove any constraints > from the table. Same error. > Columns:- maincontact boolean DEFAULT false, publdatacontact boolean DEFAULT false, invcontact boolean DEFAULT false, queries_recipient boolean, fakelastname boolean NOT NULL DEFAULT false, are defined as booleans and all except one have defaults declared, yet you are passing NULL's which the insert tries to cast to boolean. Maybe this is the cause of the error?? My 2 cents worth.
rob stone <floriparob@gmail.com> writes: > Columns:- maincontact boolean DEFAULT false, > publdatacontact boolean DEFAULT false, > invcontact boolean DEFAULT false, > queries_recipient boolean, > fakelastname boolean NOT NULL DEFAULT false, > are defined as booleans and all except one have defaults declared, yet > you are passing NULL's which the insert tries to cast to boolean. > Maybe this is the cause of the error?? Oooh ... if we posit that the statement is expected to throw an error, maybe what this error is complaining about is an inability to translate a localized error message to the client's encoding. This could reflect an incorrectly-encoded .po file, for example. So what we need to know next is what lc_messages setting Jiří is using, and also the server and client encodings. regards, tom lane
On 02.02.2021 3:53, Tom Lane wrote: > rob stone <floriparob@gmail.com> writes: >> Columns:- maincontact boolean DEFAULT false, >> publdatacontact boolean DEFAULT false, >> invcontact boolean DEFAULT false, >> queries_recipient boolean, >> fakelastname boolean NOT NULL DEFAULT false, >> are defined as booleans and all except one have defaults declared, yet >> you are passing NULL's which the insert tries to cast to boolean. >> Maybe this is the cause of the error?? > Oooh ... if we posit that the statement is expected to throw an error, > maybe what this error is complaining about is an inability to translate > a localized error message to the client's encoding. This could reflect > an incorrectly-encoded .po file, for example. So what we need to know > next is what lc_messages setting Jiří is using, and also the server > and client encodings. Still, this does not explain why it sometimes work and why the byte sequence in the error message changes on every run, no? I'm getting the error even when I changes the boolean fields from NULL to FALSE. Client and server encoding is UTF8. Locales are set to C.UTF8.
=?UTF-8?B?SmnFmcOtIFBhdmxvdnNrw70=?= <jiri@pavlovsky.eu> writes: > Client and server encoding is UTF8. Locales are set to C.UTF8. If lc_messages is C then no translation of error strings would happen, so that theory goes out the window. Oh well. Perhaps you could attach gdb to the backend and get a stack trace from the call to errfinish? https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend regards, tom lane