Обсуждение: ERROR: translation failed from server encoding to wchar_t
I am using tsearch2 with pgsql 8.2.5 and get the following error when calling to_tsvector : "translation failed from server encoding to wchar_t" My database is UTF8 encoded and the data sent to to_tsvector comes from a bytea column converted to text with encode(COLUMN, 'escape'). In 8.1 with tsearch2 it worked perfectly ... Thanks for you help, ilan
http://pastebin.ca/845670 This url provides a testcase ... fisrt pass : => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; ERROR: translation failed from server encoding to wchar_t second pass : => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', encode(messageblk, 'escape')) where messageblk_idnr = 12949; UPDATE 1 The to_tsvector seems to accept the row at random ... On Jan 7, 9:16 pm, ila...@gmail.com wrote: > I am using tsearch2 with pgsql 8.2.5 and get the following error when > calling to_tsvector : > "translation failed from server encoding to wchar_t" > > My database is UTF8 encoded and the data sent to to_tsvector comes > from a bytea column converted to text with > encode(COLUMN, 'escape'). > > In 8.1 with tsearch2 it worked perfectly ... > > Thanks for you help, > > ilan
http://pastebin.ca/845696 added schema ... On Jan 7, 10:21 pm, ila...@gmail.com wrote: > http://pastebin.ca/845670 > > This url provides a testcase ... > > fisrt pass : > => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', > encode(messageblk, 'escape')) where messageblk_idnr = 12949; > ERROR: translation failed from server encoding to wchar_t > > second pass : > => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple', > encode(messageblk, 'escape')) where messageblk_idnr = 12949; > UPDATE 1 > > The to_tsvector seems to accept the row at random ... > > On Jan 7, 9:16 pm, ila...@gmail.com wrote: > > > I am using tsearch2 with pgsql 8.2.5 and get the following error when > > calling to_tsvector : > > "translation failed from server encoding to wchar_t" > > > My database is UTF8 encoded and the data sent to to_tsvector comes > > from a bytea column converted to text with > > encode(COLUMN, 'escape'). > > > In 8.1 with tsearch2 it worked perfectly ... > > > Thanks for you help, > > > ilan
Found something interesting with this testcase.
update fails after SELECT query.
Can anyone confirm this ???
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
dbname=> select * from dbmail_messageblks where messageblk_idnr =
12949;messageblk_idnr | physmessage_id
|
messageblk
| blocksize | is_header |
idxfti
-----------------+----------------
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+-----------+-----------
+------------------------------------------------------------------------------------------------- 12949 |
6319 | l'\351quipe de Casinos-park a bien
re\347u votre messsage. \012\012Vous aurez une r\351ponse d\350s que
l'un de nos responsables aura pris connaissance de votre envoi.
\012\012cordialement\012\012l'\351quipe de casinos-park.
\012\012====================\012\012The team of Casinos-park received
your messsage.\012\012You will have an answer as soon as one of our
persons in charge takes note of your sending. \012\012Best regards
\012\012The team of casinos-park.\012 | 398 | 0 | '=':3
'e':5 'h':11 'i':2,10 'k':12 'l':1 'o':7 'p':9,16 'r':8 's':6 't':13
'u':15 'y':14 'rk':4
(1 row)
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
ERROR: translation failed from server encoding to wchar_t
dbname=> UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
encode(messageblk, 'escape')) where messageblk_idnr = 12949;
UPDATE 1
On Jan 7, 10:21 pm, ila...@gmail.com wrote:
> http://pastebin.ca/845670
>
> This url provides a testcase ...
>
> fisrt pass :
> => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
> encode(messageblk, 'escape')) where messageblk_idnr = 12949;
> ERROR: translation failed from server encoding to wchar_t
>
> second pass :
> => UPDATE dbmail_messageblks SET idxFTI=to_tsvector('simple',
> encode(messageblk, 'escape')) where messageblk_idnr = 12949;
> UPDATE 1
>
> The to_tsvector seems to accept the row at random ...
>
> On Jan 7, 9:16 pm, ila...@gmail.com wrote:
>
> > I am using tsearch2 with pgsql 8.2.5 and get the following error when
> > calling to_tsvector :
> > "translation failed from server encoding to wchar_t"
>
> > My database is UTF8 encoded and the data sent to to_tsvector comes
> > from a bytea column converted to text with
> > encode(COLUMN, 'escape').
>
> > In 8.1 with tsearch2 it worked perfectly ...
>
> > Thanks for you help,
>
> > ilan
ilanco@gmail.com writes:
> I am using tsearch2 with pgsql 8.2.5 and get the following error when
> calling to_tsvector :
> "translation failed from server encoding to wchar_t"
> My database is UTF8 encoded and the data sent to to_tsvector comes
> from a bytea column converted to text with
> encode(COLUMN, 'escape').
Two likely theories:
1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
some other encoding.
2. The encode() is yielding something that isn't valid UTF-8.
PG 8.3 contains checks that should complain about both of these
scenarios, but IIRC 8.2 does not.
regards, tom lane
On Jan 8, 4:14 am, t...@sss.pgh.pa.us (Tom Lane) wrote: > ila...@gmail.com writes: > > I am using tsearch2 with pgsql 8.2.5 and get the following error when > > calling to_tsvector : > > "translation failed from server encoding to wchar_t" > > My database is UTF8 encoded and the data sent to to_tsvector comes > > from a bytea column converted to text with > > encode(COLUMN, 'escape'). > > Two likely theories: > > 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes > some other encoding. > > 2. The encode() is yielding something that isn't valid UTF-8. > > PG 8.3 contains checks that should complain about both of these > scenarios, but IIRC 8.2 does not. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majord...@postgresql.org so that your > message can get through to the mailing list cleanly Dear Tom, Thanks for your reply. This is the output of `locale` on my system : # locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL= As for your second scenario I guess you are right, it's possible encode does not return all UTF8 characters. But to_tsvector() succeeds and fails at random with this kind of characters... So how can I sanitize output from encode before I pipe it to to_tsvector() ? Regards, Ilan
On Jan 8, 10:43 am, ila...@gmail.com wrote:
> On Jan 8, 4:14 am, t...@sss.pgh.pa.us (Tom Lane) wrote:
>
>
>
> > ila...@gmail.com writes:
> > > I am using tsearch2 with pgsql 8.2.5 and get the following error when
> > > calling to_tsvector :
> > > "translation failed from server encoding to wchar_t"
> > > My database is UTF8 encoded and the data sent to to_tsvector comes
> > > from a bytea column converted to text with
> > > encode(COLUMN, 'escape').
>
> > Two likely theories:
>
> > 1. Your database encoding is UTF-8, but your locale (LC_CTYPE) assumes
> > some other encoding.
>
> > 2. The encode() is yielding something that isn't valid UTF-8.
>
> > PG 8.3 contains checks that should complain about both of these
> > scenarios, but IIRC 8.2 does not.
>
> > regards, tom lane
>
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majord...@postgresql.org so that your
> > message can get through to the mailing list cleanly
>
> Dear Tom,
>
> Thanks for your reply.
> This is the output of `locale` on my system :
> # locale
> LANG=en_US.UTF-8
> LC_CTYPE="en_US.UTF-8"
> LC_NUMERIC="en_US.UTF-8"
> LC_TIME="en_US.UTF-8"
> LC_COLLATE="en_US.UTF-8"
> LC_MONETARY="en_US.UTF-8"
> LC_MESSAGES="en_US.UTF-8"
> LC_PAPER="en_US.UTF-8"
> LC_NAME="en_US.UTF-8"
> LC_ADDRESS="en_US.UTF-8"
> LC_TELEPHONE="en_US.UTF-8"
> LC_MEASUREMENT="en_US.UTF-8"
> LC_IDENTIFICATION="en_US.UTF-8"
> LC_ALL=
>
> As for your second scenario I guess you are right, it's possible
> encode does not return all UTF8 characters.
> But to_tsvector() succeeds and fails at random with this kind of
> characters...
> So how can I sanitize output from encode before I pipe it to
> to_tsvector() ?
>
> Regards,
>
> Ilan
Tom,
To get around the non-UTF8 chars I used following function :
CREATE OR REPLACE FUNCTION u_messageblk_idxfti() RETURNS "trigger"
AS $$
DECLARE
BEGIN RAISE NOTICE '[DBMAIL] Trying ID %', NEW.messageblk_idnr; BEGIN NEW.idxFTI := to_tsvector('simple',
encode($x$E$x$||
NEW.messageblk, 'escape')); RAISE NOTICE '[DBMAIL] Ended ID %', NEW.messageblk_idnr; RETURN NEW; EXCEPTION WHEN
character_not_in_repertoireTHEN RAISE WARNING '[DBMAIL] character_not_in_repertoire ID %',
NEW.messageblk_idnr; NEW.idxFTI := to_tsvector('simple',
'character_not_in_repertoire: This email contains illegal
characters.'); RETURN NEW; END;
END;
$$
LANGUAGE plpgsql;
Hope this helps others with DBmail and tsearch2 on postgres 8.2
Thanks for your help Tom,
ilan
ilanco@gmail.com wrote:
> NEW.idxFTI := to_tsvector('simple', encode($x$E$x$||
> NEW.messageblk, 'escape'));
>
>
I strongly doubt that this does what you think it does - I would check
the results if I were you. The $x$E$x$ should almost certainly not be
there - if you are trying to get E'foo' behaviour, that is purely for
literals. All you are doing here is to prepend a literal 'E' to your value.
cheers
andrew