Re: COPY fails on 8.1 with invalid byte sequences in text

Поиск
Список
Период
Сортировка
От Birju Prajapati
Тема Re: COPY fails on 8.1 with invalid byte sequences in text
Дата
Msg-id e02b8c250610290427y34d8c4d7i49314ba597f12c0@mail.gmail.com
обсуждение исходный текст
Ответ на Re: COPY fails on 8.1 with invalid byte sequences in text  ("Thomas H." <me@alternize.com>)
Список pgsql-bugs
On 27/10/06, Thomas H. <me@alternize.com> wrote:
> FYI, prior to 8.2, there is another source of bad UTF8 byte sequences:
>
> when using tsearch2 on utf8 content in <8.2, tsearch2 was generating bad
> utf8 sequences. as tsearch2 does lowercase each char in the text its
> indexing, it did also do so with multibyte-characters... unfortunately
> taking each byte separately, so it seems. the unicode-representation of
> german umlauts (=E4=F6=FC) are some examples of charcodes, that where tur=
ned into
> invalid sequences.
>
> this data could be successfully pg_dump'ed, but not pg_restore'd. in 8.2,
> this looks fixed. to upgrade from 8.1.5 to 8.2b1 we had to remove all
> tsearch2 index data, dump the db, restore the db in 8.2 and recreate the
> indices.

You need to initdb with utf8 and then install tsearch2 with utf8. Both
need utf8. I had a similar problem. Perhaps your 8.1 postgres cluster
wasn't utf8?

>
> - thomas
>
>
>
> ----- Original Message -----
> From: "Jeff Davis" <pgsql@j-davis.com>
> To: <pgsql-bugs@postgresql.org>
> Sent: Saturday, October 28, 2006 12:38 AM
> Subject: Re: [BUGS] COPY fails on 8.1 with invalid byte sequences in text
>
>
> > On Fri, 2006-10-27 at 14:42 -0700, Jeff Davis wrote:
> >> It seems to be essentially a data corruption issue if applications
> >> insert binary data in text fields using escape sequences. Shouldn't
> >> PostgreSQL reject an invalid UTF8 sequence in any text type?
> >>
> >
> > Another note: PostgreSQL rejects invalid UTF8 sequences in other
> > contexts. For instance, if you use PQexecParams() and insert using type
> > text and any format (text or binary), it will reject invalid sequences.
> > It will of course allow anything to be sent when the type is bytea.
> >
> > Also, I thought I'd publish the workaround that I'm using.
> >
> > I created a function that seems to work for validating text data as
> > being valid UTF8.
> >
> > CREATE OR REPLACE FUNCTION valid_utf8(TEXT) returns BOOLEAN
> > LANGUAGE plperlu AS
> > $valid_utf8$
> > use utf8;
> > return utf8::decode($_[0]) ? 1 : 0;
> > $valid_utf8$;
> >
> > I just add a check constraint on all of my text attributes in all of my
> > tables. Not fun, but it works.
> >
> > Regards,
> > Jeff Davis
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: explain analyze is your friend
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org
>

В списке pgsql-bugs по дате отправления:

Предыдущее
От: "Thomas H."
Дата:
Сообщение: Re: COPY fails on 8.1 with invalid byte sequences in text
Следующее
От: "Henrique Lavnis // iKernelTech"
Дата:
Сообщение: Drive ODBC