Thread: BUG #5219: Segfault in to_tsvector

BUG #5219: Segfault in to_tsvector

From
"Kenaniah Cerny"
Date:
The following bug has been logged online:

Bug reference:      5219
Logged by:          Kenaniah Cerny
Email address:      kenaniah@gmail.com
PostgreSQL version: 8.4.1
Operating system:   Centos5.2 -- Linux 2.6.18-92.1.10.el5 #1 SMP i686 athlon
i386 GNU/Linux
Description:        Segfault in to_tsvector
Details:

Full backtrace: http://pgsql.privatepaste.com/5411abf8f3

The issue takes place running this query:
http://pgsql.privatepaste.com/35064cbba8

Crash is attributed to this index definition:
CREATE INDEX "anime_titles_idx_name_simple_text" ON "public"."anime_titles"
  USING gin ((to_tsvector('simple'::regconfig, name)));

I believe the issue is caused by possibly non-UTF-8 data. Both the server
and the client (a PHP script using PDO's pgsql driver) are using UTF-8. The
string causing this issue is stored in the database in a text field and
looks like this:
http://s801.photobucket.com/albums/yy299/kenaniah972/?action=view&current=issue.png

After output into an HTML input field and resubmission through firefox, the
string that is passed through to the DB looks like this:
http://s801.photobucket.com/albums/yy299/kenaniah972/?action=view&current=submitted.png

(The &# characters were manually omitted in submission)

I don't profess to know anything about encodings, but I don't think this is
valid UTF-8 input. I might be wrong. All I do know is that this causes the
to_tsvector part of the gin index to throw a segfault in the insert
statement, rather than returning an invalid UTF-8 input error or just plain
working.
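
One way to check the encoding suspicion is to look at the bytes that are actually stored. The following is a minimal sketch, assuming it is run in psql against the same database. The table and column names come from the index definition above; the filter on multibyte rows and the LIMIT are only illustrative ways to narrow the output.

```sql
-- Confirm which encodings the server and this session are using.
SHOW server_encoding;
SHOW client_encoding;

-- Dump the stored bytes of names that contain multibyte characters.
-- octet_length counts bytes and char_length counts characters, so the
-- two differ only when a name contains non-ASCII characters.
SELECT name,
       encode(convert_to(name, 'UTF8'), 'hex') AS utf8_bytes
FROM anime_titles
WHERE octet_length(name) <> char_length(name)
LIMIT 20;
```

If the hex output decodes as well-formed UTF-8 sequences, the stored value is probably not the encoding problem it appears to be.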

Re: BUG #5219: Segfault in to_tsvector

From
Tom Lane
Date:
"Kenaniah Cerny" <kenaniah@gmail.com> writes:
> Description:        Segfault in to_tsvector
> Full backtrace: http://pgsql.privatepaste.com/5411abf8f3

This looks like the known problem that ts_stat fails on an empty
tsvector.  Can you try this patch
http://archives.postgresql.org/pgsql-committers/2009-10/msg00056.php
or just pick up 8.4 branch tip from CVS?

If that does fix it, I don't think this is an encoding problem,
but rather that the name doesn't contain anything that is recognized
as a word by the textsearch configuration you're using.

            regards, tom lane
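
The empty-tsvector explanation above can be checked with a few queries. This is a minimal sketch, assuming a server that already has the fix applied (running it on an unpatched 8.4.1 may or may not reproduce the crash). The punctuation-only literal is a hypothetical stand-in for the problem value, which is not reproduced in this thread; the table and column names are taken from the index definition in the original report.

```sql
-- A string made only of punctuation (hypothetical stand-in): the parser
-- is not expected to find any word tokens in it, so the result should
-- be an empty tsvector.
SELECT to_tsvector('simple', '!!! ... ???') AS vec;

-- Show how the parser classifies each piece of the string and which
-- lexemes, if any, the 'simple' configuration produces for it.
SELECT alias, token, lexemes
FROM ts_debug('simple', '!!! ... ???');

-- Find existing rows whose name yields no lexemes at all.
SELECT name
FROM anime_titles
WHERE length(to_tsvector('simple', name)) = 0;
```

If the last query returns the row from the failing insert, that supports the reading that the name contains nothing the configuration treats as a word, rather than invalid UTF-8.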

Re: BUG #5219: Segfault in to_tsvector

From
Kenaniah Cerny
Date:
Thanks,

The patch took some massaging, but took care of the issue when applied to
the 8.4.1 source.

Kenaniah Cerny

On Sat, Nov 28, 2009 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Kenaniah Cerny" <kenaniah@gmail.com> writes:
> > Description:        Segfault in to_tsvector
> > Full backtrace: http://pgsql.privatepaste.com/5411abf8f3
>
> This looks like the known problem that ts_stat fails on an empty
> tsvector.  Can you try this patch
> http://archives.postgresql.org/pgsql-committers/2009-10/msg00056.php
> or just pick up 8.4 branch tip from CVS?
>
> If that does fix it, I don't think this is an encoding problem,
> but rather that the name doesn't contain anything that is recognized
> as a word by the textsearch configuration you're using.
>
>                        regards, tom lane
>