Обсуждение: Slow fulltext query plan

Поиск
Список
Период
Сортировка

Slow fulltext query plan

От
Benoit Delbosc
Дата:
Hi,

I would like to understand why the following query execution don't use
any fulltext indexes
and takes more than 300s (using lot of temporary files):

   EXPLAIN ANALYZE SELECT hierarchy.id
   FROM hierarchy
   JOIN fulltext ON fulltext.id = hierarchy.id,
   TO_TSQUERY('whatever') query1,
   TO_TSQUERY('whatever') query2
   WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@
nx_to_tsvector(fulltext.fulltext_title));

The query plan is here:
   http://explain.depesz.com/s/YgP

While if I replace the query2 by query1 in the second clause:

   EXPLAIN ANALYZE SELECT hierarchy.id
   FROM hierarchy
   JOIN fulltext ON fulltext.id = hierarchy.id,
   TO_TSQUERY('whatever') query1,
   TO_TSQUERY('whatever') query2
   WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query1 @@
nx_to_tsvector(fulltext.fulltext_title));

It is 5 order of magniude faster (15ms) using the gin indexes:
   http://explain.depesz.com/s/RLa

The nx_to_tsvector is an immutable function with the following code:
   SELECT TO_TSVECTOR('english', SUBSTR($1, 1, 250000))

Here is the list of indexes:
   hierarchy: "hierarchy_pk" PRIMARY KEY, btree (id)
   fulltext: "fulltext_fulltext_idx" gin
(nx_to_tsvector(fulltext::character varying))
   fulltext: "fulltext_fulltext_title_idx" gin
(nx_to_tsvector(fulltext_title::character varying))

fulltext and fulltext_title are text type.

And some PostgreSQL configuration:
   PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu
   shared_buffers: 4GB
   effective_cache_size: 10GB
   work_mem: 20MB

Thanks for your work and enlightenment

ben

Re: Slow fulltext query plan

От
Tom Lane
Дата:
Benoit Delbosc <bdelbosc@nuxeo.com> writes:
>    EXPLAIN ANALYZE SELECT hierarchy.id
>    FROM hierarchy
>    JOIN fulltext ON fulltext.id = hierarchy.id,
>    TO_TSQUERY('whatever') query1,
>    TO_TSQUERY('whatever') query2
>    WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@
> nx_to_tsvector(fulltext.fulltext_title));

Is there a reason why you're writing the query in such a
non-straightforward way, rather than just

   EXPLAIN ANALYZE SELECT hierarchy.id
   FROM hierarchy
   JOIN fulltext ON fulltext.id = hierarchy.id
   WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext))
      OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title));

?

            regards, tom lane

Re: Slow fulltext query plan

От
Benoit Delbosc
Дата:
On 13/04/2012 00:25, Tom Lane wrote:
> Benoit Delbosc<bdelbosc@nuxeo.com>  writes:
>>     EXPLAIN ANALYZE SELECT hierarchy.id
>>     FROM hierarchy
>>     JOIN fulltext ON fulltext.id = hierarchy.id,
>>     TO_TSQUERY('whatever') query1,
>>     TO_TSQUERY('whatever') query2
>>     WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@
>> nx_to_tsvector(fulltext.fulltext_title));
> Is there a reason why you're writing the query in such a
> non-straightforward way, rather than just
>
>     EXPLAIN ANALYZE SELECT hierarchy.id
>     FROM hierarchy
>     JOIN fulltext ON fulltext.id = hierarchy.id
>     WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext))
>        OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title));
>
> ?
>
This query is written by a framework, also I thought that is a common
pattern that can be found in the documentation:

   http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html

if you think this a wrong way to do it then I will try to fix the framework.

btw your version takes 15ms :)

Thanks

ben

Re: Slow fulltext query plan

От
Tom Lane
Дата:
Benoit Delbosc <bdelbosc@nuxeo.com> writes:
> On 13/04/2012 00:25, Tom Lane wrote:
>> Is there a reason why you're writing the query in such a
>> non-straightforward way, rather than just
>>
>> EXPLAIN ANALYZE SELECT hierarchy.id
>> FROM hierarchy
>> JOIN fulltext ON fulltext.id = hierarchy.id
>> WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext))
>> OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title));

> This query is written by a framework, also I thought that is a common
> pattern that can be found in the documentation:
>    http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html

Well, "common pattern" would be stretching it.  Anyway I've concluded
that this is in fact a planner bug.  There will be a fix in 9.2, but I'm
not going to take the risk of back-patching it, so you might want to
think about changing that framework.

            regards, tom lane

Re: Slow fulltext query plan

От
Florent Guillaume
Дата:
Tom Lane wrote:
> Benoit Delbosc<bdelbosc@nuxeo.com>  writes:
>>  On 13/04/2012 00:25, Tom Lane wrote:
>>>  Is there a reason why you're writing the query in such a
>>>  non-straightforward way, rather than just
>>>
>>>  EXPLAIN ANALYZE SELECT hierarchy.id
>>>  FROM hierarchy
>>>  JOIN fulltext ON fulltext.id = hierarchy.id
>>>  WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext))
>>>  OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title));
>>
>>  This query is written by a framework, also I thought that is a common
>>  pattern that can be found in the documentation:
>>    http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html
>
> Well, "common pattern" would be stretching it.  Anyway I've concluded
> that this is in fact a planner bug.  There will be a fix in 9.2, but I'm
> not going to take the risk of back-patching it, so you might want to
> think about changing that framework.

FYI the reason why we have queries that look like what Benoit
describes is that we often use the query alias twice, once for
TO_TSVECTOR and once for TS_RANK_CD, for instance:

  SELECT hierarchy.id, TS_RANK_CD(fulltext, query1, 32) as nxscore
  FROM hierarchy
  JOIN fulltext ON fulltext.id = hierarchy.id,
  TO_TSQUERY('whatever') query1,
  TO_TSQUERY('whatever') query2
  WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@
nx_to_tsvector(fulltext.fulltext_title))
  ORDER BY nxscore DESC;

(as is also described in the doc mentioned btw).

Florent

--
Florent Guillaume, Director of R&D, Nuxeo
Open Source, Java EE based, Enterprise Content Management (ECM)
http://www.nuxeo.com   http://www.nuxeo.org   +33 1 40 33 79 87