Re: FTS performance issue - planner problem identified (but only partially resolved)

Поиск
Список
Период
Сортировка
От Stefan Keller
Тема Re: FTS performance issue - planner problem identified (but only partially resolved)
Дата
Msg-id CAFcOn29yXTb+wsgh0LpurCuH_rJWe0sVWLVd_xLC5L3Gtn1ibg@mail.gmail.com
обсуждение исходный текст
Ответ на FTS performance issue - planner problem identified (but only partially resolved)  (Stefan Keller <sfkeller@gmail.com>)
Список pgsql-performance
Hi

Sorry, referring to GIST index in my mail before was no good idea.

The bottom line still is, that the query (as recommended by the docs)
and the planner don't choose the index which makes it slow - unless
the original query...

> select id,title,left(content,100)
> from fulltextsearch
> where plainto_tsquery('pg_catalog.english','good') @@
> to_tsvector('pg_catalog.english',content);

is reformulated by this

> select id,title,left(content,100)
> from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query
> where query @@
> to_tsvector('pg_catalog.english',content);

... using default values for enable_seqscan and set random_page_cost.

Yours, S.


2013/7/19 Stefan Keller <sfkeller@gmail.com>:
> Hi
>
> At 2013/2/8 I wrote:
>> I have problems with the performance of FTS in a query like this:
>>
>> SELECT * FROM FullTextSearch WHERE content_tsv_gin @@
>> plainto_tsquery('english', 'good');
>>
>> It's slow (> 30 sec.) for some GB (27886 html files, originally 73 MB zipped).
>> The planner obviously always chooses table scan
>
> Now, I've identified (but only partially resolved) the issue: Here are
> my comments:
>
> Thats the query in question (see commented log below):
>
> select id,title,left(content,100)
> from fulltextsearch
> where plainto_tsquery('pg_catalog.english','good') @@
> to_tsvector('pg_catalog.english',content);
>
> After having created the GIN index, the FTS query unexpectedly is fast
> because planner chooses "Bitmap Index Scan". After the index
> statistics have been updated, the same query becomes slow. Only when
> using the "trick" with the function in the WHERE clause. I think GIST
> does'nt change anything.
>
> select id,title,left(content,100)
> from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query
> where query @@ to_tsvector('pg_catalog.english',content);
>
> => This hint should mentioned in the docs!
>
> Then, setting enable_seqscan to off makes original query fast again.
> But that's a setting I want to avoid in a multi-user database.
> Finally, setting random_page_cost to 1 helps also - but I don't like
> this setting neither.
>
> => To me the planner should be updated to recognize immutable
> plainto_tsquery() function in the WHERE clause and choose "Bitmap
> Index Scan" at the first place.
>
> What do you think?
>
> Yours, Stefan
>
>
> ----
> Lets look at table fulltextsearch:
>
> movies=# \d fulltextsearch
>                           Table "public.fulltextsearch"
>  Column  |  Type   |                          Modifiers
> ---------+---------+-------------------------------------------------------------
>  id      | integer | not null default nextval('fulltextsearch_id_seq'::regclass)
>  docid   | integer | default 0
>  title   | text    |
>  content | text    | not null
>
> movies=# CREATE INDEX fulltextsearch_gincontent ON fulltextsearch
> USING gin(to_tsvector('pg_catalog.english',content));
>
> movies=# SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
>   oid   |           name            | kind |   tuples    | pages |
> allvisible | toastrelid | hasindex
> --------+---------------------------+------+-------------+-------+------------+------------+----------
>  476289 | fulltextsearch            | r    |       27886 |   555 |
>      0 |     476293 | t
>  503080 | fulltextsearch_gincontent | i    | 8.97135e+06 | 11133 |
>      0 |          0 | f
>  476296 | fulltextsearch_id_seq     | S    |           1 |     1 |
>      0 |          0 | f
>  503075 | fulltextsearch_pkey       | i    |       27886 |    79 |
>      0 |          0 | f
> (4 rows)
>
> => fulltextsearch_gincontent has an arbitrary large number of tuples
> (statistics is wrong and not yet updated)
>
> movies=#
> explain (analyze,costs,timing,buffers)
> select id,title,left(content,100)
> from fulltextsearch
> where plainto_tsquery('pg_catalog.english','good') @@
> to_tsvector('pg_catalog.english',content);
> => Unexpectedly, the query is fast!
> See query plan http://explain.depesz.com/s/ewn
>
> Let's update the statistics:
>
> movies=# VACUUM ANALYZE VERBOSE fulltextsearch ;
>
> SELECT * FROM pg_class c WHERE relname LIKE 'fullt%';
>   oid   |           name            | kind | tuples | pages |
> allvisible | toastrelid | hasindex
> --------+---------------------------+------+--------+-------+------------+------------+----------
>  476289 | fulltextsearch            | r    |  27886 |   555 |
> 555 |     476293 | t
>  503080 | fulltextsearch_gincontent | i    |  27886 | 11133 |
> 0 |          0 | f
>  476296 | fulltextsearch_id_seq     | S    |      1 |     1 |
> 0 |          0 | f
>  503075 | fulltextsearch_pkey       | i    |  27886 |    79 |
> 0 |          0 | f
> (4 rows)
>
> => Now after having update statistics (see especially tuples of
> fulltextsearch_gincontent ) the original query is slow!
> See query plan http://explain.depesz.com/s/MQ60
>
> Now, let's reformulate the original query and move the function call
> to plainto_tsquery to the FROM clause:
>
> movies=# explain (analyze,costs,timing,buffers)
> select id,title,left(content,100)
> from fulltextsearch, plainto_tsquery('pg_catalog.english','good') query
> where query @@ to_tsvector('pg_catalog.english',content);
> => This special query is fast again!  See query plan
> http://explain.depesz.com/s/FVT
>
> Setting enable_seqscan to off makes query fast again: See query plan
> http://explain.depesz.com/s/eOr
>
> Finally, setting random_page_cost to 1 helps also (default is 4):
>
> movies=# set enable_seqscan to default;
> movies=# set random_page_cost to 1.0;
> => Query is fast. See query plan http://explain.depesz.com/s/M5Ke
>
> ----


В списке pgsql-performance по дате отправления:

Предыдущее
От: Stefan Keller
Дата:
Сообщение: Re: How to properly index hstore tags column to faster search for keys
Следующее
От: Skarsol
Дата:
Сообщение: Fwd: Relatively high planner overhead on partitions?