Re: Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?

From Jesper Krogh
Subject Re: Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?
Date
Msg-id 4BE65BFD.5080100@krogh.cc
In response to Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?  (Karl Denninger <karl@denninger.net>)
List pgsql-performance
Overall comment: try reading through these old threads; most of your
problem is the same issue:

http://article.gmane.org/gmane.comp.db.postgresql.performance/22395/match=gin
http://thread.gmane.org/gmane.comp.db.postgresql.performance/22331/focus=22434


>                          Table "public.post"
>   Column   |           Type           |                       Modifiers
> -----------+--------------------------+--------------------------------------------------------
>  subject   | text                     |
>  message   | text                     |
>  inserted  | timestamp with time zone |
>  modified  | timestamp with time zone |
>  replied   | timestamp with time zone |
>  ordinal   | integer                  | not null default nextval('post_ordinal_seq'::regclass)
>
> Indexes:
>     "post_pkey" PRIMARY KEY, btree (ordinal)
>     "idx_message" gin (to_tsvector('english'::text, message))
>     "idx_subject" gin (to_tsvector('english'::text, subject))
>
> There's a bunch of other stuff in the table and many more indices, plus
> foreign references, but stripping the table down to JUST THIS shows the
> problem.
>
> ticker=# explain analyze select * from post where to_tsvector('english',
> message) @@ to_tsquery('violence') order by modified desc;
>                                                   QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=31795.16..31819.68 rows=9808 width=436) (actual time=14.222..17.213 rows=3421 loops=1)
>    Sort Key: modified
>    Sort Method:  quicksort  Memory: 3358kB
>    ->  Bitmap Heap Scan on post  (cost=1418.95..31144.90 rows=9808 width=436) (actual time=1.878..7.514 rows=3421 loops=1)
>          Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
>          ->  Bitmap Index Scan on idx_message  (cost=0.00..1416.49 rows=9808 width=0) (actual time=1.334..1.334 rows=3434 loops=1)
>                Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
>  Total runtime: 20.547 ms
> (8 rows)
>
> Ok, very nice.  20ms.  I like that.
>
> Now lets limit the return to 100 items:
>
> ticker=# explain analyze select * from post where to_tsvector('english',
> message) @@ to_tsquery('violence') order by modified desc limit 100;
>                                                   QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..5348.69 rows=100 width=436) (actual time=198.047..2607.077 rows=100 loops=1)
>    ->  Index Scan Backward using post_modified on post  (cost=0.00..524599.31 rows=9808 width=436) (actual time=198.043..2606.864 rows=100 loops=1)
>          Filter: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
>  Total runtime: 2607.231 ms
> (4 rows)
>
> Bad.  Notice that the optimizer decided it was going to do an index scan
> with an internal filter on it!  That's BACKWARD; what I want is for the
> planner to first execute the index scan on the GIN index, then order the
> return and limit the returned data set.
>
> But it gets much worse - let's use something that's NOT in the message
> base (the table in question has some ~2m rows by the way and consumes
> several gigabytes on disk - anything that actually READS the table is
> instant "bad news!")
>

The one problem is that the query planner doesn't have any
specific knowledge about the cost of the gin-index search. That's
mentioned in one of the above threads.

The other problem is that the costs of "to_tsvector" and "ts_match_vq"
are set way too conservatively in the default installation. Bumping those
up will increase your share of correct plans, but it doesn't solve
everything, since the above problem also interferes. But try upping the
cost of those two functions significantly:

alter function ts_match_vq(tsvector, tsquery) cost 500;

(upping the cost by a factor of 500 for that one). If I've got it
right, that is "more in the correct ballpark": the cost more or less
translates to "how much more expensive the function is compared to
really simple operators".
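
For example, something like this (a sketch only, 500 is the suggested
multiplier and not a tuned value; the two-argument to_tsvector
signature here is my assumption, check \df to_tsvector for the one
your queries actually hit):

    -- raise the planner's cost estimates for both functions
    alter function ts_match_vq(tsvector, tsquery) cost 500;
    alter function to_tsvector(regconfig, text) cost 500;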

Another thing you can do that favours the running time of queries
using to_tsvector() is to store the tsvector explicitly in the table
and create an index on that. At run time that translates into fewer
calls to to_tsvector (zero, to be precise), leaving only the
ts_match_vq cost at query time.
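
A minimal sketch of that setup (the column, index and trigger names
are mine, not from your schema; tsvector_update_trigger is the
built-in trigger function):

    -- store the tsvector once, instead of computing it per query
    alter table post add column message_tsv tsvector;
    update post set message_tsv = to_tsvector('english', message);
    create index idx_message_tsv on post using gin (message_tsv);

    -- keep the stored column current on insert/update
    create trigger post_tsv_update before insert or update on post
        for each row execute procedure
        tsvector_update_trigger(message_tsv, 'pg_catalog.english', message);

Queries then match against the stored column directly:

    select * from post where message_tsv @@ to_tsquery('violence')
        order by modified desc limit 100;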

> Why is the planner "taking into consideration" the LIMIT (I know the
> docs say it does) and choosing to sequentially scan a table of nearly 2
> million rows?!  I don't see how that makes sense.... irrespective of the
> query being LIMITed.
>
> If it matters setting enable_seqscan OFF does not impact the results.
>

No, because you end up with index scans on non-gin indexes in that
situation, so turning seqscan off has no effect.


--
Jesper

