Re: Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?

From: Oleg Bartunov
Subject: Re: Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?
Date:
Msg-id: Pine.LNX.4.64.1005091331040.25553@sn.sai.msu.ru
In response to: Ugh - bad plan with LIMIT in a complex SELECT, any way to fix this?  (Karl Denninger <karl@denninger.net>)
List: pgsql-performance
Karl,

selectivity estimation for GIN indexes is currently very, very bad, so your
result doesn't surprise me. At PGCon we'll discuss our new gincostestimate
function, which could improve the planner's choices. Can you provide us a dump
of your db (with the relevant columns) and your test queries, so we can check
our new patch?

Oleg
On Sat, 8 May 2010, Karl Denninger wrote:

> I have a message posted in pgsql-general that outlines what I thought
> was an indexing problem - it's not, so I'm bringing it here.
>
> I dumped the table from our production system and stuffed it into a test
> machine, then started refining and playing with the query until I was
> able to get it to the "de-minimus" that showed the issue.  Note that the
> actual query is frequently MUCH more complicated, but without the LIMIT
> shown below the planner seems to do a decent job of figuring out how to
> "get it done."
>
> The actual table in question has ~2m rows totaling several gigabytes of
> space.
>
> Here's an abstract of the schema:
>
>                                      Table "public.post"
>   Column   |           Type           |                       Modifiers
> -----------+--------------------------+--------------------------------------------------------
>  subject   | text                     |
>  message   | text                     |
>  inserted  | timestamp with time zone |
>  modified  | timestamp with time zone |
>  replied   | timestamp with time zone |
>  ordinal   | integer                  | not null default nextval('post_ordinal_seq'::regclass)
>
> Indexes:
>    "post_pkey" PRIMARY KEY, btree (ordinal)
>    "idx_message" gin (to_tsvector('english'::text, message))
>    "idx_subject" gin (to_tsvector('english'::text, subject))
>
> There's a bunch of other stuff in the table and many more indices, plus
> foreign references, but stripping the table down to JUST THIS shows the
> problem.
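
[A note on the plans that follow: they reference indexes named post_modified
and post_ordinal that aren't shown in the stripped schema above - presumably
among the "many more indices" Karl mentions. My inference, not from the posted
schema, is that they are plain btrees along these lines:]

```sql
-- Inferred from the plan output below; NOT part of the posted schema.
CREATE INDEX post_modified ON post (modified);
CREATE INDEX post_ordinal  ON post (ordinal);
```
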
>
> ticker=# explain analyze select * from post where to_tsvector('english',
> message) @@ to_tsquery('violence') order by modified desc;
>                                 QUERY PLAN
> -----------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=31795.16..31819.68 rows=9808 width=436) (actual time=14.222..17.213 rows=3421 loops=1)
>    Sort Key: modified
>    Sort Method:  quicksort  Memory: 3358kB
>    ->  Bitmap Heap Scan on post  (cost=1418.95..31144.90 rows=9808 width=436) (actual time=1.878..7.514 rows=3421 loops=1)
>          Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
>          ->  Bitmap Index Scan on idx_message  (cost=0.00..1416.49 rows=9808 width=0) (actual time=1.334..1.334 rows=3434 loops=1)
>                Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
>  Total runtime: 20.547 ms
> (8 rows)
>
> Ok, very nice.  20ms.  I like that.
>
> Now lets limit the return to 100 items:
>
> ticker=# explain analyze select * from post where to_tsvector('english',
> message) @@ to_tsquery('violence') order by modified desc limit 100;
>                                 QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..5348.69 rows=100 width=436) (actual time=198.047..2607.077 rows=100 loops=1)
>    ->  Index Scan Backward using post_modified on post  (cost=0.00..524599.31 rows=9808 width=436) (actual time=198.043..2606.864 rows=100 loops=1)
>          Filter: (to_tsvector('english'::text, message) @@ to_tsquery('violence'::text))
>  Total runtime: 2607.231 ms
> (4 rows)
>
> Bad.  Notice that the optimizer decided it was going to do an index scan
> with an internal filter on it!  That's BACKWARD; what I want is for the
> planner to first execute the index scan on the GIN index, then order the
> return and limit the returned data set.
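
[For what it's worth, the planner's preference looks like straightforward cost
arithmetic - my reconstruction from the numbers in the plans above, not
anything from the source:]

```sql
-- Reconstructed from the costs shown above:
--   full backward scan of post_modified: cost 524599.31, est. 9808 matches.
--   With LIMIT 100 the planner assumes it can stop after 100/9808 of the scan:
--       524599.31 * (100 / 9808.0)  ~=  5348.69   <-- the Limit node's cost
--   versus ~31819.68 for the bitmap-scan-plus-sort plan, so the scan "wins".
-- The hidden assumption - matches spread evenly through index order - is what
-- breaks down when the search term is rare or absent.
SELECT 524599.31 * (100 / 9808.0) AS assumed_limited_cost;
```
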
>
> But it gets much worse - let's use something that's NOT in the message
> base (the table in question has some ~2m rows by the way and consumes
> several gigabytes on disk - anything that actually READS the table is
> instant "bad news!")
>
>
> ticker=# explain analyze select * from post where to_tsvector('english',
> message) @@ to_tsquery('hosehead') order by modified;
>                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=31795.16..31819.68 rows=9808 width=436) (actual time=0.407..0.407 rows=0 loops=1)
>    Sort Key: modified
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Bitmap Heap Scan on post  (cost=1418.95..31144.90 rows=9808 width=436) (actual time=0.402..0.402 rows=0 loops=1)
>          Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
>          ->  Bitmap Index Scan on idx_message  (cost=0.00..1416.49 rows=9808 width=0) (actual time=0.399..0.399 rows=0 loops=1)
>                Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
>  Total runtime: 0.441 ms
> (8 rows)
>
>
> Very fast, as you'd expect - it returned nothing.  Now let's try it with
> a "LIMIT":
>
> ticker=# explain analyze select * from post where to_tsvector('english',
> message) @@ to_tsquery('hosehead') order by modified limit 100;
> NOTICE:  word is too long to be indexed
> DETAIL:  Words longer than 2047 characters are ignored.
>                                 QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..5348.69 rows=100 width=436) (actual time=254217.850..254217.850 rows=0 loops=1)
>    ->  Index Scan using post_modified on post  (cost=0.00..524599.31 rows=9808 width=436) (actual time=254217.847..254217.847 rows=0 loops=1)
>          Filter: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
>  Total runtime: 254217.891 ms
> (4 rows)
>
> ticker=#
>
> Oh crap.  It actually went through and looked at the entire freaking
> table - one message at a time.
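
[The mechanism, as far as I can reconstruct it from the estimates: the planner
guesses ~9808 matches for this tsquery regardless of the actual term - it has
no way to see that 'hosehead' matches nothing. The arithmetic below is my
reconstruction, not from the source:]

```sql
-- Estimated matches: 9808 of ~1,961,557 rows.  Walking post_modified, the
-- planner therefore expects one match roughly every
--     1961557 / 9808.0  ~=  200 rows,
-- i.e. LIMIT 100 satisfied after only ~20,000 rows.  With zero real matches
-- the early exit never fires and the scan visits all ~2m rows.
SELECT 1961557 / 9808.0 AS est_rows_between_matches;
```
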
>
> An attempt to re-write the query into something that FORCES the planner
> to do the right thing fails too.  For example:
>
> ticker=# explain analyze select * from post where ordinal in (select
> ordinal from post where to_tsvector('english', message) @@
> to_tsquery('hosehead')) order by modified;
>                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Sort  (cost=94886.44..94910.96 rows=9808 width=436) (actual time=0.884..0.884 rows=0 loops=1)
>    Sort Key: public.post.modified
>    Sort Method:  quicksort  Memory: 25kB
>    ->  Nested Loop  (cost=31173.42..94236.19 rows=9808 width=436) (actual time=0.879..0.879 rows=0 loops=1)
>          ->  HashAggregate  (cost=31173.42..31271.50 rows=9808 width=4) (actual time=0.877..0.877 rows=0 loops=1)
>                ->  Bitmap Heap Scan on post  (cost=1422.95..31148.90 rows=9808 width=4) (actual time=0.850..0.850 rows=0 loops=1)
>                      Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
>                      ->  Bitmap Index Scan on idx_message  (cost=0.00..1420.50 rows=9808 width=0) (actual time=0.848..0.848 rows=0 loops=1)
>                            Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
>          ->  Index Scan using post_ordinal on post  (cost=0.00..6.41 rows=1 width=436) (never executed)
>                Index Cond: (public.post.ordinal = public.post.ordinal)
>  Total runtime: 0.985 ms
> (12 rows)
>
> Fast, if convoluted.
>
>
> ticker=# explain analyze select * from post where ordinal in (select
> ordinal from post where to_tsvector('english', message) @@
> to_tsquery('hosehead')) order by modified limit 100;
> NOTICE:  word is too long to be indexed
> DETAIL:  Words longer than 2047 characters are ignored.
>                                 QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..19892.88 rows=100 width=436) (actual time=270563.091..270563.091 rows=0 loops=1)
>    ->  Nested Loop Semi Join  (cost=0.00..1951093.77 rows=9808 width=436) (actual time=270563.088..270563.088 rows=0 loops=1)
>          ->  Index Scan using post_modified on post  (cost=0.00..509887.63 rows=1961557 width=436) (actual time=0.015..3427.627 rows=1953674 loops=1)
>          ->  Index Scan using post_ordinal on post  (cost=0.00..0.73 rows=1 width=4) (actual time=0.134..0.134 rows=0 loops=1953674)
>                Index Cond: (public.post.ordinal = public.post.ordinal)
>                Filter: (to_tsvector('english'::text, public.post.message) @@ to_tsquery('hosehead'::text))
>  Total runtime: 270563.147 ms
> (7 rows)
>
> ticker=#
>
> Ok, that didn't work either.
>
> Interestingly enough, if I crank up the limit to 500, it starts behaving!
>
> ticker=# explain analyze select * from post where ordinal in (select
> ordinal from post where to_tsvector('english', message) @@
> to_tsquery('hosehead')) order by modified limit 500;
>                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=94724.91..94726.16 rows=500 width=436) (actual time=1.475..1.475 rows=0 loops=1)
>    ->  Sort  (cost=94724.91..94749.43 rows=9808 width=436) (actual time=1.473..1.473 rows=0 loops=1)
>          Sort Key: public.post.modified
>          Sort Method:  quicksort  Memory: 25kB
>          ->  Nested Loop  (cost=31173.43..94236.19 rows=9808 width=436) (actual time=1.468..1.468 rows=0 loops=1)
>                ->  HashAggregate  (cost=31173.43..31271.51 rows=9808 width=4) (actual time=1.466..1.466 rows=0 loops=1)
>                      ->  Bitmap Heap Scan on post  (cost=1422.95..31148.91 rows=9808 width=4) (actual time=1.440..1.440 rows=0 loops=1)
>                            Recheck Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
>                            ->  Bitmap Index Scan on idx_message  (cost=0.00..1420.50 rows=9808 width=0) (actual time=1.438..1.438 rows=0 loops=1)
>                                  Index Cond: (to_tsvector('english'::text, message) @@ to_tsquery('hosehead'::text))
>                ->  Index Scan using post_ordinal on post  (cost=0.00..6.41 rows=1 width=436) (never executed)
>                      Index Cond: (public.post.ordinal = public.post.ordinal)
>  Total runtime: 1.600 ms
> (13 rows)
>
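
[The flip at LIMIT 500 also looks like plain cost arithmetic - again my
reconstruction from the numbers shown, not from the source:]

```sql
-- For the IN-subquery form, the unlimited semi-join plan above costs
-- 1951093.77 for est. 9808 rows.  Scaling by the LIMIT fraction:
--   LIMIT 100:  1951093.77 * (100 / 9808.0) ~= 19893  <  94726 -> semi join "wins"
--   LIMIT 500:  1951093.77 * (500 / 9808.0) ~= 99464  >  94726 -> sort plan wins
-- 94726 is the cost of the full sort-then-limit plan, which is why LIMIT 500
-- suddenly "behaves".
SELECT 1951093.77 * (100 / 9808.0) AS limit_100_cost,
       1951093.77 * (500 / 9808.0) AS limit_500_cost;
```
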
> Why is the planner "taking into consideration" the LIMIT (I know the
> docs say it does) and choosing a plan that walks nearly 2 million rows
> one at a time?!  I don't see how that makes sense... irrespective of the
> query being LIMITed.
>
> If it matters, setting enable_seqscan OFF does not impact the results.
>
> -- Karl
>
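
Until a better cost estimate lands, the usual stopgap is to hide the tsquery
match behind an optimization fence, so the GIN bitmap scan runs before the
sort and limit. An untested sketch - OFFSET 0 prevents the subquery from
being flattened into the outer query:

```sql
SELECT *
FROM (SELECT *
      FROM post
      WHERE to_tsvector('english', message) @@ to_tsquery('hosehead')
      OFFSET 0) AS matched   -- OFFSET 0 keeps the subquery from being pulled up
ORDER BY modified
LIMIT 100;
```
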

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
