Re: Interesting query plan change linked to the LIMIT parameter

Поиск
Список
Период
Сортировка
От Yannick Le Guédart
Тема Re: Interesting query plan change linked to the LIMIT parameter
Дата
Msg-id 4f5470ad0901202337m7f080bcdw555ed871d20070f@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Interesting query plan change linked to the LIMIT parameter  ("David Wilson" <david.t.wilson@gmail.com>)
Список pgsql-performance
Thanks for the rapid response.

I can understand the way the planner makes its guess, but as a matter of fact, he'll be nearly always wrong, just becausethe most commented articles have only around 5000 or so comments.  I ran the explain  analyze tonight and got this results :

EXPLAIN ANALYZE SELECT _comment.id,
        (get_comment_response(_comment.id)).id AS r_id
FROM   _comment
INNER JOIN _article
        ON _article.id = _comment.parent_id
WHERE  _comment.parent_id = '17355952'
ORDER BY _comment.id ASC
OFFSET 0
LIMIT 1;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..3588.42 rows=1 width=8) (actual time=498597.115..498597.116 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..28524312.40 rows=7949 width=8) (actual time=498597.114..498597.114 rows=1 loops=1)
         ->  Index Scan using _comment_pkey on _comment  (cost=0.00..28448324.73 rows=7949 width=16) (actual time=498473.360..498473.360 rows=1 loops=1)
               Filter: (parent_id = 17355952::bigint)
         ->  Index Scan using _article_pkey on _article  (cost=0.00..9.55 rows=1 width=8) (actual time=63.465..63.465 rows=1 loops=1)
               Index Cond: (_article.id = 17355952::bigint)
 Total runtime: 498615.230 ms
(7 rows)

EXPLAIN ANALYZE SELECT _comment.id,
        (get_comment_response(_comment.id)).id AS r_id
FROM   _comment
INNER JOIN _article
        ON _article.id = _comment.parent_id
WHERE  _comment.parent_id = '17355952'
ORDER BY _comment.id ASC
OFFSET 0
LIMIT 1000;
                                                                         QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=10261.19..10263.69 rows=1000 width=8) (actual time=127.037..127.267 rows=1000 loops=1)
   ->  Sort  (cost=10261.19..10281.06 rows=7949 width=8) (actual time=127.036..127.128 rows=1000 loops=1)
         Sort Key: _comment.id
         Sort Method:  top-N heapsort  Memory: 95kB
         ->  Nested Loop  (cost=0.00..9825.35 rows=7949 width=8) (actual time=0.472..122.986 rows=4674 loops=1)
               ->  Index Scan using _article_pkey on _article  (cost=0.00..9.55 rows=1 width=8) (actual time=0.011..0.013 rows=1 loops=1)
                     Index Cond: (id = 17355952::bigint)
               ->  Index Scan using idx_comment_parent_id on _comment  (cost=0.00..9716.44 rows=7949 width=16) (actual time=0.235..32.869 rows=4674 loops=1)
                     Index Cond: (_comment.parent_id = 17355952::bigint)
 Total runtime: 127.410 ms
(10 rows)

As you can see, the time is dramaticaly longuer with the LIMIT 1 (or in our case, LIMIT 2).

Yannick.

2009/1/20 David Wilson <david.t.wilson@gmail.com>
On Tue, Jan 20, 2009 at 10:45 AM, Yannick Le Guédart
<yannick@over-blog.com> wrote:

>
> The second query scans the whole comment table which is very dangerous for
> production servers.

That's not quite true. The second does an index scan- the planner
seems to be guessing that it'll fulfill the required limit early in
the index scan; only with a pathologically bad case would it actually
have to scan the entire thing. Basically, the second query is
optimized to spit out the first few rows quickly, since that's all you
asked for with the limit.

Note that your first query has a final cost estimate of "Limit
(cost=10261.19..10263.69 rows=1000 width=8)", indicating an estimated
10261.19 to emit the first row; the second has "Limit
(cost=0.00..3588.42 rows=1 width=8)" estimating 0.00 (basically,
instant) to emit the first - and only desired - row.

That all said, an explain analyze would give us a better idea of
what's going on- we can't tell if the planner is making bad estimates
without the knowledge of what the real timing and row count results of
plan stages were.


--
- David T. Wilson
david.t.wilson@gmail.com

В списке pgsql-performance по дате отправления:

Предыдущее
От: "David Wilson"
Дата:
Сообщение: Re: Interesting query plan change linked to the LIMIT parameter
Следующее
От: Dave Youatt
Дата:
Сообщение: linux, memory (mis)accounting/reporting, and the planner/optimizer