512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?

Поиск
Список
Период
Сортировка
От Bryce Nesbitt
Тема 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?
Дата
Msg-id 4B723042.7090406@obviously.com
обсуждение исходный текст
Ответы Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?  (Bryce Nesbitt <bryce2@obviously.com>)
Re: 512,600ms query becomes 7500ms... but why? Postgres 8.3 query planner quirk?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Список pgsql-performance
I've got a very slow query, which I can make faster by doing something
seemingly trivial.
The query has been trouble for years (always slow, sometimes taking hours):

 512,600ms Original, filter on articles.indexed (622 results)
   7,500ms Remove "AND articles.indexed" (726 results, undesirable).
   7,675ms Extra join for "AND articles.indexed" (622 results, same as
original).

Hardware is Postgres 8.3 on a Sunfire X4240 under Debian Lenny, with a
fresh ANALYZE. What I don't understand is why the improvement? Is the
second way of doing things actually superior, or is this just a query
planner edge case?


Original (512,600ms)
---------------------
EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts
JOIN articles ON (articles.context_key=contexts.context_key)
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
  (SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
   INTERSECT
   SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
  (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON
(a.ancestor_key = bp_categories.context_key)
   WHERE lower(bp_categories.category) = 'law')
AND articles.indexed;

Extra join (7,675ms)
---------------------------------------------
EXPLAIN ANALYZE
SELECT contexts.context_key FROM contexts JOIN articles using (context_key)
WHERE contexts.context_key IN
(
SELECT contexts.context_key FROM contexts
JOIN matview_82034 ON (contexts.context_key=matview_82034.context_key)
WHERE contexts.context_key IN
    (SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'insider'
     INTERSECT
     SELECT context_key FROM article_words JOIN words using (word_key)
WHERE word = 'trading')
AND contexts.context_key IN
    (SELECT a.context_key FROM virtual_ancestors a JOIN bp_categories ON
(a.ancestor_key = bp_categories.context_key)
     WHERE lower(bp_categories.category) = 'law')
)
AND articles.indexed;



# select indexed,count(*) from articles group by indexed;
 indexed | count
---------+--------
 t       | 354605
 f       | 513552


QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=131663.39..140005.40 rows=4769 width=4) (actual
time=511893.241..512599.348 rows=622 loops=1)
   Hash Cond: (matview_82034.context_key = articles.context_key)
   ->  Seq Scan on matview_82034  (cost=0.00..6596.00 rows=465600
width=4) (actual time=0.019..463.278 rows=438220 loops=1)
   ->  Hash  (cost=131663.38..131663.38 rows=1 width=16) (actual
time=511659.671..511659.671 rows=622 loops=1)
         ->  Nested Loop IN Join  (cost=46757.70..131663.38 rows=1
width=16) (actual time=1142.789..511656.211 rows=622 loops=1)
               Join Filter: (a.context_key = articles.context_key)
               ->  Nested Loop  (cost=46757.70..46789.06 rows=2
width=12) (actual time=688.057..839.297 rows=1472 loops=1)
                     ->  Nested Loop  (cost=46757.70..46780.26 rows=2
width=8) (actual time=688.022..799.945 rows=1472 loops=1)
                           ->  Subquery Scan "IN_subquery"
(cost=46757.70..46757.97 rows=5 width=4) (actual time=687.963..743.587
rows=1652 loops=1)
                                 ->  SetOp Intersect
(cost=46757.70..46757.93 rows=5 width=4) (actual time=687.961..738.955
rows=1652 loops=1)
                                       ->  Sort
(cost=46757.70..46757.81 rows=46 width=4) (actual time=687.943..709.972
rows=19527 loops=1)
                                             Sort Key: "*SELECT*
1".context_key
                                             Sort Method:  quicksort
Memory: 1684kB
                                             ->  Append
(cost=0.00..46756.43 rows=46 width=4) (actual time=8.385..657.839
rows=19527 loops=1)
                                                   ->  Subquery Scan
"*SELECT* 1"  (cost=0.00..23378.21 rows=23 width=4) (actual
time=8.383..215.613 rows=4002 loops=1)
                                                         ->  Nested
Loop  (cost=0.00..23377.98 rows=23 width=4) (actual time=8.380..207.499
rows=4002 loops=1)
                                                               ->  Index
Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual
time=0.102..0.105 rows=1 loops=1)

Index Cond: ((word)::text = 'insider'::text)
                                                               ->  Index
Scan using article_words_wc on article_words  (cost=0.00..23219.17
rows=12268 width=8) (actual time=8.272..199.224 rows=4002 loops=1)

Index Cond: (public.article_words.word_key = public.words.word_key)
                                                   ->  Subquery Scan
"*SELECT* 2"  (cost=0.00..23378.21 rows=23 width=4) (actual
time=5.397..404.164 rows=15525 loops=1)
                                                         ->  Nested
Loop  (cost=0.00..23377.98 rows=23 width=4) (actual time=5.394..372.883
rows=15525 loops=1)
                                                               ->  Index
Scan using words_word on words  (cost=0.00..5.47 rows=1 width=4) (actual
time=0.054..0.056 rows=1 loops=1)

Index Cond: ((word)::text = 'trading'::text)
                                                               ->  Index
Scan using article_words_wc on article_words  (cost=0.00..23219.17
rows=12268 width=8) (actual time=5.331..341.535 rows=15525 loops=1)

Index Cond: (public.article_words.word_key = public.words.word_key)
                           ->  Index Scan using article_key_idx on
articles  (cost=0.00..4.44 rows=1 width=4) (actual time=0.026..0.029
rows=1 loops=1652)
                                 Index Cond: (articles.context_key =
"IN_subquery".context_key)
                                 Filter: articles.indexed
                     ->  Index Scan using contexts_pkey on contexts
(cost=0.00..4.39 rows=1 width=4) (actual time=0.018..0.021 rows=1
loops=1472)
                           Index Cond: (contexts.context_key =
articles.context_key)
               ->  Nested Loop  (cost=0.00..111539.51 rows=1261757
width=4) (actual time=0.019..306.679 rows=39189 loops=1472)
                     ->  Seq Scan on bp_categories  (cost=0.00..1315.59
rows=16613 width=4) (actual time=0.008..57.960 rows=14529 loops=1472)
                           Filter: (lower(category) = 'law'::text)
                     ->  Index Scan using virtual_ancestor_key_idx on
virtual_ancestors a  (cost=0.00..5.18 rows=116 width=8) (actual
time=0.005..0.010 rows=3 loops=21386112)
                           Index Cond: (a.ancestor_key =
bp_categories.context_key)
 Total runtime: 512600.354 ms
(37 rows)


В списке pgsql-performance по дате отправления:

Предыдущее
От: Jayadevan M
Дата:
Сообщение: PostgreSQL - case studies
Следующее
От: Amitabh Kant
Дата:
Сообщение: Re: PostgreSQL - case studies