Query plan, nested EXISTS

Поиск
Список
Период
Сортировка
От Matt Daw
Тема Query plan, nested EXISTS
Дата
Msg-id CAA2LLOFzQDqnpvxxf1EAUzKXrJx5VUN3P2jgon3zTNzdtpGZQQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: Query plan, nested EXISTS
Список pgsql-performance
Howdy, I've been debugging a client's slow query today and I'm curious about the query plan. It's picking a plan that hashes lots of rows from the versions table (on v9.0.10)...

EXPLAIN ANALYZE
SELECT COUNT(*) FROM notes a WHERE
a.project_id = 114 AND
EXISTS (
    SELECT 1 FROM note_links b
    WHERE
    b.note_id = a.id AND
    b.entity_type = 'Version' AND
    EXISTS (
        SELECT 1 FROM versions c
        WHERE 
        c.id = b.entity_id AND
        c.code ILIKE '%comp%' AND
        c.retirement_date IS NULL
    ) AND
    b.retirement_date IS NULL
)

                                                                           QUERY PLAN                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=833177.30..833177.31 rows=1 width=0) (actual time=10806.416..10806.416 rows=1 loops=1)
   ->  Hash Semi Join  (cost=747004.15..833154.86 rows=8977 width=0) (actual time=10709.343..10806.344 rows=894 loops=1)
         Hash Cond: (a.id = b.note_id)
         ->  Index Scan using notes_retirement_date_project on notes a  (cost=0.00..66725.10 rows=12469 width=4) (actual time=12.213..71.199 rows=12469 loops=1)
               Index Cond: (project_id = 114)
         ->  Hash  (cost=723749.35..723749.35 rows=1417424 width=4) (actual time=10696.192..10696.192 rows=227261 loops=1)
               Buckets: 65536  Batches: 4  Memory Usage: 2016kB
               ->  Hash Semi Join  (cost=620007.75..723749.35 rows=1417424 width=4) (actual time=8953.460..10645.714 rows=227261 loops=1)
                     Hash Cond: (b.entity_id = c.id)
                     ->  Seq Scan on note_links b  (cost=0.00..71849.56 rows=1417424 width=8) (actual time=0.075..628.183 rows=1509795 loops=1)
                           Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text))
                     ->  Hash  (cost=616863.62..616863.62 rows=251530 width=4) (actual time=8953.327..8953.327 rows=300115 loops=1)
                           Buckets: 32768  Batches: 1  Memory Usage: 10551kB
                           ->  Seq Scan on versions c  (cost=0.00..616863.62 rows=251530 width=4) (actual time=176.590..8873.588 rows=300115 loops=1)
                                 Filter: ((retirement_date IS NULL) AND ((code)::text ~~* '%comp%'::text))
 Total runtime: 10810.479 ms
(16 rows)

However, I can trick it into a better plan by adding LIMIT 1 into the inner EXISTS:

EXPLAIN ANALYZE
SELECT COUNT(*) FROM notes a WHERE
a.project_id = 114 AND
EXISTS (
    SELECT 1 FROM note_links b
    WHERE
    b.note_id = a.id AND
    b.entity_type = 'Version' AND
    EXISTS (
        SELECT 1 FROM versions c
        WHERE 
        c.id = b.entity_id AND
        c.code ILIKE '%comp%' AND
        c.retirement_date IS NULL
        LIMIT 1
    ) AND
    b.retirement_date IS NULL
)

                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=372820.37..372820.38 rows=1 width=0) (actual time=139.430..139.430 rows=1 loops=1)
   ->  Nested Loop Semi Join  (cost=0.00..372809.15 rows=4488 width=0) (actual time=9.735..139.333 rows=894 loops=1)
         ->  Index Scan using notes_retirement_date_project on notes a  (cost=0.00..66725.10 rows=12469 width=4) (actual time=9.699..67.263 rows=12469 loops=1)
               Index Cond: (project_id = 114)
         ->  Index Scan using note_links_note on note_links b  (cost=0.00..24.54 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=12469)
               Index Cond: (b.note_id = a.id)
               Filter: ((b.retirement_date IS NULL) AND ((b.entity_type)::text = 'Version'::text) AND (SubPlan 1))
               SubPlan 1
                 ->  Limit  (cost=0.00..9.04 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11794)
                       ->  Index Scan using versions_pkey on versions c  (cost=0.00..9.04 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11794)
                             Index Cond: (id = $0)
                             Filter: ((retirement_date IS NULL) AND ((code)::text ~~* '%comp%'::text))
 Total runtime: 139.465 ms
(13 rows)


Unfortunately, a couple other queries I tested got slower by adding the LIMIT so I don't think that's going to be a good workaround. It doesn't appear to be related to ILIKE, because I tried a straight equals against another un-indexed column of versions and still get a slow plan (and adding the LIMIT to this one made it fast too):

EXPLAIN ANALYZE
SELECT COUNT(*) FROM notes a WHERE
a.project_id = 114 AND
EXISTS (
    SELECT 1 FROM note_links b
    WHERE
    b.note_id = a.id AND
    b.entity_type = 'Version' AND
    EXISTS (
        SELECT 1 FROM versions c
        WHERE 
        c.id = b.entity_id AND
        c.sg_status_list = 'ip' AND
        c.retirement_date IS NULL
    ) AND
    b.retirement_date IS NULL
)

                                                                          QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=821544.18..821544.19 rows=1 width=0) (actual time=5046.492..5046.492 rows=1 loops=1)
   ->  Hash Semi Join  (cost=735371.03..821521.73 rows=8977 width=0) (actual time=4941.968..5045.968 rows=7116 loops=1)
         Hash Cond: (a.id = b.note_id)
         ->  Index Scan using notes_retirement_date_project on notes a  (cost=0.00..66725.10 rows=12469 width=4) (actual time=9.639..68.751 rows=12469 loops=1)
               Index Cond: (project_id = 114)
         ->  Hash  (cost=712116.23..712116.23 rows=1417424 width=4) (actual time=4931.956..4931.956 rows=297401 loops=1)
               Buckets: 65536  Batches: 4  Memory Usage: 2633kB
               ->  Hash Join  (cost=620484.32..712116.23 rows=1417424 width=4) (actual time=3362.472..4864.816 rows=297401 loops=1)
                     Hash Cond: (b.entity_id = c.id)
                     ->  Seq Scan on note_links b  (cost=0.00..71849.56 rows=1417424 width=8) (actual time=0.079..622.277 rows=1509795 loops=1)
                           Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text))
                     ->  Hash  (cost=618673.97..618673.97 rows=144828 width=4) (actual time=3362.337..3362.337 rows=155834 loops=1)
                           Buckets: 16384  Batches: 1  Memory Usage: 5479kB
                           ->  HashAggregate  (cost=617225.69..618673.97 rows=144828 width=4) (actual time=3289.861..3335.344 rows=155834 loops=1)
                                 ->  Seq Scan on versions c  (cost=0.00..616863.62 rows=144828 width=4) (actual time=217.080..3133.870 rows=155834 loops=1)
                                       Filter: ((retirement_date IS NULL) AND ((sg_status_list)::text = 'ip'::text))
 Total runtime: 5051.414 ms
(17 rows)


Does anything come to mind that would help me debug why this plan is being chosen? Thanks!

Matt


В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Possible Performance Regression with Transitive Comparisons vs. Constants
Следующее
От: Shaun Thomas
Дата:
Сообщение: Re: Possible Performance Regression with Transitive Comparisons vs. Constants