Query plan, nested EXISTS

Поиск

Список

Период

Сортировка

От	Matt Daw
Тема	Query plan, nested EXISTS
Дата	29 сентября 2012 г. 00:04:09
Msg-id	CAA2LLOFzQDqnpvxxf1EAUzKXrJx5VUN3P2jgon3zTNzdtpGZQQ@mail.gmail.com обсуждение исходный текст
Ответы	Re: Query plan, nested EXISTS
Список	pgsql-performance

Дерево обсуждения

Howdy, I've been debugging a client's slow query today and I'm curious about the query plan. It's picking a plan that hashes lots of rows from the versions table (on v9.0.10)...

EXPLAIN ANALYZE

SELECT COUNT(*) FROM notes a WHERE

a.project_id = 114 AND

EXISTS (

SELECT 1 FROM note_links b

WHERE

b.note_id = a.id AND

b.entity_type = 'Version' AND

EXISTS (

SELECT 1 FROM versions c

WHERE

c.id = b.entity_id AND

c.code ILIKE '%comp%' AND

c.retirement_date IS NULL

) AND

b.retirement_date IS NULL

)

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=833177.30..833177.31 rows=1 width=0) (actual time=10806.416..10806.416 rows=1 loops=1)

-> Hash Semi Join (cost=747004.15..833154.86 rows=8977 width=0) (actual time=10709.343..10806.344 rows=894 loops=1)

Hash Cond: (a.id = b.note_id)

-> Index Scan using notes_retirement_date_project on notes a (cost=0.00..66725.10 rows=12469 width=4) (actual time=12.213..71.199 rows=12469 loops=1)

Index Cond: (project_id = 114)

-> Hash (cost=723749.35..723749.35 rows=1417424 width=4) (actual time=10696.192..10696.192 rows=227261 loops=1)

Buckets: 65536 Batches: 4 Memory Usage: 2016kB

-> Hash Semi Join (cost=620007.75..723749.35 rows=1417424 width=4) (actual time=8953.460..10645.714 rows=227261 loops=1)

Hash Cond: (b.entity_id = c.id)

-> Seq Scan on note_links b (cost=0.00..71849.56 rows=1417424 width=8) (actual time=0.075..628.183 rows=1509795 loops=1)

Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text))

-> Hash (cost=616863.62..616863.62 rows=251530 width=4) (actual time=8953.327..8953.327 rows=300115 loops=1)

Buckets: 32768 Batches: 1 Memory Usage: 10551kB

-> Seq Scan on versions c (cost=0.00..616863.62 rows=251530 width=4) (actual time=176.590..8873.588 rows=300115 loops=1)

Filter: ((retirement_date IS NULL) AND ((code)::text ~~* '%comp%'::text))

Total runtime: 10810.479 ms

(16 rows)

However, I can trick it into a better plan by adding LIMIT 1 into the inner EXISTS:

EXPLAIN ANALYZE

SELECT COUNT(*) FROM notes a WHERE

a.project_id = 114 AND

EXISTS (

SELECT 1 FROM note_links b

WHERE

b.note_id = a.id AND

b.entity_type = 'Version' AND

EXISTS (

SELECT 1 FROM versions c

WHERE

c.id = b.entity_id AND

c.code ILIKE '%comp%' AND

c.retirement_date IS NULL

LIMIT 1

) AND

b.retirement_date IS NULL

)

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=372820.37..372820.38 rows=1 width=0) (actual time=139.430..139.430 rows=1 loops=1)

-> Nested Loop Semi Join (cost=0.00..372809.15 rows=4488 width=0) (actual time=9.735..139.333 rows=894 loops=1)

-> Index Scan using notes_retirement_date_project on notes a (cost=0.00..66725.10 rows=12469 width=4) (actual time=9.699..67.263 rows=12469 loops=1)

Index Cond: (project_id = 114)

-> Index Scan using note_links_note on note_links b (cost=0.00..24.54 rows=1 width=4) (actual time=0.006..0.006 rows=0 loops=12469)

Index Cond: (b.note_id = a.id)

Filter: ((b.retirement_date IS NULL) AND ((b.entity_type)::text = 'Version'::text) AND (SubPlan 1))

SubPlan 1

-> Limit (cost=0.00..9.04 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11794)

-> Index Scan using versions_pkey on versions c (cost=0.00..9.04 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11794)

Index Cond: (id = $0)

Filter: ((retirement_date IS NULL) AND ((code)::text ~~* '%comp%'::text))

Total runtime: 139.465 ms

(13 rows)

Unfortunately, a couple other queries I tested got slower by adding the LIMIT so I don't think that's going to be a good workaround. It doesn't appear to be related to ILIKE, because I tried a straight equals against another un-indexed column of versions and still get a slow plan (and adding the LIMIT to this one made it fast too):

EXPLAIN ANALYZE

SELECT COUNT(*) FROM notes a WHERE

a.project_id = 114 AND

EXISTS (

SELECT 1 FROM note_links b

WHERE

b.note_id = a.id AND

b.entity_type = 'Version' AND

EXISTS (

SELECT 1 FROM versions c

WHERE

c.id = b.entity_id AND

c.sg_status_list = 'ip' AND

c.retirement_date IS NULL

) AND

b.retirement_date IS NULL

)

QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=821544.18..821544.19 rows=1 width=0) (actual time=5046.492..5046.492 rows=1 loops=1)

-> Hash Semi Join (cost=735371.03..821521.73 rows=8977 width=0) (actual time=4941.968..5045.968 rows=7116 loops=1)

Hash Cond: (a.id = b.note_id)

-> Index Scan using notes_retirement_date_project on notes a (cost=0.00..66725.10 rows=12469 width=4) (actual time=9.639..68.751 rows=12469 loops=1)

Index Cond: (project_id = 114)

-> Hash (cost=712116.23..712116.23 rows=1417424 width=4) (actual time=4931.956..4931.956 rows=297401 loops=1)

Buckets: 65536 Batches: 4 Memory Usage: 2633kB

-> Hash Join (cost=620484.32..712116.23 rows=1417424 width=4) (actual time=3362.472..4864.816 rows=297401 loops=1)

Hash Cond: (b.entity_id = c.id)

-> Seq Scan on note_links b (cost=0.00..71849.56 rows=1417424 width=8) (actual time=0.079..622.277 rows=1509795 loops=1)

Filter: ((retirement_date IS NULL) AND ((entity_type)::text = 'Version'::text))

-> Hash (cost=618673.97..618673.97 rows=144828 width=4) (actual time=3362.337..3362.337 rows=155834 loops=1)

Buckets: 16384 Batches: 1 Memory Usage: 5479kB

-> HashAggregate (cost=617225.69..618673.97 rows=144828 width=4) (actual time=3289.861..3335.344 rows=155834 loops=1)

-> Seq Scan on versions c (cost=0.00..616863.62 rows=144828 width=4) (actual time=217.080..3133.870 rows=155834 loops=1)

Filter: ((retirement_date IS NULL) AND ((sg_status_list)::text = 'ip'::text))

Total runtime: 5051.414 ms

(17 rows)

Does anything come to mind that would help me debug why this plan is being chosen? Thanks!

Matt

В списке pgsql-performance по дате отправления:

Предыдущее

От: Tom Lane
Дата: 28 сентября 2012 г., 23:35:33
Сообщение: Re: Possible Performance Regression with Transitive Comparisons vs. Constants

Следующее

От: Shaun Thomas
Дата: 29 сентября 2012 г., 00:37:47
Сообщение: Re: Possible Performance Regression with Transitive Comparisons vs. Constants

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Query plan, nested EXISTS

Предыдущее

Следующее