Sort of a planner regression 8.3->8.4 (due to EXISTS inlining) and related stuff

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Sort of a planner regression 8.3->8.4 (due to EXISTS inlining) and related stuff
Дата
Msg-id 201005170107.54275.andres@anarazel.de
обсуждение исходный текст
Ответы Re: Sort of a planner regression 8.3->8.4 (due to EXISTS inlining) and related stuff  (Andres Freund <andres@anarazel.de>)
Re: Sort of a planner regression 8.3->8.4 (due to EXISTS inlining) and related stuff  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi all,

After having received several reports of worse plans on 8.4 compared to 8.3
and recently once more one from 'vaxerdec' on IRC I tried to investigate the
difference.
Reducing the (large and ugly, automatically generated queries) to a
reproducible testcase I ended up with the following pattern:

explain SELECT 1
FROM                                       c
WHERE   EXISTS (       SELECT *          FROM a           JOIN b USING (b_id)       WHERE b.c_id = c.c_id)   AND
c.value= 1; 

8.3 planned this to:
 Index Scan using c_value_key on c  (cost=0.00..24.83 rows=1 width=0)   Index Cond: (value = 1)   Filter: (subplan)
SubPlan    ->  Nested Loop  (cost=0.00..16.56 rows=1 width=12)           ->  Index Scan using b__c_id on b
(cost=0.00..8.27rows=1  
width=8)                 Index Cond: (c_id = $0)           ->  Index Scan using a__b_id on a  (cost=0.00..8.27 rows=1
width=8)                 Index Cond: (a.b_id = b.b_id)

Which is quite good for such a kind of query.

From 8.4 onwards this gets planned to
 Nested Loop Semi Join  (cost=1543.00..7708.29 rows=1 width=0)   Join Filter: (c.c_id = b.c_id)   ->  Index Scan using
c_value_keyon c  (cost=0.00..8.27 rows=1 width=4)         Index Cond: (value = 1)   ->  Hash Join
(cost=1543.00..7075.02rows=50000 width=4)         Hash Cond: (b.b_id = a.b_id)         ->  Seq Scan on b
(cost=0.00..2164.01rows=150001 width=8)         ->  Hash  (cost=722.00..722.00 rows=50000 width=4)               ->
SeqScan on a  (cost=0.00..722.00 rows=50000 width=4) 

which is the near-equivalent (with s/Semi/IN/) what 8.3 produces for the above
query with IN instead of EXISTS.

This kind of plan obviously is horrid.

The behavioral change was introduced in Tom's initial commit to support Semi
Joins "Implement SEMI and ANTI joins in the planner and executor." from
2008-08-14 (I tried the commits before and after).
Seeing that 8.3 didn't inline EXISTS but IN and showed the bad plan with IN
its pretty evident that the inlining is the problem and not the patch itself.

Unsurprisingly 8.4 produces a similar plan to 8.3 if one uses a volatile
function or a OFFSET 0 as that stops inlining.

Two questions:
1. Is there a reason this cannot be optimized? In the semi join case it
doesn't seem to be *that* complex to push down qualifiers resulting in a plan
like:


Nested Loop Semi Join  -> Index Scan using c_value_key on c       Index Cond: (value = 1)  -> Nested Loop          ->
IndexScan using b__c_id on b    Index Cond: (b.c_id = c.c_id)      -> Index Scan using a__b_id on a           Index
Cond:(a.b_id = b.b_id) 

or, a bit more complex:

Nested Loop Semi Join ->  Nested Loop Semi Join       ->  Index Scan using c_value_key on c             Index Cond:
(value= 1)       ->  Index Scan using b__c_id on b             Index Cond: (b.c_id = c.c_id) ->  Index Scan using
a__b_idon a       Index Cond: (a.b_id = b.b_id) 


2. I can construct several cases off the top of my head where avoiding
inlining might yield significantly better plans unless variables are pushed
down more aggressively into EXISTS/IN. While it would solve this issue I doubt
that generating a separate path without inlining is viable (code complexity,
plantime)?


I thinks its pretty annoying to have so much worse plans in 8.4 than earlier
in relatively obvious queries, but I don't see any good, low-impact fix?

Greetings,

Andres

PS: Tom: Do you like to get directly addressed on "bugs"/mails like this one
or not? I am not really sure whats the policy regarding that is on the pg
mailinglists. Is there one?


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Performance problem in textanycat/anytextcat
Следующее
От: Andres Freund
Дата:
Сообщение: Re: Sort of a planner regression 8.3->8.4 (due to EXISTS inlining) and related stuff