Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)
Дата
Msg-id 1327.1433090999@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Ответы Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Obviously it makes little sense to use an (a,b,c) index to look up just
> (a,b) and then filter on c; the question is, what is the planner doing
> that leads it to get this so wrong?

It's not so astonishing as all that; compare

regression=# explain select * from t1 where a=3 and b=4;                              QUERY PLAN
      
 
------------------------------------------------------------------------Index Only Scan using t1_pkey on t1
(cost=0.28..8.29rows=1 width=12)  Index Cond: ((a = 3) AND (b = 4))
 
(2 rows)

regression=# explain select * from t1 where a=3 and b=4 and c=5;                              QUERY PLAN
              
 
------------------------------------------------------------------------Index Only Scan using t1_pkey on t1
(cost=0.28..8.30rows=1 width=12)  Index Cond: ((a = 3) AND (b = 4) AND (c = 5))
 
(2 rows)

Once you're down to an estimate of one row retrieved, adding additional
index conditions simply increases the cost (not by much, but it increases)
without delivering any visible benefit.

I believe what probably happened in this case is that the planner
considered both forms of the indexscan path and concluded that they were
fuzzily the same cost and rowcount, yet the path using only t2.a and t3.b
clearly dominated by requiring strictly fewer outer relations for
parameters.  So it threw away the path that also had the c = t4.c
comparison before it ever got to the join stage.  Even had it kept that
path, the join cost estimate wouldn't have looked any better than the one
for the join it did pick, so there would have been no certainty of picking
the "correct" plan.

The real problem in your example is thus the incorrect rowcount estimate;
with better rowcount estimates the two cases wouldn't have appeared to
have the same output rowcount.

For the toy data in your example, this can probably be blamed on the fact
that eqjoinsel_inner doesn't have any smarts for the case of having an MCV
list for only one side (though as noted in the comments, it's not obvious
what it should do instead).  However, it's not very clear what was
happening in the real-world case.
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: [CORE] postpone next week's release
Следующее
От: Joel Jacobson
Дата:
Сообщение: pg_xlog -> pg_xjournal?