Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.

From: Andy Fan
Subject: Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.
Date:
Msg-id: CAKU4AWrkL+1brQ1Sd-GotV=8=gYaCNARurPkJek76m9sPwJOGQ@mail.gmail.com
In reply to: Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.  (Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com>)
Responses: Re: Make the qual cost on index Filter slightly higher than qual cost on index Cond.  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List: pgsql-hackers


On Wed, May 27, 2020 at 8:01 PM Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> wrote:


On Wed, 27 May 2020 at 04:43, Andy Fan <zhihui.fan1213@gmail.com> wrote:
You can use the attached SQL to reproduce this issue, but you may not get the
above result on the first try: when the optimizer finds that the two index scans
have the same cost, it chooses the first one it encounters, and that order
depends on RelationGetIndexList.  If so, try dropping and recreating the j1_i_im5 index.

The idea behind this patch is that we still use the cost-based optimizer; only when
we find that two index scans have the same cost do we prefer the index that has
more quals in its Index Cond.  This is implemented by making the qual cost
on the index Filter slightly higher.

Thanks for the example and the explanation.

The execution time difference in your example is quite high for just executing the filter on so many rows. My guess is this has to do with heap access: to apply the filter, the entire row needs to be fetched from the heap, so we should investigate this case from that angle. Another guess is that the statistics are not correct and hence the cost is wrong.

 
I believe this is a statistics issue, and the cost is then wrong.  More characteristics of this
issue are:

1). If a value is out of range in the old statistics, the optimizer assumes 1 row.
2). Based on the 1-row assumption, for the query "col1 = out_of_range_val AND
col2 = any_value", indexes (col1, col2) and (col1, col3) have exactly the same cost
under the current cost model.
3). If the statistics are wrong, (col1, col3) can produce a very bad plan as shown
above, but index (col1, col2) is always better than, or no worse than, (col1, col3).
4). Extending the rule: for the query "col1 = out_of_range_val AND col2 = any_value AND col3 = any_val"
with indexes (col1, col2, col_m) and (col1, col_m, col_n), the former index is always
better than, or no worse than, the latter.
5). A statistics issue like this is not uncommon; for example, in a log-based
application, creation_date very easily falls out of the range covered by the statistics.
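Points 1) and 2) can be demonstrated with a self-contained sketch like the one below (this is illustrative only, with hypothetical table and index names, not the attached reproduction script):

```sql
-- Illustrative sketch: stale statistics plus an out-of-range value.
CREATE TABLE j1 (col1 int, col2 int, col3 int, pad text);
INSERT INTO j1
  SELECT i, i % 100, i % 10, repeat('x', 40)
  FROM generate_series(1, 100000) i;
CREATE INDEX j1_c1_c2 ON j1 (col1, col2);
CREATE INDEX j1_c1_c3 ON j1 (col1, col3);
ANALYZE j1;
-- Rows with col1 = 200000 are loaded after ANALYZE, so the
-- statistics do not know about them:
INSERT INTO j1
  SELECT 200000, i % 100, i % 10, repeat('x', 40)
  FROM generate_series(1, 50000) i;
-- The out-of-range value is estimated at roughly one row, and under
-- that assumption both indexes get essentially the same cost, so the
-- planner may pick either one -- including the much worse (col1, col3):
EXPLAIN SELECT * FROM j1 WHERE col1 = 200000 AND col2 = 3;
```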

So we need to optimize the cost model for such cases; the method is the patch I mentioned above.
I don't have solid data to prove that Oracle does something similar, but based on talks with my
customer, Oracle likely does something like this.
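For reference, the difference between the two tied plans shows up directly in the EXPLAIN output; the plan shapes below are a sketch with illustrative names, not actual captured output:

```sql
-- With index (col1, col2), both quals are checked inside the index:
--   Index Scan using j1_c1_c2 on j1
--     Index Cond: ((col1 = 200000) AND (col2 = 3))
--
-- With index (col1, col3), col2 = 3 must instead be re-checked as a
-- Filter against every heap row fetched for col1 = 200000:
--   Index Scan using j1_c1_c3 on j1
--     Index Cond: (col1 = 200000)
--     Filter: (col2 = 3)
--
-- The patch charges the Filter qual slightly more, so when the two
-- costs are otherwise equal, the planner prefers the first shape.
```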

--
Best Regards
Andy Fan
