Re: a wrong index choose when statistics is out of date

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: a wrong index choose when statistics is out of date
Дата
Msg-id 87h6fmjj2u.fsf@163.com
обсуждение исходный текст
Ответ на Re: a wrong index choose when statistics is out of date  (Andy Fan <zhihuifan1213@163.com>)
Список pgsql-hackers
Andy Fan <zhihuifan1213@163.com> writes:

> Hello everyone,
>
>> After some more thoughts about the diference of the two ideas, then I
>> find we are resolving two different issues, just that in the wrong index
>> choose cases, both of them should work generally. 
>
> Here is the formal version for the attribute reloptions direction.

> commit 0d842e39275710a544b11033f5eec476147daf06 (HEAD -> force_generic)
> Author: yizhi.fzh <yizhi.fzh@alibaba-inc.com>
> Date:   Sun Mar 31 11:51:28 2024 +0800
>
>     Add a attopt to disable MCV when estimating for Var = Const
>     
>     As of current code, when calculating the selectivity for Var = Const,
>     planner first checks if the Const is an most common value and if not, it
>     takes out all the portions of MCV's selectivity and num of distinct
>     value, and treat the selectivity for Const equal for the rest
>     n_distinct.
>     
>     This logic works great when the optimizer statistic is up to date,
>     however if the known most common value has taken up most of the
>     selectivity at the last run of analyze, and the new most common value in
>     reality has not been gathered, the estimation for the new MCV will be
>     pretty bad. A common case for this would be created_at = {current_date};
>     
>     To overcome this issue, we provides a new syntax:
>     
>     ALTER TABLE tablename ALTER COLUMN created_at SET (force_generic=on);
>     
>     After this, planner ignores the value of MCV for this column when
>     estimating for Var = Const and treating all the values equally.
>     
>     This would cause some badness if the values for a column are pretty not
>     equal which is what MCV is designed for, however this patch just provide
>     one more option to user and let user make the decision.
>
> Here is an example about its user case.

...

Here are some add-ups for this feature:

- After the use this feature, we still to gather the MCV on these
  columns because they are still useful for the join case, see
  eqjoinsel_inner function.

- Will this feature make some cases worse since it relies on the fact
  that not using the MCV list for var = Const? That's is true in
  theory. But if user use this feature right, they will not use this
  feature for these columns. The feature is just designed for the user
  case in the commit message and the theory is exactly same as generic
  plan. If user uses it right, they may save the effort of run 'analyze'
  pretty frequently and get some better result on both index choose and
  rows estimation. Plus the patch is pretty not aggressive and it's easy
  to maintain.

- Is the 'force_generic' a good name for attribute option? Probably not,
  we can find out a good name after we agree on this direction.  

- Will it be conflicted with David's idea of certainty_factor? Probably
  not,even both of them can handle the index-choose-case. See my point
  on [1]

[1] https://www.postgresql.org/message-id/877cicao6e.fsf%40163.com 

-- 
Best Regards
Andy Fan




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andy Fan
Дата:
Сообщение: Re: using extended statistics to improve join estimates
Следующее
От: Joseph Koshakow
Дата:
Сообщение: Fix overflow hazard in timestamp_pl_interval