Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Дата
Msg-id CAEZATCW4=er-KzHw4J8sPix45VoZ09vnKTwFXz25dfFz29ozzQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Ответы Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation
Список pgsql-hackers
On 26 July 2018 at 07:12, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> In the patch clauselist_selectivity() gets called repeatedly for every
> new qual added to the clause list. Instead, if we try to combine the
> cost/row estimation with order_qual_clauses() or
> clauselist_selectivity(), we might be able to what the current patch
> does in O(n). clauselist_selectivity() calculates combined selectivity
> of all the given quals in O(n). We should do something similar to that
> in this case.

Duplicating the logic of clauselist_selectivity() seems like quite a
lot of effort to go to just for improved filter cost estimates. Note
also that clauselist_selectivity() might get a good deal more complex
with multivariate stats.

Perhaps a reasonable simplification would be to just treat the clauses
as independent when estimating the filter cost, and so use the
following as a "good enough" estimate for the filter cost:

  cost(A) + sel(A) * cost(B) + sel(A) * sel(B) * cost(C) + ...

That would probably be an improvement over what we currently have. It
would be O(n) to compute, and it would probably use the cached
selectivity estimates for the clauses.

Note also that with this simplified expression for the filter cost, it
would be possible to improve order_qual_clauses() to take into account
the clause selectivities when sorting the clauses, and minimise the
cost of the filter by evaluating more selective clauses sooner, if
they're not too expensive. I believe that amounts to sorting by
cost/(1-sel) rather than just cost, except for clauses with sel==1,
which it makes sense to move to the end, and just sort by cost.

Regards,
Dean


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Speeding up INSERTs and UPDATEs to partitioned tables
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Online enabling of checksums