Re: Parameter for planner estimate of recursive queries

From: Kenaniah Cerny
Subject: Re: Parameter for planner estimate of recursive queries
Date:
Msg-id: CA+r_aq86dwdPSm740BN_8i2e2oTLYQb1uUTpnTr1JrQX7cApfA@mail.gmail.com
In reply to: Re: Parameter for planner estimate of recursive queries (Simon Riggs <simon.riggs@enterprisedb.com>)
List: pgsql-hackers
> The factor 10 should not be hardcoded in the planner, but should be settable, just as cursor_tuple_fraction is.

I feel considerably out of my depth here, but I like the idea of a working-table size multiplier GUC, given the challenges of predicting the number of iterations (and any adjustments to cardinality per iteration). A size estimate that grows superlinearly with cte_rows (like the quadratic form proposed below) may lead to increasingly pathological outcomes, especially when the estimate of cte_rows itself is off.
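
For concreteness, here is a rough sketch of what a settable worktable-size factor could look like, in the spirit of cursor_tuple_fraction. This is purely illustrative: the GUC name and exact placement are assumptions, not committed code. The idea is a float GUC, defaulting to the current behaviour, consumed where the planner sizes the scan of a self-referencing CTE's worktable:

    /* Illustrative sketch only, not actual PostgreSQL source. */

    /* New planner GUC, defaulting to today's hardcoded factor. */
    double      recursive_worktable_factor = 10.0;

    /*
     * In the CTE size-estimation path (costsize.c), scale the
     * non-recursive term's row count by the settable factor for a
     * self-reference (the recursive worktable scan), instead of the
     * hardcoded 10.
     */
    if (rte->self_reference)
        rel->tuples = clamp_row_est(recursive_worktable_factor * cte_rows);
    else
        rel->tuples = cte_rows;

Registered in guc.c as a PGC_USERSET real, like cursor_tuple_fraction, it could then be lowered per session for tight shortest-path queries or raised for bushy analytics workloads.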

In the EXPLAINs, it looked like the estimates for knows_pkey were off by an order of magnitude or so. It's possible the planner would have chosen the Nested Loop plan if knows_pkey had been estimated at rows=87 (since the WindowAgg estimate would have come out at roughly the same size as in the second plan anyway, even with rel->tuples = 10 * cte_rows).

I also wonder if there's a better default than cte_rows * 10, but implementing a new GUC sounds like a reasonable solution to this as well.
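
For a rough sense of scale (taking the cte_rows * cte_rows / 2 form suggested further down, clamped to at least one row): with cte_rows = 1 the worktable estimate would be 1 rather than today's 10; at cte_rows = 10 it would be 50 rather than 100; at cte_rows = 100 it would be 5,000 rather than 1,000; and at cte_rows = 1,000 it would be 500,000 rather than 10,000. So a quadratic default is gentler than the current factor for small seed sets but grows much faster once the non-recursive term returns many rows, which is where the concern about off estimates bites hardest.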

--

Kenaniah

On Sat, Jan 22, 2022 at 1:58 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Wed, 27 Oct 2021 at 15:58, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> The poor performance is traced to the planner cost estimates for
> recursive queries. Specifically, the cost of the recursive arm of the
> query is evaluated based upon both of these hardcoded assumptions:
>
> 1. The recursion will last for 10 loops
> 2. The average size of the worktable will be 10x the size of the
> initial query (non-recursive term).
>
> Taken together these assumptions lead to a very poor estimate of the
> worktable activity (in this case), which leads to the plan changing as
> a result.
>
> The factor 10 is a reasonably safe assumption and helps avoid worst
> case behavior in bigger graph queries. However, the factor 10 is way
> too large for many types of graph query, such as where the path
> through the data is tight, and/or the query is written to prune bushy
> graphs, e.g. shortest path queries. The factor 10 should not be
> hardcoded in the planner, but should be settable, just as
> cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would
want a function that starts at 1 for 1 input row and gets much larger
for larger input. The thinking here is that Graph OLTP is often a
shortest path between two nodes, whereas Graph Analytics typically
traverses much more of the graph, and so the worktable will get much bigger.

So I'm thinking we use

rel->tuples = Max(1, cte_rows * cte_rows / 2);

--
Simon Riggs                http://www.EnterpriseDB.com/



