Re: Parameter for planner estimate of recursive queries

From: Kenaniah Cerny
Subject: Re: Parameter for planner estimate of recursive queries
Date:
Msg-id: CA+r_aq86dwdPSm740BN_8i2e2oTLYQb1uUTpnTr1JrQX7cApfA@mail.gmail.com
In reply to: Re: Parameter for planner estimate of recursive queries (Simon Riggs <simon.riggs@enterprisedb.com>)
List: pgsql-hackers
> The factor 10 should not be hardcoded in the planner, but should be settable, just as cursor_tuple_fraction is.

I feel considerably out of my depth here, but I like the idea of a working-table size multiplier GUC, given the challenges of predicting the number of iterations (and any adjustments to cardinality per iteration). A size estimate that grows superlinearly with cte_rows (like the quadratic form proposed below) may lead to increasingly pathological outcomes, especially when the estimate of cte_rows itself is off.
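
For concreteness, here is a rough sketch of what a settable worktable-size factor could look like, in the spirit of cursor_tuple_fraction. This is purely illustrative: the GUC name and exact placement are assumptions, not committed code. The idea is a float GUC, defaulting to the current behaviour, consumed where the planner sizes the scan of a self-referencing CTE's worktable:

    /* Illustrative sketch only, not actual PostgreSQL source. */

    /* New planner GUC, defaulting to today's hardcoded factor. */
    double      recursive_worktable_factor = 10.0;

    /*
     * In the CTE size-estimation path (costsize.c), scale the
     * non-recursive term's row count by the settable factor for a
     * self-reference (the recursive worktable scan), instead of the
     * hardcoded 10.
     */
    if (rte->self_reference)
        rel->tuples = clamp_row_est(recursive_worktable_factor * cte_rows);
    else
        rel->tuples = cte_rows;

Registered in guc.c as a PGC_USERSET real, like cursor_tuple_fraction, it could then be lowered per session for tight shortest-path queries or raised for bushy analytics workloads.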

In the EXPLAINs, it looked like the estimates for knows_pkey were off by an order of magnitude or so. It's possible the planner would have chosen the Nested Loop plan if knows_pkey had been estimated at rows=87 (since the WindowAgg estimate would have come out at roughly the same size as in the second plan anyway, even with rel->tuples = 10 * cte_rows).

I also wonder if there's a better default than cte_rows * 10, but implementing a new GUC sounds like a reasonable solution to this as well.
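
For a rough sense of scale (taking the cte_rows * cte_rows / 2 form suggested further down, clamped to at least one row): with cte_rows = 1 the worktable estimate would be 1 rather than today's 10; at cte_rows = 10 it would be 50 rather than 100; at cte_rows = 100 it would be 5,000 rather than 1,000; and at cte_rows = 1,000 it would be 500,000 rather than 10,000. So a quadratic default is gentler than the current factor for small seed sets but grows much faster once the non-recursive term returns many rows, which is where the concern about off estimates bites hardest.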

--

Kenaniah

On Sat, Jan 22, 2022 at 1:58 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Wed, 27 Oct 2021 at 15:58, Simon Riggs <simon.riggs@enterprisedb.com> wrote:

> The poor performance is traced to the planner cost estimates for
> recursive queries. Specifically, the cost of the recursive arm of the
> query is evaluated based upon both of these hardcoded assumptions:
>
> 1. The recursion will last for 10 loops
> 2. The average size of the worktable will be 10x the size of the
> initial query (non-recursive term).
>
> Taken together these assumptions lead to a very poor estimate of the
> worktable activity (in this case), which leads to the plan changing as
> a result.
>
> The factor 10 is a reasonably safe assumption and helps avoid worst
> case behavior in bigger graph queries. However, the factor 10 is way
> too large for many types of graph query, such as where the path
> through the data is tight, and/or the query is written to prune bushy
> graphs, e.g. shortest path queries. The factor 10 should not be
> hardcoded in the planner, but should be settable, just as
> cursor_tuple_fraction is.

If you think this should be derived without parameters, then we would
want a function that starts at 1 for 1 input row and gets much larger
for larger input. The thinking here is that Graph OLTP is often a
shortest path between two nodes, whereas Graph Analytics typically
traverses much more of the graph, and so the worktable will get much bigger.

So I'm thinking we use

rel->tuples = Max(1, cte_rows * cte_rows / 2);

--
Simon Riggs                http://www.EnterpriseDB.com/



