Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

Поиск

Список

Период

Сортировка

От	Richard Guo
Тема	Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
Дата	16 ноября 2023 г. 12:24:55
Msg-id	CAMbWs4-W_81Q3ff1Bk3NU_ChRzOJK0njAoJanW-kXnPF0wjd1Q@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500 (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500
Список	pgsql-hackers

Дерево обсуждения

On Thu, Nov 9, 2023 at 6:45 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

The existing RTE_SUBQUERY stanza has most of what we need for this,
so I experimented with extending that to also handle RTE_CTE. It
seems to work, though I soon found out that it needed tweaking for
the case where the CTE is INSERT/UPDATE/DELETE RETURNING.

The change looks good to me. To nitpick, should we modify the comment
of examine_simple_variable to also mention 'CTEs'?

* This is split out as a subroutine so that we can recurse to deal with
- * Vars referencing subqueries.
+ * Vars referencing subqueries or CTEs.

Interestingly, this does not change any existing regression test
results. I'd supposed there might be at least one place with a
visible plan change, but nope. Running a coverage test does show
that the new code paths are exercised, but I wonder if we ought
to try to devise a regression test that proves it more directly.

I think we ought to. Here is one regression test that proves that this
change improves query plans in some cases.

Unpatched:

explain (costs off)
with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
------------------------------------------------------------
Aggregate
CTE x
-> Index Only Scan using tenk1_unique1 on tenk1 b
-> Nested Loop
-> HashAggregate
Group Key: x.unique1
-> CTE Scan on x
-> Index Only Scan using tenk1_unique1 on tenk1 a
Index Cond: (unique1 = x.unique1)
(9 rows)

Patched:

explain (costs off)
with x as MATERIALIZED (select unique1 from tenk1 b)
select count(*) from tenk1 a where unique1 in (select * from x);
QUERY PLAN
------------------------------------------------------------
Aggregate
CTE x
-> Index Only Scan using tenk1_unique1 on tenk1 b
-> Hash Semi Join
Hash Cond: (a.unique1 = x.unique1)
-> Index Only Scan using tenk1_unique1 on tenk1 a
-> Hash
-> CTE Scan on x
(8 rows)

I think the second plan (patched) makes more sense. In the first plan
(unpatched), the HashAggregate node actually does not reduce the the
number of rows because it groups by 'unique1', but planner does not know
that because it lacks statistics for Vars referencing the CTE.

Thanks
Richard

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Amit Langote
Дата: 16 ноября 2023 г., 11:48:55
Сообщение: Re: remaining sql/json patches

Следующее

От: Alvaro Herrera
Дата: 16 ноября 2023 г., 12:41:48
Сообщение: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

Предыдущее

Следующее