Обсуждение: Estimation rows of FULL JOIN

Поиск

Список

Период

Сортировка

Estimation rows of FULL JOIN

От

Danil Anisimow

Дата:

20 декабря 2023 г., 20:31:51

Hi.

In some cases, the planner underestimates FULL JOIN.

Example:

postgres=# CREATE TABLE t AS SELECT x AS a, null AS b FROM generate_series(1, 10) x;
postgres=# ANALYZE;
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Full Join (cost=1.23..2.37 rows=10 width=72) (actual rows=20 loops=1)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
Planning Time: 0.067 ms
Execution Time: 0.052 ms
(8 rows)

Are these simple changes enough to improve this situation?

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a18..9cd43b778f3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5259,6 +5259,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
break;
case JOIN_FULL:
nrows = outer_rows * inner_rows * fkselec * jselec;
+ if (2 * nrows < outer_rows + inner_rows)
+ nrows = outer_rows + inner_rows - nrows;
if (nrows < outer_rows)
nrows = outer_rows;
if (nrows < inner_rows)

There is no error in the above case:

postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Full Join (cost=1.23..2.37 rows=20 width=72) (actual rows=20 loops=1)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
Planning Time: 0.069 ms
Execution Time: 0.065 ms
(8 rows)

Re: Estimation rows of FULL JOIN

От

Tom Lane

Дата:

20 декабря 2023 г., 20:39:05

Danil Anisimow <anisimow.d@gmail.com> writes:
> In some cases, the planner underestimates FULL JOIN.

Perhaps.

> Are these simple changes enough to improve this situation?

This looks like an entirely ad-hoc change, one that could make as
many cases worse as it makes better.  If you want to argue for
merging it, you at least need to provide some evidence for thinking
it's a more accurate model of what will happen.  And adjust the
nearby comment to explain it.

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Estimation rows of FULL JOIN

Estimation rows of FULL JOIN

Re: Estimation rows of FULL JOIN