Обсуждение: Estimation rows of FULL JOIN
Hi.
In some cases, the planner underestimates FULL JOIN.
Example:
postgres=# CREATE TABLE t AS SELECT x AS a, null AS b FROM generate_series(1, 10) x;
postgres=# ANALYZE;
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Full Join (cost=1.23..2.37 rows=10 width=72) (actual rows=20 loops=1)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
Planning Time: 0.067 ms
Execution Time: 0.052 ms
(8 rows)
Are these simple changes enough to improve this situation?
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a18..9cd43b778f3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5259,6 +5259,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
break;
case JOIN_FULL:
nrows = outer_rows * inner_rows * fkselec * jselec;
+ if (2 * nrows < outer_rows + inner_rows)
+ nrows = outer_rows + inner_rows - nrows;
if (nrows < outer_rows)
nrows = outer_rows;
if (nrows < inner_rows)
There is no error in the above case:
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Full Join (cost=1.23..2.37 rows=20 width=72) (actual rows=20 loops=1)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
Planning Time: 0.069 ms
Execution Time: 0.065 ms
(8 rows)
In some cases, the planner underestimates FULL JOIN.
Example:
postgres=# CREATE TABLE t AS SELECT x AS a, null AS b FROM generate_series(1, 10) x;
postgres=# ANALYZE;
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Full Join (cost=1.23..2.37 rows=10 width=72) (actual rows=20 loops=1)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
Planning Time: 0.067 ms
Execution Time: 0.052 ms
(8 rows)
Are these simple changes enough to improve this situation?
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a18..9cd43b778f3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5259,6 +5259,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
break;
case JOIN_FULL:
nrows = outer_rows * inner_rows * fkselec * jselec;
+ if (2 * nrows < outer_rows + inner_rows)
+ nrows = outer_rows + inner_rows - nrows;
if (nrows < outer_rows)
nrows = outer_rows;
if (nrows < inner_rows)
There is no error in the above case:
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b;
QUERY PLAN
-------------------------------------------------------------------------------------------
Hash Full Join (cost=1.23..2.37 rows=20 width=72) (actual rows=20 loops=1)
Hash Cond: (t1.b = t2.b)
-> Seq Scan on t t1 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
-> Hash (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on t t2 (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
Planning Time: 0.069 ms
Execution Time: 0.065 ms
(8 rows)
Danil Anisimow <anisimow.d@gmail.com> writes: > In some cases, the planner underestimates FULL JOIN. Perhaps. > Are these simple changes enough to improve this situation? This looks like an entirely ad-hoc change, one that could make as many cases worse as it makes better. If you want to argue for merging it, you at least need to provide some evidence for thinking it's a more accurate model of what will happen. And adjust the nearby comment to explain it. regards, tom lane