Обсуждение: Estimation rows of FULL JOIN

Поиск
Список
Период
Сортировка

Estimation rows of FULL JOIN

От
Danil Anisimow
Дата:
Hi.

In some cases, the planner underestimates FULL JOIN.

Example:

postgres=# CREATE TABLE t AS SELECT x AS a, null AS b FROM generate_series(1, 10) x;
postgres=# ANALYZE;
postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b;
                                        QUERY PLAN                                        
-------------------------------------------------------------------------------------------
 Hash Full Join  (cost=1.23..2.37 rows=10 width=72) (actual rows=20 loops=1)
   Hash Cond: (t1.b = t2.b)
   ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
 Planning Time: 0.067 ms
 Execution Time: 0.052 ms
(8 rows)

Are these simple changes enough to improve this situation?

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index ef475d95a18..9cd43b778f3 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -5259,6 +5259,8 @@ calc_joinrel_size_estimate(PlannerInfo *root,
                        break;
                case JOIN_FULL:
                        nrows = outer_rows * inner_rows * fkselec * jselec;
+                       if (2 * nrows < outer_rows + inner_rows)
+                               nrows = outer_rows + inner_rows - nrows;
                        if (nrows < outer_rows)
                                nrows = outer_rows;
                        if (nrows < inner_rows)

There is no error in the above case:

postgres=# EXPLAIN (ANALYZE, TIMING OFF) SELECT * FROM t t1 FULL JOIN t t2 ON t1.b = t2.b;
                                        QUERY PLAN                                        
-------------------------------------------------------------------------------------------
 Hash Full Join  (cost=1.23..2.37 rows=20 width=72) (actual rows=20 loops=1)
   Hash Cond: (t1.b = t2.b)
   ->  Seq Scan on t t1  (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
   ->  Hash  (cost=1.10..1.10 rows=10 width=36) (actual rows=10 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on t t2  (cost=0.00..1.10 rows=10 width=36) (actual rows=10 loops=1)
 Planning Time: 0.069 ms
 Execution Time: 0.065 ms
(8 rows)

Re: Estimation rows of FULL JOIN

От
Tom Lane
Дата:
Danil Anisimow <anisimow.d@gmail.com> writes:
> In some cases, the planner underestimates FULL JOIN.

Perhaps.

> Are these simple changes enough to improve this situation?

This looks like an entirely ad-hoc change, one that could make as
many cases worse as it makes better.  If you want to argue for
merging it, you at least need to provide some evidence for thinking
it's a more accurate model of what will happen.  And adjust the
nearby comment to explain it.

            regards, tom lane