Corner-case improvement to eqjoinsel_semi

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Corner-case improvement to eqjoinsel_semi
Дата
Msg-id 31089.1480384713@sss.pgh.pa.us
обсуждение исходный текст
Список pgsql-hackers
There's a complaint in bug #14438 about poor estimation of join size
for a semijoin whose inner side is empty.  I think the root of it is that,
having no statistics for the empty table, eqjoinsel_semi distrusts its
estimate of the number of distinct values on the inner side, and falls
back to a conservative calculation that has little to do with reality.
However, it's not really true that the nd2 estimate is based on nothing at
all, because we clamped it to be no more than the estimated size of the
inner relation (which we knew to be small).  If we go ahead and use that
number as a valid estimate, we get a far better selectivity estimate ---
in the bug's example, the join size estimate goes from 16000-some to 2,
which is a tad closer to the correct value of 0.

Hence, I propose the attached patch.  This would kick in whenever the
inner side of a semi/antijoin has no statistics and an estimated size
of less than 200 rows.

Thoughts?

            regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 56943f2..d331adf 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** eqjoinsel_semi(Oid operator,
*** 2511,2520 ****
       * We can apply this clamping both with respect to the base relation from
       * which the join variable comes (if there is just one), and to the
       * immediate inner input relation of the current join.
       */
      if (vardata2->rel)
!         nd2 = Min(nd2, vardata2->rel->rows);
!     nd2 = Min(nd2, inner_rel->rows);

      if (HeapTupleIsValid(vardata1->statsTuple))
      {
--- 2511,2532 ----
       * We can apply this clamping both with respect to the base relation from
       * which the join variable comes (if there is just one), and to the
       * immediate inner input relation of the current join.
+      *
+      * If we clamp, we can consider that nd2 is not a bogus default estimate.
       */
      if (vardata2->rel)
!     {
!         if (nd2 >= vardata2->rel->rows)
!         {
!             nd2 = vardata2->rel->rows;
!             isdefault2 = false;
!         }
!     }
!     if (nd2 >= inner_rel->rows)
!     {
!         nd2 = inner_rel->rows;
!         isdefault2 = false;
!     }

      if (HeapTupleIsValid(vardata1->statsTuple))
      {

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Christian Convey
Дата:
Сообщение: Re: Tackling JsonPath support
Следующее
От: Nico Williams
Дата:
Сообщение: Re: Tackling JsonPath support