Re: [HACKERS] New design for FK-based join selectivity estimation

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] New design for FK-based join selectivity estimation
Дата
Msg-id 16149.1481835103@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] New design for FK-based join selectivity estimation  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> ronan.dunklau@dalibo.com writes:
>> If I understand it correctly and the above is right, I think we should ignore
>> SEMI or ANTI joins altogether when considering FKs, and keep the corresponding
>> restrictinfos for later processing since they are already special-cased later
>> on.

> That seems like an overreaction.  While the old code happens to get this
> example exactly right, eqjoinsel_semi is still full of assumptions and
> approximations, and it doesn't do very well at all if it lacks MCV lists
> for both sides.

> I'm inclined to think that what we want to have happen in this case is
> to estimate the fraction of outer rows having a match as equal to the
> selectivity of the inner query's WHERE clauses, ie the semijoin
> selectivity should be sizeof(inner result) divided by sizeof(inner
> relation).

After further study, I concluded that we can only easily estimate that
when the inner side of the SEMI or ANTI join is just the single referenced
table.  If the inner side is itself a join, it's not easy to determine
what fraction of the referenced table will survive the join clauses.

However, we can still be brighter than to just throw all the FK qual
clauses back into the pool: that would result in multiplying their
selectivity estimates together, which for a multi-column FK results in
exactly the drastic underestimation that 100340e2d intended to avoid.
What seems to make sense here is to take the minimum of the per-clause
selectivities, as we are doing for other outer-join cases.

Hence, I propose the attached patch.  This rearranges the existing code
slightly to avoid duplicating it.

            regards, tom lane

diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index e42895d..415edad 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
*************** get_foreign_key_join_selectivity(Planner
*** 4085,4090 ****
--- 4085,4091 ----
      {
          ForeignKeyOptInfo *fkinfo = (ForeignKeyOptInfo *) lfirst(lc);
          bool        ref_is_outer;
+         bool        use_smallest_selectivity = false;
          List       *removedlist;
          ListCell   *cell;
          ListCell   *prev;
*************** get_foreign_key_join_selectivity(Planner
*** 4205,4213 ****
           * be double-counting the null fraction, and (2) it's not very clear
           * how to combine null fractions for multiple referencing columns.
           *
!          * In the first branch of the logic below, null derating is done
!          * implicitly by relying on clause_selectivity(); in the other two
!          * paths, we do nothing for now about correcting for nulls.
           *
           * XXX another point here is that if either side of an FK constraint
           * is an inheritance parent, we estimate as though the constraint
--- 4206,4214 ----
           * be double-counting the null fraction, and (2) it's not very clear
           * how to combine null fractions for multiple referencing columns.
           *
!          * In the use_smallest_selectivity code below, null derating is done
!          * implicitly by relying on clause_selectivity(); in the other cases,
!          * we do nothing for now about correcting for nulls.
           *
           * XXX another point here is that if either side of an FK constraint
           * is an inheritance parent, we estimate as though the constraint
*************** get_foreign_key_join_selectivity(Planner
*** 4230,4257 ****
               * the smallest per-column selectivity, instead.  (This should
               * correspond to the FK column with the most nulls.)
               */
!             Selectivity thisfksel = 1.0;
!
!             foreach(cell, removedlist)
!             {
!                 RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
!                 Selectivity csel;
!
!                 csel = clause_selectivity(root, (Node *) rinfo,
!                                           0, jointype, sjinfo);
!                 thisfksel = Min(thisfksel, csel);
!             }
!             fkselec *= thisfksel;
          }
          else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
          {
              /*
               * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
!              * fraction of LHS rows that have matches.  If the referenced
!              * table is on the inner side, that means the selectivity is 1.0
!              * (modulo nulls, which we're ignoring for now).  We already
!              * covered the other case, so no work here.
               */
          }
          else
          {
--- 4231,4271 ----
               * the smallest per-column selectivity, instead.  (This should
               * correspond to the FK column with the most nulls.)
               */
!             use_smallest_selectivity = true;
          }
          else if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
          {
              /*
               * For JOIN_SEMI and JOIN_ANTI, the selectivity is defined as the
!              * fraction of LHS rows that have matches.  The referenced table
!              * is on the inner side (we already handled the other case above),
!              * so the FK implies that every LHS row has a match *in the
!              * referenced table*.  But any restriction or join clauses below
!              * here will reduce the number of matches.
               */
+             if (bms_membership(inner_relids) == BMS_SINGLETON)
+             {
+                 /*
+                  * When the inner side of the semi/anti join is just the
+                  * referenced table, we may take the FK selectivity as equal
+                  * to the selectivity of the table's restriction clauses.
+                  */
+                 RelOptInfo *ref_rel = find_base_rel(root, fkinfo->ref_relid);
+                 double        ref_tuples = Max(ref_rel->tuples, 1.0);
+
+                 fkselec *= ref_rel->rows / ref_tuples;
+             }
+             else
+             {
+                 /*
+                  * When the inner side of the semi/anti join is itself a join,
+                  * it's hard to guess what fraction of the referenced table
+                  * will get through the join.  But we still don't want to
+                  * multiply per-column estimates together.  Take the smallest
+                  * per-column selectivity, instead.
+                  */
+                 use_smallest_selectivity = true;
+             }
          }
          else
          {
*************** get_foreign_key_join_selectivity(Planner
*** 4265,4270 ****
--- 4279,4304 ----

              fkselec *= 1.0 / ref_tuples;
          }
+
+         /*
+          * Common code for cases where we should use the smallest selectivity
+          * that would be computed for any one of the FK's clauses.
+          */
+         if (use_smallest_selectivity)
+         {
+             Selectivity thisfksel = 1.0;
+
+             foreach(cell, removedlist)
+             {
+                 RestrictInfo *rinfo = (RestrictInfo *) lfirst(cell);
+                 Selectivity csel;
+
+                 csel = clause_selectivity(root, (Node *) rinfo,
+                                           0, jointype, sjinfo);
+                 thisfksel = Min(thisfksel, csel);
+             }
+             fkselec *= thisfksel;
+         }
      }

      *restrictlist = worklist;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: [HACKERS] Proposal for changes to recovery.conf API
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Proposal for changes to recovery.conf API