Re: Bad row estimation with indexed func returning bool

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Bad row estimation with indexed func returning bool
Дата
Msg-id 6822.1441993722@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Bad row estimation with indexed func returning bool  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Ответы Re: Bad row estimation with indexed func returning bool  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> writes:
> I faced a correlation problem on a query today and tried the usual trick
> consisting of using an functional index and rewriting the query to use it.

The core reason this isn't doing anything useful is that
clause_selectivity() is hard-wired to estimate the selectivity of a
top-level WHERE clause that is a function call as 0.3333333, no matter
what:
   else if (is_funcclause(clause))   {       /*        * This is not an operator, so we guess at the selectivity. THIS
ISA        * HACK TO GET V4 OUT THE DOOR.  FUNCS SHOULD BE ABLE TO HAVE        * SELECTIVITIES THEMSELVES.       -- JMH
7/9/92       */       s1 = (Selectivity) 0.3333333;   }
 

Adding per-function selectivity estimators, as Joe was presumably
envisioning, would be a sufficiently large amount of work that it's not
too surprising nobody's gotten around to it in twenty-three years.  (The
infrastructure maybe wouldn't be so bad, but where would the estimators
themselves come from, especially for user-defined functions?)

However, in the case at hand, the complaint basically is why aren't we
treating the boolean function expression like a boolean variable, and
looking to see if there are stats available for it, like this other
bit in clause_selectivity:
           /*            * A Var at the top of a clause must be a bool Var. This is            * equivalent to the
clausereln.attribute = 't', so we compute            * the selectivity as if that is what we have.            */
  s1 = restriction_selectivity(root,                                        BooleanEqualOperator,
                list_make2(var,                                                   makeBoolConst(true,
                                             false)),                                        InvalidOid,
                       varRelid);
 

Indeed you could argue that this ought to be the fallback behavior for
*any* unhandled case, not just function expressions.  Not sure if we'd
need to restrict it to single-relation expressions or not.

The implication of doing it like this would be that the default estimate
in the absence of any matching stats would be 0.5 (since eqsel defaults
to 1/ndistinct, and get_variable_numdistinct will report 2.0 for any
boolean-type expression it has no stats for).  That's not a huge change
from the existing 0.3333333 estimate, which seems pretty unprincipled
anyway ... but it would probably be enough to annoy people if we did it in
stable branches.  So I'd be inclined to propose changing this in HEAD and
maybe 9.5, but not further back.  (For non-function expressions, 0.5 is
the default already, so those would not change behavior.)

Comments?
        regards, tom lane



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: DBT-3 with SF=20 got failed
Следующее
От: Dean Rasheed
Дата:
Сообщение: Re: RLS open items are vague and unactionable