Idea about estimating selectivity for single-column expressions

From: Tom Lane
Subject: Idea about estimating selectivity for single-column expressions
Msg-id: 10499.1250650414@sss.pgh.pa.us
Replies: Re: Idea about estimating selectivity for single-column expressions  (Robert Haas <robertmhaas@gmail.com>)
Re: Idea about estimating selectivity for single-column expressions  (Greg Stark <gsstark@mit.edu>)
List: pgsql-hackers
There were two different complaints today about the planner's inability
to make good estimates for WHERE conditions involving bit AND tests,
such as

	(field & 8) = 0
	(field & 8) <> 0
	(field & 127) = field

I'm not particularly eager to try to put in special-case knowledge
for this sort of thing, but I had an idea that might cover a wide
enough variety of cases to be worth the trouble.  It's this: if we
have a WHERE clause that is an immutable (or maybe even just stable)
function of a single table column, and we don't have any special-purpose
estimator that works for it, we could estimate the selectivity by
actually evaluating the clause for each MCV and histogram entry for
the column, then combining the results with appropriate weightings.
We do something comparable already for "field LIKE pattern" expressions,
but this would generalize it to any expression.  With the larger
histogram sizes that are default as of 8.4, I think this could be
expected to provide a pretty good estimate most of the time ---
certainly far better than the hardwired default estimates are.
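The proposed weighting can be sketched as follows (a hypothetical illustration, not PostgreSQL code: the function name, the shape of the statistics, and the sample numbers are all invented; the point is that MCVs contribute their known frequencies while histogram entries stand in for the remaining share of the rows):

```python
def estimate_selectivity(predicate, mcv, histogram):
    """Estimate the fraction of rows passing `predicate`.

    mcv: list of (value, frequency) pairs for the most common values.
    histogram: list of sample values covering the non-MCV population.
    """
    mcv_total = sum(freq for _, freq in mcv)
    # MCVs contribute their exact known frequencies.
    mcv_sel = sum(freq for value, freq in mcv if predicate(value))
    # Histogram entries are equally weighted samples of the rest.
    if histogram:
        hist_sel = sum(1 for v in histogram if predicate(v)) / len(histogram)
    else:
        hist_sel = 0.0
    # The histogram represents the (1 - mcv_total) of rows not in the MCV list.
    return mcv_sel + hist_sel * (1.0 - mcv_total)

# Example: WHERE (field & 8) = 0 over a skewed distribution.
mcv = [(0, 0.50), (8, 0.25)]            # 50% zeros, 25% eights
histogram = [1, 2, 3, 9, 12, 24, 40, 64]
sel = estimate_selectivity(lambda v: (v & 8) == 0, mcv, histogram)
print(sel)  # 0.625: 0.50 from the MCVs plus half of the remaining 0.25
```

A hardwired default estimator has no way to arrive at anything like 0.625 here; it would just apply a fixed fraction regardless of the data.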

In the long run this might be workable for multi-column expressions
too, but not before we have multi-column statistics so we know which
combinations of values to try.

I can see a couple of possible downsides:

* If the expression is expensive to evaluate, we could waste a lot
of time making the estimate.  This would only be a serious issue
if the query were designed to avoid evaluating the WHERE clause
very often, but that's certainly possible.  We could perhaps check
the estimated cost of the expression and not do this if it's too
high ... but expression cost estimates are generally pretty bogus.

* The expression might throw an error for some inputs, for instance

	(1 / field) < 0.5
which would fail on zero.  We could recover by wrapping the whole
estimation process in a subtransaction, but that seems really expensive.
I thought about arguing that the failure would happen anyway at runtime,
but that doesn't hold water --- for example, the user might have just
deleted all the rows with field = 0, and would have grounds to complain
if the query failed; but there would still be an entry for zero in the
histogram.
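The failure mode can be illustrated with a small sketch (hypothetical names and data; in Python the "recovery" is a cheap try/except, whereas in the backend it would require a subtransaction):

```python
def safe_fraction(predicate, sample_values):
    """Fraction of sampled values passing `predicate`, or None if
    evaluation fails for any of them (caller falls back to the
    hardwired default estimate)."""
    try:
        return sum(1 for v in sample_values if predicate(v)) / len(sample_values)
    except Exception:
        return None

# The table no longer contains field = 0, but the statistics are stale
# and the histogram still holds a zero, so planning-time evaluation of
# the clause raises where the query itself would run fine.
stale_histogram = [0, 1, 2, 4]
print(safe_fraction(lambda f: (1 / f) < 0.5, stale_histogram))  # None
```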

Comments?
        regards, tom lane
