Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?

From Tom Lane
Subject Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?
Date
Msg-id 148741.1604105449@sss.pgh.pa.us
In reply to Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> So I'm not sure I understand what would be the risk with this ... Tom,
> can you elaborate why you dislike the patch?

I've got a couple issues with the patch as presented.

* As you said, it creates discontinuous behavior for stanullfrac = 1.0
versus stanullfrac = 1.0 - epsilon.  That doesn't seem good.

* It's not apparent why, if ANALYZE's sample is all nulls, we wouldn't
conclude stadistinct = 0 and thus arrive at the desired answer that
way.  (Since we have a complaint, I'm guessing that ANALYZE might
disbelieve its own result and stick in some larger stadistinct.  But
then maybe that's where to fix this, not here.)

* We generally disbelieve edge-case estimates to begin with.  The
most obvious example is that we don't accept rowcount estimates that
are zero.  There are also some clamps that disbelieve selectivities
approaching 0.0 or 1.0 when estimating from a histogram, and I think
we have a couple other similar rules.  The reason for this is mainly
that taking such estimates at face value creates too much risk of
severe relative error due to imprecise or out-of-date statistics.
So a special case for stanullfrac = 1.0 seems to go directly against
that mindset.
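
To make the "disbelieve edge cases" rule concrete, here is a minimal
sketch of the two kinds of clamp described above.  It is not the
planner's actual code: the function names and the cutoff parameter are
hypothetical, chosen only for illustration.

```c
#include <assert.h>

/*
 * Hypothetical helper: never report a rowcount estimate below one row,
 * so an exact-zero estimate is not taken at face value.
 */
double
sketch_clamp_rows(double nrows)
{
    return (nrows < 1.0) ? 1.0 : nrows;
}

/*
 * Hypothetical helper: keep a selectivity estimate away from exactly
 * 0.0 or 1.0.  The cutoff is an illustrative parameter, not a value
 * taken from the PostgreSQL source.
 */
double
sketch_clamp_selectivity(double sel, double cutoff)
{
    assert(cutoff > 0.0 && cutoff < 0.5);

    if (sel < cutoff)
        return cutoff;
    if (sel > 1.0 - cutoff)
        return 1.0 - cutoff;
    return sel;
}
```

The point of both helpers is the same: a slightly stale or imprecise
statistic should shift the estimate a little, not drive it to a value
with unbounded relative error.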

I agree that there might be some gold to be mined in this area,
as we haven't thought particularly hard about high-stanullfrac
situations.  One idea is to figure what stanullfrac says about the
number of non-null rows, and clamp the get_variable_numdistinct
result to be not more than that.  But I still would not want to
trust an exact zero result.
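
A rough sketch of that clamping idea, under stated assumptions: the
function name, its arguments, and the floor of 1.0 are all hypothetical
(the floor is just one way of not trusting an exact zero), and none of
this is the real get_variable_numdistinct code.

```c
#include <assert.h>

/*
 * Hypothetical sketch: limit an ndistinct estimate to the number of
 * non-null rows implied by nullfrac, but never return less than 1.0.
 *
 * ndistinct - estimate before clamping (assumed already computed)
 * ntuples   - estimated number of rows in the relation
 * nullfrac  - fraction of rows that are null, in [0, 1]
 */
double
sketch_clamp_ndistinct(double ndistinct, double ntuples, double nullfrac)
{
    double nonnull_rows;

    assert(nullfrac >= 0.0 && nullfrac <= 1.0);
    assert(ntuples >= 0.0);

    /* Number of rows that can actually carry a non-null value. */
    nonnull_rows = ntuples * (1.0 - nullfrac);

    if (ndistinct > nonnull_rows)
        ndistinct = nonnull_rows;

    /* Assumed floor: do not believe an exact zero. */
    if (ndistinct < 1.0)
        ndistinct = 1.0;

    return ndistinct;
}
```

With this shape the result shrinks smoothly as nullfrac approaches 1.0
and bottoms out at the floor, so nullfrac = 1.0 and nullfrac = 1.0 -
epsilon give nearly the same answer instead of a discontinuity.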

            regards, tom lane


