Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics

Поиск
Список
Период
Сортировка
"Nathan Boley" <npboley@gmail.com> writes:
>> (1) On what grounds do you assert the above?

> For a table with 1000000 non-mcv rows, the planner estimates a result
> set of cardinality 1000000/20 = 50000, not 1.

The real problem in that situation is that you need another twenty slots
in the MCV list.  The MCV list should *always* exhaust the set of values
for which it'd be bad to do an indexscan.  Assuming that the threshold
for switching to an indexscan is somewhere around selectivity 0.005
(I am not certain offhand, but it's in that general area), this cannot
possibly require more than 200 MCV slots, and for most data
distributions it'd be a whole lot less.

Given such an MCV list, the planner will always make the right choice
of whether to do index or seqscan ... as long as it knows the value
being searched for, that is.  Parameterized plans have a hard time here,
but that's not really the fault of the statistics.

> I see where the confusion is coming from. When I said
> What if ( somehow ) we knew that there was a 90%
> chance that query would return an empty result set
> I meant that the planner doesn't know that information. And how could it?

Hmm.  IIRC the estimates are set up on the assumption that you are
searching for a value that occurs in the table.  I suppose there are
applications where that's often false, but as you say, it's hard to know
that in advance.
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Nathan Boley"
Дата:
Сообщение: Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics
Следующее
От: Gregory Stark
Дата:
Сообщение: Re: Proposal - improve eqsel estimates by including histogram bucket numdistinct statistics