Re: Understanding histograms

From Len Shapiro
Subject Re: Understanding histograms
Date
Msg-id c5ee9b8a0804292332q32b468e3ga6b99e25b56c18c7@mail.gmail.com
In reply to Re: Understanding histograms  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Understanding histograms  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom,

Thank you for your prompt reply.

On Tue, Apr 29, 2008 at 10:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Len Shapiro <len@cs.pdx.edu> writes:
>  > 1. Why does Postgres come up with a negative n_distinct?
>
>  It's a fractional representation.  Per the docs:
>
>  > stadistinct   float4          The number of distinct nonnull data values
>  > in the column. A value greater than zero is the actual number of distinct
>  > values. A value less than zero is the negative of a fraction of the number
>  > of rows in the table (for example, a column in which values appear about
>  > twice on the average could be represented by stadistinct = -0.5). A zero
>  > value means the number of distinct values is unknown.

I asked about n_distinct, whose documentation reads in part "The
negated form is used when ANALYZE believes that the number of distinct
values is likely to increase as the table grows".  My question was why
ANALYZE believes that the number of distinct values is likely to
increase.  I'm unclear why you quoted the documentation on stadistinct
to me.
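
For reference, the sign convention in the passage quoted above can be sketched in a few lines of Python. This is only an illustration, assuming that n_distinct follows the same convention as the stadistinct text; the function name and its arguments are hypothetical and not taken from the Postgres source.

```python
def distinct_count(n_distinct: float, row_count: float) -> float:
    """Turn an n_distinct-style value into an absolute number of distinct values.

    Assumption: the sign convention is the one in the stadistinct text quoted above.
    """
    if n_distinct > 0:
        # Positive: already the actual number of distinct values.
        return n_distinct
    if n_distinct < 0:
        # Negative: the negated fraction of the number of rows in the table.
        return -n_distinct * row_count
    # Zero: the number of distinct values is unknown.
    raise ValueError("number of distinct values is unknown")
```

With a value of -0.5 and 1000 rows this gives 500 distinct values, and the result scales as the row count changes, whereas a positive value stays fixed.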
>
>
>  > The "rows=2" estimate makes sense when const = 1 or 5, but it makes no
>  > sense to me for other values of const not in the MCV list.
>  > For example, if I run the query
>  > EXPLAIN SELECT * from sailors where rank = -1000;
>  > Postgres still gives an estimate of "rows=2".
>
>  I'm not sure what estimate you'd expect instead?

I would expect an estimate of "rows=0" for values of const that are
in neither the MCV list nor the histogram.  When the histogram has
fewer than the maximum number of entries, implying (I am guessing here)
that all non-MCV values are in the histogram list, this seems like a
simple strategy and has the virtue of being accurate.
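
To make the proposal concrete, here is a minimal Python sketch of the strategy described above. It is not how Postgres works; it rests on my guess that a histogram with fewer than the maximum number of entries lists every non-MCV value. All names, and the fallback estimate, are hypothetical.

```python
def proposed_row_estimate(const, mcv_rows, histogram_values, max_entries, fallback_rows):
    """Row estimate for `column = const` under the proposed strategy.

    mcv_rows:         dict mapping each most-common value to its estimated row count
    histogram_values: values stored in the histogram
    max_entries:      maximum number of histogram entries (hypothetical parameter)
    fallback_rows:    estimate to use when nothing more precise is known
    """
    if const in mcv_rows:
        return mcv_rows[const]
    # Guess: a histogram that is not full contains every non-MCV value.
    histogram_is_complete = len(histogram_values) < max_entries
    if histogram_is_complete and const not in histogram_values:
        return 0
    return fallback_rows
```

With made-up statistics such as `mcv_rows={7: 10}`, `histogram_values=[1, 5]`, `max_entries=10` and `fallback_rows=2`, the constants 1 and 5 keep the estimate of 2, while -1000 gets 0.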

Where in the source is the code that manipulates the histogram?

> The code has a built in
>  assumption that no value not present in the MCV list can be more
>  frequent than the last member of the MCV list, so it's definitely not
>  gonna guess *more* than 2.

That's interesting.  Where is this in the source code?
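
As I read the quoted statement, the rule amounts to a cap on the estimate for any value outside the MCV list. The Python sketch below shows that cap under stated assumptions: spreading the remaining frequency evenly over the remaining distinct values is my own assumption, and the function and parameter names are hypothetical, not taken from the Postgres source.

```python
def non_mcv_selectivity(mcv_freqs, null_frac, n_distinct):
    """Estimated fraction of rows matching `column = const` for a const
    that is absent from the MCV list.

    mcv_freqs:  MCV frequencies, most frequent first (so the last is the smallest)
    null_frac:  fraction of rows that are NULL
    n_distinct: absolute number of distinct non-null values
    """
    # Frequency mass left over after the MCVs and the NULLs.
    remaining = 1.0 - sum(mcv_freqs) - null_frac
    other_distinct = n_distinct - len(mcv_freqs)
    # Assumption: the leftover mass is shared evenly among the other values.
    selectivity = remaining / other_distinct if other_distinct > 0 else remaining
    if mcv_freqs:
        # The cap from the quote: never more frequent than the last MCV.
        selectivity = min(selectivity, mcv_freqs[-1])
    return max(selectivity, 0.0)
```

Multiplying the result by the table's row count gives a row estimate, which by construction cannot exceed the row estimate for the last member of the MCV list.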

Thanks for all your help.

All the best,

Len Shapiro

>                         regards, tom lane
>
