Re: Odd statistics behaviour in 7.2

From: Tom Lane
Subject: Re: Odd statistics behaviour in 7.2
Msg-id: 21967.1013968953@sss.pgh.pa.us
In reply to: Odd statistics behaviour in 7.2 ("Gordon A. Runkle" <gar@integrated-dynamics.com>)
List: pgsql-hackers

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> It would seem that if you could determine if the number of distinct
> values is _increasing_ as you scan more rows, that an increase in table
> size would also cause an increase, e.g. if you have X distinct values
> looking at N rows, and 2X distinct values looking at 2N rows, that
> clearly would show a scale.

[ thinks for a while... ]  I don't think that'll help.  You could not
expect an exact 2:1 increase, except in the case of a simple unique
column, which isn't the problem anyway.  So the above would really
have to be coded as "count the number of distinct values in the sample
(d1) and the number in half of the sample (d2); then if d1/d2 >= X
assume the number of distinct values scales".  X is a constant somewhere
between 1 and 2, but where?  I think you've only managed to trade one
arbitrary threshold for another one.
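A minimal sketch of the half-sample test described above, assuming the sample rows are in random order (so the first half is itself a random half-sample). The function name and the threshold parameter are hypothetical illustrations, not PostgreSQL's actual ANALYZE code:

```python
def distinct_count_scales(sample, threshold):
    """Hypothetical heuristic: does the distinct count grow with sample size?

    d1 = number of distinct values in the whole sample
    d2 = number of distinct values in the first half of the sample
    Returns True when d1 / d2 >= threshold, with threshold in (1, 2].
    """
    if not 1.0 < threshold <= 2.0:
        raise ValueError("threshold should lie between 1 and 2")
    half = sample[: len(sample) // 2]   # assumes rows are in random order
    d1 = len(set(sample))
    d2 = len(set(half))
    if d2 == 0:                         # empty half-sample: no evidence either way
        return False
    return d1 / d2 >= threshold


# A column whose half-sample has 50 distinct values and whose full sample has 70:
# the ratio is 1.4, so the verdict depends entirely on where X is put.
mixed = [i % 50 for i in range(500)] + [i % 70 for i in range(500)]
print(distinct_count_scales(mixed, 1.3))  # True
print(distinct_count_scales(mixed, 1.5))  # False
```

The two calls show the objection concretely: the same data is labeled "scaling" or "not scaling" depending only on the choice of X.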

A more serious problem is that the above could easily be fooled by a
distribution that contains a few very-popular values and a larger number
of seldom-seen ones.  Consider for example a column "number of children"
over a database of families.  In a sample of a thousand or so, you might
well see only values 0..4 (or so); if you double the size of the sample,
and find a few rows with 5 to 10 kids, are you then correct to label the
column as scaling with the size of the database?
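The "number of children" case can be made concrete with a small deterministic sample. The data below is invented for illustration: a fixed row order stands in for a random draw, with only the common values 0..4 in the first half and a handful of rare large families turning up in the second half:

```python
def distinct_ratio(sample):
    """d1 / d2: distinct values in the whole sample over those in its first half."""
    half = sample[: len(sample) // 2]
    return len(set(sample)) / len(set(half))


# Hypothetical 1000-row sample of "number of children".
first_half = [i % 5 for i in range(500)]                       # only 0..4
second_half = [i % 5 for i in range(494)] + [5, 6, 7, 8, 9, 10]  # six rare rows
sample = first_half + second_half

ratio = distinct_ratio(sample)   # 11 distinct values / 5 distinct values
print(ratio)  # 2.2
```

A ratio of 2.2 clears any threshold between 1 and 2, so the test would declare the column to scale with table size, even though the set of plausible values is small and essentially fixed.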

        regards, tom lane