Re: Odd statistics behaviour in 7.2

From	Tom Lane
Subject	Re: Odd statistics behaviour in 7.2
Date	17 February 2002, 13:13:27
Msg-id	21967.1013968953@sss.pgh.pa.us
In reply to	Odd statistics behaviour in 7.2 ("Gordon A. Runkle" <gar@integrated-dynamics.com>)
List	pgsql-hackers

Bruce Momjian <pgman@candle.pha.pa.us> writes:
> It would seem that if you could determine if the number of distinct
> values is _increasing_ as you scan more rows, that an increase in table
> size would also cause an increase, e.g. if you have X distinct values
> looking at N rows, and 2X distinct values looking at 2N rows, that
> clearly would show a scale.

[ thinks for a while... ]  I don't think that'll help.  You could not
expect an exact 2:1 increase, except in the case of a simple unique
column, which isn't the problem anyway.  So the above would really
have to be coded as "count the number of distinct values in the sample
(d1) and the number in half of the sample (d2); then if d1/d2 >= X
assume the number of distinct values scales".  X is a constant somewhere
between 1 and 2, but where?  I think you've only managed to trade one
arbitrary threshold for another one.
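
The d1/d2 test described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the ANALYZE
code: the names distinct_ratio and looks_scaling are hypothetical, the
"half of the sample" is taken to be simply the first half of a list, and
the three columns are synthetic.

```python
def distinct_ratio(sample):
    """Return (d1, d2, d1/d2): distinct values in the whole sample (d1)
    and in the first half of it (d2). Hypothetical helper."""
    if len(sample) < 2:
        raise ValueError("need at least two rows to compare halves")
    half = sample[: len(sample) // 2]
    d1 = len(set(sample))
    d2 = len(set(half))
    return d1, d2, d1 / d2


def looks_scaling(sample, threshold):
    """Apply the proposed rule: assume the number of distinct values
    scales with table size when d1/d2 >= threshold (the constant X)."""
    _, _, ratio = distinct_ratio(sample)
    return ratio >= threshold


# Synthetic columns (assumptions, chosen only to show the three regimes).
unique_col = list(range(1000))             # every row distinct
fixed_col = [i % 10 for i in range(1000)]  # only 10 distinct values
partial_col = [i % 700 for i in range(1000)]  # 700 distinct values

for name, col in [("unique", unique_col),
                  ("fixed", fixed_col),
                  ("partial", partial_col)]:
    d1, d2, ratio = distinct_ratio(col)
    print(f"{name}: d1={d1} d2={d2} ratio={ratio:.2f}")

# The verdict for the in-between column depends entirely on the choice of X.
print(looks_scaling(partial_col, 1.3), looks_scaling(partial_col, 1.5))
```

The unique column gives the exact 2:1 ratio and the fixed column gives
1:1, but the in-between column lands at 1.4, so it is classified as
scaling with X = 1.3 and as non-scaling with X = 1.5.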

A more serious problem is that the above could easily be fooled by a
distribution that contains a few very-popular values and a larger number
of seldom-seen ones.  Consider for example a column "number of children"
over a database of families.  In a sample of a thousand or so, you might
well see only values 0..4 (or so); if you double the size of the sample,
and find a few rows with 5 to 10 kids, are you then correct to label the
column as scaling with the size of the database?
        regards, tom lane
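
The "number of children" case can be made concrete with a small Python
sketch. The row counts below are invented purely for illustration (they
are not from the thread), the helper name distinct_ratio is hypothetical,
and the smaller sample is assumed to be the first half of the larger one.

```python
def distinct_ratio(sample):
    """Return (d1, d2, d1/d2): distinct values in the whole sample (d1)
    and in the first half of it (d2). Hypothetical helper."""
    half = sample[: len(sample) // 2]
    d1 = len(set(sample))
    d2 = len(set(half))
    return d1, d2, d1 / d2


# First 1000 rows: only the popular values 0..4 show up (made-up counts).
first_half = [0] * 300 + [1] * 250 + [2] * 300 + [3] * 100 + [4] * 50

# Next 1000 rows: same shape, plus six rare families with 5..10 children.
second_half = ([0] * 300 + [1] * 250 + [2] * 300 + [3] * 100 + [4] * 44
               + [5, 6, 7, 8, 9, 10])

sample = first_half + second_half
d1, d2, ratio = distinct_ratio(sample)
print(f"d1={d1} d2={d2} ratio={ratio:.2f}")

# The ratio exceeds 2, so any threshold X between 1 and 2 would label the
# column as scaling, even though the value range is small and bounded.
print(ratio >= 2.0)
```

Doubling the sample more than doubles the distinct count here, yet the
column clearly does not gain distinct values in proportion to the number
of rows; a handful of seldom-seen values is enough to trip the test.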
