Re: Query optimizer 8.0.1 (and 8.0)

From: pgsql@mohawksoft.com
Subject: Re: Query optimizer 8.0.1 (and 8.0)
Date:
Msg-id: 16805.24.91.171.78.1107800884.squirrel@mail.mohawksoft.com
In reply to: Re: Query optimizer 8.0.1 (and 8.0) (Bruno Wolff III <bruno@wolff.to>)
Replies: Re: Query optimizer 8.0.1 (and 8.0) (Bruno Wolff III <bruno@wolff.to>)
List: pgsql-hackers
> On Mon, Feb 07, 2005 at 11:27:59 -0500,
>   pgsql@mohawksoft.com wrote:
>>
>> It is inarguable that increasing the sample size increases the accuracy
>> of a study, especially when the diversity of the subject is unknown. It
>> is known that reducing a sample size increases the probability of error
>> in any poll or study. The required sample size depends on the variance
>> of the whole. It is mathematically unsound to ASSUME any sample size is
>> valid without understanding the standard deviation of the set.
>
> For large populations, the accuracy of estimates of statistics based on
> random samples from that population is not very sensitive to population
> size and depends primarily on the sample size. So you would not expect
> to need larger sample sizes on larger data sets, once the data sets are
> over some minimum size.

That assumes a fairly low standard deviation. If the standard deviation is
low, then a minimal sample size works fine. If there were zero deviation in
the data, then a sample of one would work fine.

If the standard deviation is high, then you need more samples. If you have
a high standard deviation and a large data set, you need more samples than
you would need for a smaller data set.
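To make the relationship concrete, here is a sketch using the textbook sample-size formula for estimating a mean to within a margin of error E at a given confidence level (z = 1.96 for 95%), with the finite-population correction applied. The helper name and the sigma/margin values are hypothetical illustrations, not anything taken from analyze.c. The required sample grows with the square of the standard deviation, while the population size enters only through the correction term.

```python
import math


def required_sample_size(sigma: float, margin: float, population: int,
                         z: float = 1.96) -> int:
    """Hypothetical helper: rows needed to estimate a mean within +/- margin.

    Step 1: infinite-population size n0 = (z * sigma / margin) ** 2.
    Step 2: finite-population correction n = n0 / (1 + (n0 - 1) / population).
    The result is rounded up to a whole number of rows.
    """
    n0 = (z * sigma / margin) ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


if __name__ == "__main__":
    # Compare a low- and a high-deviation column on a small and a large table.
    for sigma in (1.0, 10.0):
        for population in (10_000, 4_600_000):
            n = required_sample_size(sigma, margin=0.5, population=population)
            print(f"sigma={sigma:5.1f} rows={population:>9,} -> sample {n:,}")
```

With these made-up numbers, the low-deviation column needs 16 rows on either table, while the high-deviation column needs 1,333 rows on the 10,000-row table and 1,537 on the 4.6-million-row table; the table size matters, but the correction term levels off as the table grows.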

In the current implementation of analyze.c, the default is 100 samples. On
a table of 10,000 rows, that is probably a good number to characterize the
data well enough for the query optimizer (a 1% sample). For a table with
4.6 million rows, it is only about 0.002%.
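A quick check of the two percentages, taking the post's figure of 100 sampled rows at face value (the helper name is hypothetical). The second figure works out to roughly 0.0022%.

```python
def sample_fraction_percent(sample_rows: int, table_rows: int) -> float:
    """Hypothetical helper: sampled rows as a percentage of the table."""
    return 100.0 * sample_rows / table_rows


if __name__ == "__main__":
    print(f"{sample_fraction_percent(100, 10_000):.4f}%")     # 1.0000%
    print(f"{sample_fraction_percent(100, 4_600_000):.4f}%")  # 0.0022%
```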

Think about an irregularly occurring event, unevenly distributed throughout
the data set. A randomized sampling strategy spread evenly across the whole
data set, with too few samples, will mischaracterize the event or even miss
it altogether.
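The "miss it altogether" case can be quantified. This sketch assumes simple uniform random sampling of rows without replacement, which is a simplification and not necessarily how analyze.c actually draws its sample. The event is modeled as k matching rows in a table of N rows, and the chance that a sample of n rows contains none of them is the hypergeometric probability C(N - k, n) / C(N, n). Under truly uniform row sampling the physical layout of the rows would not matter, so this captures only the "too few samples" half of the argument. The 1,000-row event count is a made-up example.

```python
import math


def prob_miss_all(table_rows: int, event_rows: int, sample_rows: int) -> float:
    """Probability that a uniform sample (without replacement) of sample_rows
    rows contains none of the event_rows matching rows.

    Computes C(N - k, n) / C(N, n) as a running product of
    (N - k - i) / (N - i) for i in 0..n-1, avoiding huge binomials.
    """
    if event_rows + sample_rows > table_rows:
        return 0.0  # the sample is too large to avoid every event row
    p = 1.0
    for i in range(sample_rows):
        p *= (table_rows - event_rows - i) / (table_rows - i)
    return p


if __name__ == "__main__":
    N, k = 4_600_000, 1_000  # hypothetical: 1,000 event rows in 4.6M rows
    for n in (100, 10_000):
        exact = prob_miss_all(N, k, n)
        approx = math.exp(-n * k / N)  # Poisson-style approximation
        print(f"sample={n:>6,}: P(miss every event row) = {exact:.4f} (approx {approx:.4f})")
```

With 100 sampled rows, the event is missed entirely about 98% of the time; even 10,000 rows still miss it roughly 11% of the time.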

