Re: Query optimizer 8.0.1 (and 8.0)

From pgsql@mohawksoft.com
Subject Re: Query optimizer 8.0.1 (and 8.0)
Date
Msg-id 16623.24.91.171.78.1107793679.squirrel@mail.mohawksoft.com
In response to Re: Query optimizer 8.0.1 (and 8.0)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Query optimizer 8.0.1 (and 8.0)  (Bruno Wolff III <bruno@wolff.to>)
List pgsql-hackers
> pgsql@mohawksoft.com writes:
>> On a very basic level, why bother sampling the whole table at all? Why not
>> check one block and infer all information from that? Because we know that
>> isn't enough data. In a table of 4.6 million rows, can you say with any
>> mathematical certainty that a sample of 100 points can be, in any way,
>> representative?
>
> This is a statistical argument, not a rhetorical one, and I'm not going
> to bother answering handwaving.  Show me some mathematical arguments for
> a specific sampling rule and I'll listen.
>

Tom, I am floored by this response. I am shaking my head in disbelief.

It is inarguable that increasing the sample size increases the accuracy of
a study, especially when the diversity of the subject is unknown. It is
equally well known that reducing the sample size increases the probability
of error in any poll or study. The required sample size depends on the
variance of the whole population. It is mathematically unsound to ASSUME
that any sample size is valid without understanding the standard deviation
of the set.

http://geographyfieldwork.com/MinimumSampleSize.htm
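
To put a number on the dependence on variance, here is a minimal sketch of
the textbook sample-size formula for estimating a mean,
n = (z * sigma / E)^2. It assumes simple random sampling from a large
population and a roughly normal sampling distribution. The function name
min_sample_size and the example figures are made up for illustration; none
of this is taken from the PostgreSQL source.

```python
import math


def min_sample_size(std_dev, margin_of_error, z=1.96):
    """Smallest n for which the confidence interval half-width is <= margin_of_error.

    Assumes simple random sampling from a large population.
    z=1.96 corresponds to roughly 95% confidence.
    """
    if std_dev < 0 or margin_of_error <= 0:
        raise ValueError("std_dev must be >= 0 and margin_of_error must be > 0")
    return math.ceil((z * std_dev / margin_of_error) ** 2)


# Same margin of error, two hypothetical populations with different spread.
print(min_sample_size(std_dev=5, margin_of_error=1))
print(min_sample_size(std_dev=50, margin_of_error=1))
```

Because the standard deviation enters squared, a tenfold increase in spread
raises the required sample a hundredfold for the same margin of error. That
is why a fixed sample size cannot be justified without some estimate of the
spread.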

Again, I understand why you used the Vitter algorithm, but it has been
proven insufficient (as used) with the US Census TIGER database. We
understand this because we have seen that the random sampling as
implemented has insufficient information to properly characterize the
variance in the data.
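
The effect can be illustrated with a small simulation. This is only a
sketch on synthetic data, not the TIGER data set and not the sampling code
in ANALYZE; the table shape (10,000 distinct values, 20 rows each), the
helper name distinct_seen, and the sample sizes are all assumptions chosen
for the example. It measures what fraction of the distinct values a uniform
random sample of a given size ever gets to see.

```python
import random


def distinct_seen(population, sample_size, seed=0):
    """Fraction of the population's distinct values present in one random sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample = rng.sample(population, sample_size)  # uniform, without replacement
    return len(set(sample)) / len(set(population))


# Synthetic table: 10,000 distinct values, each stored 20 times (200,000 rows).
rows = [value for value in range(10_000) for _ in range(20)]

for n in (100, 1_000, 20_000):
    print(n, round(distinct_seen(rows, n), 3))
```

A sample of 100 rows can contain at most 100 distinct values, so it cannot
see more than 1% of the diversity in this table no matter how well the rows
are chosen. Any estimate built on it has to extrapolate the rest.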


