Re: Add min and max execute statement time in pg_stat_statement

From David G. Johnston
Subject Re: Add min and max execute statement time in pg_stat_statement
Date
Msg-id CAKFQuwb2Up+rMMxa3Wkx4DjwdKrgmiOzGGTnsr6=Lb3uJKy1zw@mail.gmail.com
In reply to Re: Add min and max execute statement time in pg_stat_statement  (David Fetter <david@fetter.org>)
List pgsql-hackers
On Thu, Feb 19, 2015 at 11:10 AM, David Fetter <david@fetter.org> wrote:
> On Wed, Feb 18, 2015 at 08:31:09PM -0700, David G. Johnston wrote:
> > On Wed, Feb 18, 2015 at 6:50 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> > > On 02/18/2015 08:34 PM, David Fetter wrote:
> > >
> > >> On Tue, Feb 17, 2015 at 08:21:32PM -0500, Peter Eisentraut wrote:
> > >>
> > >>> On 1/20/15 6:32 PM, David G Johnston wrote:
> > >>>
> > >>>> In fact, as far as the database knows, the values provided to this
> > >>>> function do represent an entire population and such a correction
> > >>>> would be unnecessary.  I guess it boils down to whether "future"
> > >>>> queries are considered part of the population or whether the
> > >>>> population changes upon each query being run and thus we are
> > >>>> calculating the ever-changing population variance.
> >
> > > I think we should be calculating the population variance.
> >
> > >> Why population variance and not sample variance?  In distributions
> > >> where the second moment about the mean exists, it's an unbiased
> > >> estimator of the variance.  In this, it's different from the
> > >> population variance.
> >
> > > Because we're actually measuring the whole population, and not a sample?
>
> We're not.  We're taking a sample, which is to say past measurements,
> and using it to make inferences about the population, which includes
> all queries in the future.


"All past measurements" does not qualify as a "random sample" of a population made up of all past measurements plus any members that may be added in the future.  Without the random-sampling aspect, you give up any pretense of avoiding bias, so you might as well call the totality of the past measurements the population, describe it using population descriptive statistics, and call it a day.

For large populations it isn't going to matter anyway, but for small populations the difference of one in the divisor (n - 1 instead of n) seems like it would make the past performance look worse than it actually was.
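
To make the divisor point concrete, here is a minimal sketch in Python using the standard library's statistics module.  The timings are made-up values, not measurements from pg_stat_statements, and the variable names are purely illustrative.

```python
from statistics import pvariance, variance

# Hypothetical execution times (ms) for one statement; made-up values.
timings_ms = [10.0, 12.0, 14.0]

# Population variance: sum of squared deviations divided by n.
pop_var = pvariance(timings_ms)

# Sample variance: the same sum divided by n - 1 (Bessel's correction).
samp_var = variance(timings_ms)

# The two differ by exactly the factor n / (n - 1).
n = len(timings_ms)
inflation = n / (n - 1)

print(f"population variance: {pop_var:.4f}")
print(f"sample variance:     {samp_var:.4f}")
print(f"n / (n - 1):         {inflation:.4f}")
```

With only three calls the sample variance is 50% larger than the population variance; since the factor is n / (n - 1), the gap shrinks to about 0.1% by a thousand calls.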

David J.
