Re: Add min and max execute statement time in pg_stat_statement

Поиск
Список
Период
Сортировка
От Arne Scheffer
Тема Re: Add min and max execute statement time in pg_stat_statement
Дата
Msg-id permail-20150121121804fe5316b600007a2b-scheffa@message-id.uni-muenster.de
обсуждение исходный текст
Ответ на Re: Add min and max execute statement time in pg_stat_statement  (David G Johnston <david.g.johnston@gmail.com>)
Список pgsql-hackers

David G Johnston schrieb am 2015-01-21:
> Andrew Dunstan wrote
> > On 01/20/2015 01:26 PM, Arne Scheffer wrote:

> >> And a very minor aspect:
> >> The term "standard deviation" in your code stands for
> >> (corrected) sample standard deviation, I think,
> >> because you devide by n-1 instead of n to keep the
> >> estimator unbiased.
> >> How about mentioning the prefix "sample"
> >> to indicate this beiing the estimator?


> > I don't understand. I'm following pretty exactly the calculations
> > stated
> > at <http://www.johndcook.com/blog/standard_deviation/>


> > I'm not a statistician. Perhaps others who are more literate in
> > statistics can comment on this paragraph.

> I'm largely in the same boat as Andrew but...

> I take it that Arne is referring to:

> http://en.wikipedia.org/wiki/Bessel's_correction

Yes, it is.

> but the mere presence of an (n-1) divisor does not mean that is what
> is
> happening.  In this particular situation I believe the (n-1) simply
> is a
> necessary part of the recurrence formula and not any attempt to
> correct for
> sampling bias when estimating a population's variance.

That's wrong, it's applied in the end to the sum of squared differences
and therefore per definition the corrected sample standard deviation
estimator.

> In fact, as
> far as
> the database knows, the values provided to this function do represent
> an
> entire population and such a correction would be unnecessary.  I

That would probably be an exotic assumption in a working database
and it is not, what is computed here!

> guess it
> boils down to whether "future" queries are considered part of the
> population
> or whether the population changes upon each query being run and thus
> we are
> calculating the ever-changing population variance.

Yes, indeed correct.
And exactly to avoid that misunderstanding, I suggested to
use the "sample" term.
To speak in Postgresql terms; applied in Andrews/Welfords algorithm
is stddev_samp(le), not stddev_pop(ulation).
Therefore stddev in Postgres is only kept for historical reasons, look at
http://www.postgresql.org/docs/9.4/static/functions-aggregate.html
Table 9-43.

VlG-Arne

> Note point 3 in
> the
> linked Wikipedia article.





> David J.



> --
> View this message in context:
> http://postgresql.nabble.com/Add-min-and-max-execute-statement-time-in-pg-stat-statement-tp5774989p5834805.html
> Sent from the PostgreSQL - hackers mailing list archive at
> Nabble.com.


> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: parallel mode and parallel contexts
Следующее
От: Arne Scheffer
Дата:
Сообщение: Re: Add min and max execute statement time in pg_stat_statement