Re: Implementing STDDEV and VARIANCE

Поиск
Список
Период
Сортировка
От JanWieck@t-online.de (Jan Wieck)
Тема Re: Implementing STDDEV and VARIANCE
Дата
Msg-id 200006091039.MAA04186@hot.jw.home
обсуждение исходный текст
Ответ на Re: Implementing STDDEV and VARIANCE  (Bruce Momjian <pgman@candle.pha.pa.us>)
Список pgsql-hackers
Bruce Momjian wrote:
   I  created  all that sometimes back. Dunno why never added it   to contrib.  Will post it another day.

Jan

> Has this gone anywhere?
>
> > I'd like to implement stddev and variance aggregates in Postgres. This is a
> > long standing TODO item.
> >
> > There already has been some discussion on implementing this, see
> > http://www.postgresql.org/mhonarc/pgsql-hackers/1998-06/msg00175.html
> >
> > There are two definitions for standard deviation and variance:
> >                                          _
> > population variance = sigma^2 = SUM(X - X)^2 / N
> >
> > population stddev = sqrt(population variance)
> >                                  _
> > sample variance = s^2 = SUM(X - X)^2 / (N-1)
> >
> > sample stddev = sqrt(sample variance)
> >
> > These statistics can be calculated in one pass when three variables are
> > kept when scanning through the rows, n, sum(x), sum(x^2). Currently, only
> > two variables are kept.
> >
> > E.g. avg() is calculated as follows:
> >
> > sx = 0
> > n = 0
> > for every row {
> >  sx = sx + value in row                   // transition function 1
> >  n = n+1                                  // transition function 2
> > }
> > avg = sum(x) / n
> >
> > stddev / variance might be calculated as follows:
> >
> > sx = 0
> > n = 0
> > sx2 = 0
> > for every row {
> >  sx = sx + value in row                   // transition function 1
> >  n = n+1                                  // transition function 2
> >  sx2 = sx2 + value in row^2               // transition function 3
> > }
> > var = (1/n) * (sx2 - (1/n) * sx^2)              // Population
> >
> > or
> >
> > var = (1/(n-1)) * (sx2 - (1/n) * sx^2)          // Sample
> >
> > and
> >
> > stddev = sqrt(var)
> >
> > I've looked through the code and the following things need to be implemented:
> >
> > 1. Add three columns to pg_aggregate for the additional third transition
> > function.
> >
> > Tom Lane wrote at Fri, 05 Jun 1998 11:24:04 -0400:
> >  >All that you need to implement this is room to keep two running
> >  >sums instead of one.  I haven't looked at pgsql's aggregate functions,
> >  >but I'd hope that the working state can be a struct not just a
> >  >single number.
> >
> > I saw no other way than adding another transition function and logic, as
> > this might break user-defined aggregates (are there any around?).
> >
> > 2. Add logic to nodeAgg.c to execute the third transition function and
> > finalize function with three rather than two parameters
> > 3. Add functions f(a,b) that returns a + b^2 for selected types
> > 4. Add four finalize functions to calculate the variance / stddev
> > 5. Update the code for create aggregate, to include the definition of the
> > third transition function
> > 6. Update the documentation
> >
> > My questions are:
> > 1. Is this the correct way to continue? What am I missing? Any errors in my
> > reasoning?
> > 2. I am proposing the names stddev(x) and variance(x) for population and
> > samplestddev(x) and
> > samplevariance(x) for sample statistics. Any comments?
> > 3. I'm planning to implement this for types float4, float8 and numeric. Any
> > other types also? int[2,4,8] don't seem logical, as these would introduce
> > serious rounding errors.
> >
> > Let me know what you think,
> >
> >
> > Jeroen
> >
> > ************
> >
>
>
> --
>   Bruce Momjian                        |  http://www.op.net/~candle
>   pgman@candle.pha.pa.us               |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
>


--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: freefuncs.c is never called from anywhere!?
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: freefuncs.c is never called from anywhere!?]