Re: [SQL] `statistical' aggregate functions etc.

Поиск
Список
Период
Сортировка
От jwieck@debis.com (Jan Wieck)
Тема Re: [SQL] `statistical' aggregate functions etc.
Дата
Msg-id m0zWHbu-000EBPC@orion.SAPserv.Hamburg.dsh.de
обсуждение исходный текст
Ответ на `statistical' aggregate functions etc.  (Ulf Mehlig <umehlig@uni-bremen.de>)
Список pgsql-sql
> I'm a novice to postgresql, so please excuse if I'm addressing to the
> wrong mailing list!
>
> I need to specify some column functions for statistical analysis,
> e.g. standard deviation or variance. After reading the user's guideI
> think that's to realize with aggregates, but with those docs only I
> cannot figure out how to do it -- does anybody know whether there is a
> ready-for-use "statistics" package, or how to write something that's
> usable in a SQL statement like this:
>
>           select date, avg(temperature), stddev(temperature)
>           from temperatures
>           where date between '01.01.1999' and '15.02.1999'
>           group by date                                      ?
>

    The example below works and produces

    select a, count(b) as n, average(b),
        variance(b), stddev(b) from t1 group by a;
    a| n|average|        variance|        stddev
    -+--+-------+----------------+--------------
    1| 5|    6.4|             0.8|   0.894427191
    2|10|   3.53| 0.0401111111111|0.200277585144
    3|10|   3.56|0.00488888888889|0.069920589878
    (3 rows)

    but  it  is  slow.  On  my good old 486DX4 100MHz here in the
    office it runs approx. 80 seconds if there are 6400  rows  in
    t1.  If there are only a few rows to output, most of the time
    is spent in the collector. So you  would  need  at  least  to
    create  a  new  postgres  basetype  that  holds  two float8's
    internal where the external representation is a Tcl list  and
    replace  the stat_collect() function by a C language one. But
    when already doing so, implementing the others in C too isn't
    that much more work.

    Doing  it  all  in  C would require 6 functions to implement.
    The C equivalents of the four functions below plus the  input
    and output functions for the new data type.

    Call  me  on +49 40 5395-1404 (office) if you have trouble or
    don't know how to do it at all.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #


Statistical aggregate example in PL/Tcl:

create table t1 (a int4, b float8);

insert into t1 values (1, 5.0);
insert into t1 values (1, 6.0);
insert into t1 values (1, 7.0);
insert into t1 values (1, 7.0);
insert into t1 values (1, 7.0);

insert into t1 values (2, 3.3);
insert into t1 values (2, 3.5);
insert into t1 values (2, 3.7);
insert into t1 values (2, 3.2);
insert into t1 values (2, 3.6);
insert into t1 values (2, 3.5);
insert into t1 values (2, 3.6);
insert into t1 values (2, 3.4);
insert into t1 values (2, 3.6);
insert into t1 values (2, 3.9);

insert into t1 values (3, 3.5);
insert into t1 values (3, 3.6);
insert into t1 values (3, 3.6);
insert into t1 values (3, 3.7);
insert into t1 values (3, 3.5);
insert into t1 values (3, 3.6);
insert into t1 values (3, 3.5);
insert into t1 values (3, 3.5);
insert into t1 values (3, 3.6);
insert into t1 values (3, 3.5);

create function stat_collect (text, float8) returns text as '
    if {[llength $1] == 0} {
        set sum [expr double($2)]
    set qsum [expr double($2) * double($2)]
    } else {
        set sum [expr [lindex $1 0] + double($2)]
    set qsum [expr [lindex $1 1] + double($2) * double($2)]
    }
    return [list $sum $qsum]
' language 'pltcl';


create function stat_average (float8, int4) returns float8 as '
    if {$2 == 0} {return 0.0}

    return [expr double($1) / double($2)]
' language 'pltcl';


create function stat_variance (text, int4) returns float8 as '
    if {$2 < 2} {return 0.0}

    set n [expr double($2)]
    set sum [lindex $1 0]
    set qsum [lindex $1 1]

    set average  [expr $sum / $n]
    set avgqsum  [expr $average * $average * $n]
    set variance [expr ($qsum - $avgqsum) / ($n - 1.0)]

    return $variance
' language 'pltcl';


create function stat_stddev (text, int4) returns float8 as '
    if {$2 < 2} {return 0.0}

    set n [expr double($2)]
    set sum [lindex $1 0]
    set qsum [lindex $1 1]

    set average  [expr $sum / $n]
    set avgqsum  [expr $average * $average * $n]
    set variance [expr ($qsum - $avgqsum) / ($n - 1.0)]
    set stddev   [expr sqrt($variance)]

    return $stddev
' language 'pltcl';


create aggregate average (
    basetype  = float8,
    stype1    = float8,
    stype2    = int4,
    sfunc1    = float8pl,
    sfunc2    = int4inc,
    finalfunc = stat_average,
    initcond1 = '0.0',
    initcond2 = '0'
);


create aggregate variance (
    basetype  = float8,
    stype1    = text,
    stype2    = int4,
    sfunc1    = stat_collect,
    sfunc2    = int4inc,
    finalfunc = stat_variance,
    initcond1 = '',
    initcond2 = '0'
);


create aggregate stddev (
    basetype  = float8,
    stype1    = text,
    stype2    = int4,
    sfunc1    = stat_collect,
    sfunc2    = int4inc,
    finalfunc = stat_stddev,
    initcond1 = '',
    initcond2 = '0'
);


select a, count(b) as n, average(b),
    variance(b), stddev(b) from t1 group by a;

В списке pgsql-sql по дате отправления:

Предыдущее
От: Gregor Zych
Дата:
Сообщение: Probs with arrays
Следующее
От: "Philippe Rochat (RSR: 318 17 93)"
Дата:
Сообщение: Time format ? (Really microsecond ??)