Re: [HACKERS] Implementing STDDEV and VARIANCE

Поиск
Список
Период
Сортировка
От Jeroen van Vianen
Тема Re: [HACKERS] Implementing STDDEV and VARIANCE
Дата
Msg-id 4.2.2.20000124111157.00aa2530@mail.design.nl
обсуждение исходный текст
Ответ на Re: [HACKERS] Implementing STDDEV and VARIANCE  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: [HACKERS] Implementing STDDEV and VARIANCE  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
At 11:27 PM 1/23/00 -0500, Tom Lane wrote:
>Jeroen van Vianen <jeroen@design.nl> writes:
> > aggname
> > aggowner
> > aggtype
> > aggtranstype           [ n, sx, sx2 ]
> > agginitfunction        function that does ( n = 0, sx = 0.0, sx2 = 0.0 )
> > aggtransfunction       function that does ( n = n + 1, sx = sx + x,
> >                          sx2 = sx2 + x * x )
> > aggfinalizefunction    function that returns (sx2 - (1/n) * sx * sx ) /
> > n
>
>Right, that's pretty much what I'm visualizing.  One minor detail: there
>is not an "agginitfunction", there is an "agginitvalue".  So your
>special datatype to hold n/sx/sx2 must have at least a typinput function
>that can convert the text string held in pg_aggregate into the desired
>internal form of the initial state.  (At least, that's how it's done
>now.  Do you want to argue to change it?  As long as we're opening up
>the AGG design for reconsideration, we could revisit that choice too.)

I would suggest supplying an initfunction that initializes the datatype 
that holds n/sx/sx2 so you're able to set individual members to NULL if so 
desired. I also won't need to implement typinput for all required aggregate 
types, one small headache less ;-)

count --> int4
min/max --> basetype
sum --> basetype
avg --> basetype, n
stddev, variance --> n, basetype, basetype

> > Might it be better for me to wait for 7.1 before implementing stddev and
> > variance?
>
>Well, you will need to be pretty fast on the draw if you want to get it
>into 7.0 release, since we will be freezing features for beta in a week.

True.

>But I see no reason that you couldn't implement STDDEV within the
>existing framework; just ignore transfn2 and do it as above.  You might
>have some problems with getting the desired response for zero or one
>tuples, but there isn't any way to fix that within the current
>framework :-(.  We have to do the function manager rewrite before you
>can have control over when to return a NULL.  As long as you are willing
>to live with that, you can have useful functionality now.

The problem with zero or one rows is pretty important IMO if you want to 
implement stddev and variance for both population and sample. You won't be 
able to explain the difference in outcome if you don't do it right.

Let me wait for you to overhaul the fmgr code and do all the aggregate 
stuff right in one sweep. Only thing is: how do we deal with current 
user-defined aggregates?

At 09:02 PM 1/23/00 -0800, Don Baccus wrote:
>At 11:27 PM 1/23/00 -0500, Tom Lane wrote:
> >Right, that's pretty much what I'm visualizing.  One minor detail: there
> >is not an "agginitfunction", there is an "agginitvalue".  So your
> >special datatype to hold n/sx/sx2 must have at least a typinput function
> >that can convert the text string held in pg_aggregate into the desired
> >internal form of the initial state.  (At least, that's how it's done
> >now.  Do you want to argue to change it?  As long as we're opening up
> >the AGG design for reconsideration, we could revisit that choice too.)
>
>At the moment I have a hard time visualizing an aggregate function where
>a constant initializer wouldn't serve, but ... what would be the cost of
>the generalization?  It would only be called once per query or subquery
>containing the aggregate, right?

Initializer functions for count need to return 0, for min, max, avg, sum, 
stddev and variance they need to set individual members to NULL (at least 
that's how I see it now). A function returning this (with the new fmgr 
code) would be very easy to implement (I hope ;-) ).

I'll hold my breath until the dust settles and we're starting 7.1.


Jeroen



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Hannu Krosing
Дата:
Сообщение: Re: [HACKERS] Use of Indicies ...
Следующее
От: Zeugswetter Andreas SB
Дата:
Сообщение: AW: [HACKERS] Implementing STDDEV and VARIANCE