Thread: Statistical Lacunae in Interval type

Statistical Lacunae in Interval type

From: David Fetter
Date:
Kind people,

I just ran across this, and was wondering whether it's worth a
back-patch.  The interval type has an aggregate for average (AVG), but
not one for standard deviation (STDDEV) or variance (VARIANCE).

Is this a bug?  Is there some problem with defining variance over
intervals?

TIA for any pointers in the right direction...

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!


Re: Statistical Lacunae in Interval type

From: Peter Eisentraut
Date:
David Fetter wrote:
> I just ran across this, and was wondering whether it's worth a
> back-patch.

New features are not back-patched.

> The interval type has an aggregate for average (AVG),
> but not one for standard deviation (STDDEV) or variance (VARIANCE).
>
> Is this a bug?

No, it's a missing feature. :-)

> Is there some problem with defining variance over
> intervals?

If all the operations that are used as part of the calculation of stddev 
are available for intervals, then I don't see one.



Re: Statistical Lacunae in Interval type

From: Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> I just ran across this, and was wondering whether it's worth a
> back-patch.  The interval type has an aggregate for average (AVG), but
> not one for standard deviation (STDDEV) or variance (VARIANCE).

AFAICS, stddev/variance require the concept of multiplying two input
values together (square, and also square root, are in the formulas).
I don't know what it means to multiply two intervals --- there's no
such operator in Postgres, anyway.

You could possibly approximate the behavior you want with something
like
    stddev(extract(epoch from interval_col))
which mashes the intervals down to seconds.
        regards, tom lane
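
[Editor's sketch] The workaround suggested above could look roughly like the following. The table name task_log and the column name duration are hypothetical and do not appear in the thread; the sample rows are invented for illustration. The last column assumes the number-times-interval operator is used to turn the result back into an interval.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TEMPORARY TABLE task_log (duration interval);

INSERT INTO task_log (duration) VALUES
    (interval '10 minutes'),
    (interval '20 minutes'),
    (interval '30 minutes');

-- extract(epoch FROM ...) converts each interval to a plain number of
-- seconds, so the ordinary numeric aggregates can be applied to it.
SELECT avg(duration)                          AS mean_interval,
       stddev(extract(epoch FROM duration))   AS stddev_seconds,
       variance(extract(epoch FROM duration)) AS variance_seconds_squared,
       -- Multiplying by a one-second interval turns the number of
       -- seconds back into an interval value.
       stddev(extract(epoch FROM duration)) * interval '1 second'
                                              AS stddev_as_interval
FROM task_log;
```

With these three rows the sample standard deviation works out to 600 seconds, that is, ten minutes. The variance is a bare number in seconds squared and has no interval equivalent.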


Re: Statistical Lacunae in Interval type

From: Peter Eisentraut
Date:
Tom Lane wrote:
> AFAICS, stddev/variance require the concept of multiplying two input
> values together (square, and also square root, are in the formulas).
> I don't know what it means to multiply two intervals --- there's no
> such operator in Postgres, anyway.

The problem is not much different than recording temperature 
measurements in a numeric column and then taking the standard 
deviation.  Kelvin squared does not make much sense, but it's only an 
intermediate quantity.

The problem is that an interval datum already implies the units, so in 
order to allow interval * interval we would have to add a new type 
"interval squared", which would probably be considered to be a bit 
foolish.



Re: Statistical Lacunae in Interval type

From: Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The problem is that an interval datum already implies the units, so in 
> order to allow interval * interval we would have to add a new type 
> "interval squared", which would probably be considered to be a bit 
> foolish.

Not only foolish but complicated.  Remember that interval internally
is "N months plus X seconds" (where N is integral but X needn't be).
To avoid losing information, a product datatype would have to look
something like "N months-squared plus X months-seconds plus Y
seconds-squared", which offers no intuition whatever about how to
operate on it.  I doubt there's even a unique way to define
square-rooting this.

Add on top the fact that we really need to change interval to be
"M months plus N days plus X seconds" to solve the ever-popular
daylight-savings-transition issues, and a product datatype would
get out of hand altogether.

When I said "mash it down to seconds first", I was speaking very
literally...
        regards, tom lane
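
[Editor's sketch] To see what "mashing down to seconds" does to the months part of an interval, one can inspect the conversion directly. This is only an illustration of behavior in current PostgreSQL releases, where the epoch of an interval counts a month as 30 days and a day as 24 hours; it is not a statement about how the thread's participants tested it.

```sql
-- The months field has no fixed length, so the conversion to seconds
-- has to pick one: a month is counted as 30 days here.
SELECT extract(epoch FROM interval '1 month') AS month_as_seconds,  -- 30 * 86400
       extract(epoch FROM interval '1 day')   AS day_as_seconds;    -- 86400
```

Any standard deviation computed over the extracted seconds inherits that 30-day convention, which is worth keeping in mind when the intervals being aggregated contain month or year components.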


Re: Statistical Lacunae in Interval type

From: David Fetter
Date:
On Mon, Jul 12, 2004 at 11:10:34AM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > The problem is that an interval datum already implies the units,
> > so in order to allow interval * interval we would have to add a
> > new type "interval squared", which would probably be considered to
> > be a bit foolish.
> 
> Not only foolish but complicated.  Remember that interval internally
> is "N months plus X seconds" (where N is integral but X needn't be).
> To avoid losing information, a product datatype would have to look
> something like "N months-squared plus X months-seconds plus Y
> seconds-squared", which offers no intuition whatever about how to
> operate on it.  I doubt there's even a unique way to define
> square-rooting this.

That's kinda what I was afraid of.  If an interval were defined
internally as a unique number of seconds, it would be easy.

> Add on top the fact that we really need to change interval to be "M
> months plus N days plus X seconds" to solve the ever-popular
> daylight-savings-transition issues, and a product datatype would get
> out of hand altogether.

Yeah.

> When I said "mash it down to seconds first", I was speaking very
> literally...

OK.  So it looks like (oddly) interval can have a std. deviation,
which is measured in seconds, but not a variance.  Is that pretty
close?

Cheers,
D
-- 
David Fetter david@fetter.org http://fetter.org/
phone: +1 510 893 6100   mobile: +1 415 235 3778

Remember to vote!