Обсуждение: No stddev() for interval?

Поиск
Список
Период
Сортировка

No stddev() for interval?

От
"Brendan Jurd"
Дата:
Hi all,

I noticed a peculiarity in the default postgres aggregate functions.  min(), max() and avg() support interval as an input type, but stddev() and variance() do not.

Is there a rationale behind this, or is it just something that was never implemented?

Regards,
BJ

Re: No stddev() for interval?

От
Tom Lane
Дата:
"Brendan Jurd" <direvus@gmail.com> writes:
> I noticed a peculiarity in the default postgres aggregate functions.  min()=
> ,
> max() and avg() support interval as an input type, but stddev() and
> variance() do not.

> Is there a rationale behind this, or is it just something that was never
> implemented?

Is it sensible to calculate standard deviation on intervals?  How would
you handle the multiple components?  I mean, you could certainly define
*something*, but how sane/useful would the result be?

            regards, tom lane

Re: No stddev() for interval?

От
"Ivan Zolotukhin"
Дата:
On 5/20/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Brendan Jurd" <direvus@gmail.com> writes:
> > I noticed a peculiarity in the default postgres aggregate functions.  min()=
> > ,
> > max() and avg() support interval as an input type, but stddev() and
> > variance() do not.
>
> > Is there a rationale behind this, or is it just something that was never
> > implemented?
>
> Is it sensible to calculate standard deviation on intervals?  How would
> you handle the multiple components?  I mean, you could certainly define
> *something*, but how sane/useful would the result be?

Strictly speaking there's nothing bad in intervals. Physically
standart deviation on interval can be very useful without any doubts.
I can make a lot of examples on this. Say you want to know stat
parameters of semi-regular periodical process (avg distance in time
between maximums of some value and stddev of this quasiperiod -- why
not?).

Regards,
Ivan Zolotukhin

Re: No stddev() for interval?

От
Tom Lane
Дата:
"Ivan Zolotukhin" <ivan.zolotukhin@gmail.com> writes:
> On 5/20/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is it sensible to calculate standard deviation on intervals?  How would
>> you handle the multiple components?  I mean, you could certainly define
>> *something*, but how sane/useful would the result be?

> Strictly speaking there's nothing bad in intervals. Physically
> standart deviation on interval can be very useful without any doubts.

If the intervals are all expressed in seconds then sure, the calculation
is straightforward and useful.  I'm wondering what happens when nonzero
values of days and months get in there.

            regards, tom lane

Re: No stddev() for interval?

От
"Brendan Jurd"
Дата:

If the intervals are all expressed in seconds then sure, the calculation
is straightforward and useful.  I'm wondering what happens when nonzero
values of days and months get in there.

                        regards, tom lane


The existing logic used in avg(interval) can be seen in backend/utils/adt/timestamp.c, refer to functions interval_accum, interval_avg and interval_div.

interval_div(interval, double) is the most interesting for this discussion.  There is a helpful comment that reads /* evaluate fractional months as 30 days */.

So for example, interval_div('4 mons'::interval, 3) gives you '1 mon 10 days'.

It's not perfect, but doing arithmetic that involves converting between months and days never is.  All in favour of deleting the month as unit of measurement of time say "aye".

Well that's not going to happen in my lifetime.  How about we just extend this same logic over to stddev and variance?  It's strange having avg but not the other two.

Regards,
BJ

Re: No stddev() for interval?

От
David Fetter
Дата:
On Sun, May 21, 2006 at 01:14:15AM +1000, Brendan Jurd wrote:
> Hi all,
>
> I noticed a peculiarity in the default postgres aggregate functions.  min(),
> max() and avg() support interval as an input type, but stddev() and
> variance() do not.
>
> Is there a rationale behind this, or is it just something that was never
> implemented?

That's because variance of foo is measured in foo^2 units.  What is
the square of an interval?

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778        AIM: dfetter666
                              Skype: davidfetter

Remember to vote!

Re: No stddev() for interval?

От
"Brendan Jurd"
Дата:

That's because variance of foo is measured in foo^2 units.  What is
the square of an interval?

Cheers,
D

Well if you're willing to accept that for the purposes of computing the aggregates, an interval "month" is equal to 30 days (which is how avg(interval) already works), then an interval is reducable to a single quantity -- a number of seconds -- which can be squared.

30 days per month is pretty rough ... we could refine it to 30.4375, which is the average number of days per month over four years including one leap year.

Re: No stddev() for interval?

От
"Nikolay Samokhvalov"
Дата:
First of all, stddev doesn't return square of smth - so, why should we
worry about intermediate results? Furthermore, statistics work with
any 'units' and doesn't worry about physical meaning of variance in
any case (for example, what about variance for the set of lifetime
values of people from town N in XX century? ;-) ).

Second, SQL standard doesn't contain definition for STDDEV function,
but it has STDDEV_POP and STDDEV_SAMP (but it doesn't really matter
for this discussion). As for valid datatypes for these functions, I
cannot find exact definition unfortunately, but I see folliwing:
'Without Feature T621, "Enhanced numeric functions", conforming SQL
language shall not contain a
<computational operation> that immediately contains STDDEV_POP,
STDDEV_SAMP, VAR_POP, or
VAR_SAMP.'
So, authors meant that these functions should accept only numeric values.

Last but not least, ORA doesn't want to accept interval values for
stddev() func:

CREATE TABLE teststddev(id INTEGER PRIMARY KEY, val INTERVAL YEAR TO MONTH);
INSERT INTO teststddev(id, val) VALUES(1, INTERVAL '300' MONTH(3));
INSERT INTO teststddev(id, val) VALUES(2, INTERVAL '2' YEAR(1));
INSERT INTO teststddev(id, val) VALUES(3, INTERVAL '-125' MONTH(3));
SELECT STDDEV(val) FROM teststddev;

ORA-00932: inconsistent datatypes: expected NUMBER got INTERVAL YEAR TO MONTH

As for me, I think that it's quite reasonable to expect stddev working
with intervals... Why not?

On 5/21/06, David Fetter <david@fetter.org> wrote:
> On Sun, May 21, 2006 at 01:14:15AM +1000, Brendan Jurd wrote:
> > Hi all,
> >
> > I noticed a peculiarity in the default postgres aggregate functions.  min(),
> > max() and avg() support interval as an input type, but stddev() and
> > variance() do not.
> >
> > Is there a rationale behind this, or is it just something that was never
> > implemented?
>
> That's because variance of foo is measured in foo^2 units.  What is
> the square of an interval?
>
> Cheers,
> D
> --
> David Fetter <david@fetter.org> http://fetter.org/
> phone: +1 415 235 3778        AIM: dfetter666
>                               Skype: davidfetter
>
> Remember to vote!
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>


--
Best regards,
Nikolay

Re: No stddev() for interval?

От
"Nikolay Samokhvalov"
Дата:
On 5/20/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> If the intervals are all expressed in seconds then sure, the calculation
> is straightforward and useful.  I'm wondering what happens when nonzero
> values of days and months get in there.

Ah!
Maybe the reason for such thoughts lies in nature of postgres  intervals.
SQL:2003 standard paper says:
'There are two classes of intervals. One class, called year-month
intervals, has an express or implied datetime
precision that includes no fields other than YEAR and MONTH, though
not both are required. The other class,
called day-time intervals, has an express or implied interval
precision that can include any fields other than
YEAR or MONTH.'

So, the basic question is 'why Postgres allows to combine month and day?'
Actually, is it good idea?

If we have two separate interval types - we haven't the problem of '1
month VS 30 days' at all... And if we have no such a problem, we would
work with intervals as with numbers (I don't see the strong reason for
absense of stddev() and even variance() for INTERVAL YEAR TO MONTH,
INTERVAL DAY TO SECOND, INTERVAL MINUTE TO SECOND and so on).

--
Best regards,
Nikolay

Re: No stddev() for interval?

От
"Nikolay Samokhvalov"
Дата:
On 5/21/06, Brendan Jurd <direvus@gmail.com> wrote:
>
> Well if you're willing to accept that for the purposes of computing the
> aggregates, an interval "month" is equal to 30 days (which is how
> avg(interval) already works), then an interval is reducable to a single
> quantity -- a number of seconds -- which can be squared.
>

Let me make a correction. Internally, intervals are stored as separate
values of months and days (and even seconds, for daylight saving
purposes). So, in almost all cases everything is OK. But it isn't so
when we should multiply or devide such 'hetorogenious intevals'
("select interval '1 month 1 day';" gives us '15 days 12:00:00')...


--
Best regards,
Nikolay

Re: No stddev() for interval?

От
"Ivan Zolotukhin"
Дата:
On 5/21/06, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
> So, in almost all cases everything is OK. But it isn't so
> when we should multiply or devide such 'hetorogenious intevals'
> ("select interval '1 month 1 day';" gives us '15 days 12:00:00')...

You obviously meant "select interval '1 month 1 day' / 2;" to get
above weird result of 15 days and 12 hours.

Regards,
Ivan Zolotukhin

Re: No stddev() for interval?

От
"Brendan Jurd"
Дата:
Ah!
Maybe the reason for such thoughts lies in nature of postgres  intervals.
SQL:2003 standard paper says:
'There are two classes of intervals. One class, called year-month
intervals, has an express or implied datetime
precision that includes no fields other than YEAR and MONTH, though
not both are required. The other class,
called day-time intervals, has an express or implied interval
precision that can include any fields other than
YEAR or MONTH.'

So, the basic question is 'why Postgres allows to combine month and day?'
Actually, is it good idea?

If we have two separate interval types - we haven't the problem of '1
month VS 30 days' at all... And if we have no such a problem, we would
work with intervals as with numbers (I don't see the strong reason for
absense of stddev() and even variance() for INTERVAL YEAR TO MONTH,
INTERVAL DAY TO SECOND, INTERVAL MINUTE TO SECOND and so on).

--
Best regards,
Nikolay

It certainly would make life easier for interval operations if the month <-> day conversion was no longer a factor.  Keeping the years and months separate from the other fields makes sense to me.  It would be nice from a standards-compliance point of view too.

AFAICT, the only useful reason for the postgres interval to include the 'month' and 'year' units is so that users can perform arithmetic like "now() - '3 months'::interval".  With two separate types, you could still support this kind of operation with the year-month interval, and support all other operations with the day-time interval.

Clearly it would be major effort to build the two new interval types, but I'm thinking it would be worth the trouble (read: I'm willing to sink my own time into it).

Regards
BJ

Re: No stddev() for interval?

От
"Brendan Jurd"
Дата:


On 5/21/06, Brendan Jurd <direvus@gmail.com> wrote:
Ah!
Maybe the reason for such thoughts lies in nature of postgres  intervals.
SQL:2003 standard paper says:
'There are two classes of intervals. One class, called year-month
intervals, has an express or implied datetime
precision that includes no fields other than YEAR and MONTH, though
not both are required. The other class,
called day-time intervals, has an express or implied interval
precision that can include any fields other than
YEAR or MONTH.'



It certainly would make life easier for interval operations if the month <-> day conversion was no longer a factor.  Keeping the years and months separate from the other fields makes sense to me.  It would be nice from a standards-compliance point of view too.

AFAICT, the only useful reason for the postgres interval to include the 'month' and 'year' units is so that users can perform arithmetic like "now() - '3 months'::interval".  With two separate types, you could still support this kind of operation with the year-month interval, and support all other operations with the day-time interval.

Clearly it would be major effort to build the two new interval types, but I'm thinking it would be worth the trouble (read: I'm willing to sink my own time into it).

Regards
BJ


I've been mulling this idea over for the last couple of days, and I still think it would be a big improvement to postgres' interval support.  The whole day <-> month problem is a serious fly in the ointment.  Nobody's shot me down yet ... is there any significant downside to having the proposed two separate interval types?

Here's how I see it working:

 * The existing 'interval' type is left in place with existing functionality, so we don't break existing apps.

 * Two new types are added, let's call them 'month interval' and 'second interval' for the moment.  The internal representation of the month type can be a signed integer, for the second type a signed value that supports a maximum resolution of 1 microsecond.

 * Textual input syntax for 'second interval' is the same general form as for 'interval', with valid fields being second, minute, hour, day and week.  For 'month interval', only month, year, decade, century and millenium are considered valid.

 * Add functions for specifying the new interval types with numeric input, e.g., "months(int) returns month interval", "years(int) returns month interval", "days(int) returns second interval", "seconds(double) returns second interval".

 * Allow (implicit?) cast from month interval to interval, and from second interval to interval, but not the reverse.

 * Eventually, the return type of date and timestamp subtraction becomes second interval.

Regards,
BJ