Обсуждение: Data type confusion

Поиск
Список
Период
Сортировка

Data type confusion

От
"Josh Berkus"
Дата:
Tom, Stephan,

I'm writing up the date/time FAQ, and I came across some operator
behavior that confuses me:

If
INTERVAL / INTEGER = INTERVAL

then why does
INTERVAL / INTERVAL = ERROR?

Shouldn't
INTERVAL / INTERVAL = INTEGER?

I'd like to answer this before I finish the FAQ, as it seems
inconsistent behavior.

-Josh


______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

Вложения

Re: Data type confusion

От
Allan Engelhardt
Дата:
Josh Berkus wrote:

> This is a multi-part MIME message
>
> --_===97089====davinci.ethosmedia.com===_
> Content-Type: text/plain; charset="ISO-8859-1"
> Content-Transfer-Encoding: 8bit
>
> Tom, Stephan,
>
> I'm writing up the date/time FAQ, and I came across some operator
> behavior that confuses me:
>
> If
> INTERVAL / INTEGER = INTERVAL
>
> then why does
> INTERVAL / INTERVAL = ERROR?
>
> Shouldn't
> INTERVAL / INTERVAL = INTEGER?
>
> I'd like to answer this before I finish the FAQ, as it seems
> inconsistent behavior.
>
> -Josh
>
>

Josh,

I'm not Tom or Stephan (sorry) but in your scenario what would be the result of, say, dividing '3 months ago' with '6
seconds'in the future?
 

I don't think it makes conceptual sense to divide intervals....


Allan.



Re: Data type confusion

От
Tom Lane
Дата:
"Josh Berkus" <josh@agliodbs.com> writes:
> If
> INTERVAL / INTEGER = INTERVAL

Actually the operator appears to be INTERVAL / FLOAT8.

> then why does
> INTERVAL / INTERVAL = ERROR?

Because no one got around to creating an INTERVAL / INTERVAL operator.
There are plenty of such gaps in our operator set...

> Shouldn't
> INTERVAL / INTERVAL = INTEGER?

I'd think the output should be FLOAT8, myself, since the result
could be fractional.

Anyway, the generic response to such questions is "feel free to
code it up and submit a patch".
        regards, tom lane


Re: Data type confusion

От
"Josh Berkus"
Дата:
Tom,

> > then why does
> > INTERVAL / INTERVAL = ERROR?
>
> Because no one got around to creating an INTERVAL / INTERVAL
> operator.
> There are plenty of such gaps in our operator set...

Bummer.  If I could "C" then maybe I'd do something about it.

>
> > Shouldn't
> > INTERVAL / INTERVAL = INTEGER?
>
> I'd think the output should be FLOAT8, myself, since the result
> could be fractional.
>
> Anyway, the generic response to such questions is "feel free to
> code it up and submit a patch".

Sorry.  :( I'm stricly a "high-level user".

I can, however, document it so that others won't pester you for
questions about why it doesn't work.

-Josh


______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

Вложения

Re: Re: Data type confusion

От
Tom Lane
Дата:
Allan Engelhardt <allane@cybaea.com> writes:
> I don't think it makes conceptual sense to divide intervals....

It is kinda bogus, given the underlying semantics of intervals
(integer months plus float seconds).  The problem already arises
for the existing interval * float8 and interval / float8 operators,
though, so it'd be easy enough to make an interval / interval operator
that is consistent with them.  What those operators do is to convert
any fractional-month result into seconds at an arbitrary conversion
factor of 30 days to the month.  For example, consider

regression=# select '5 months 9 days'::interval;  ?column?    
---------------5 mons 9 days
(1 row)

regression=# select '5 months 9 days'::interval * 0.5;      ?column?       
----------------------2 mons 19 days 12:00
(1 row)

The initial product is effectively 2.5 months plus 4.5 days,
and then we translate the .5 months into 15 days.

This is pretty grotty, and AFAIK not documented anywhere --- I found it
out by looking at the C code for these operators.  But I'm not sure
how to do better.
        regards, tom lane


Re: Re: Data type confusion

От
Peter Eisentraut
Дата:
Tom Lane writes:

> It is kinda bogus, given the underlying semantics of intervals
> (integer months plus float seconds).

> This is pretty grotty, and AFAIK not documented anywhere --- I found it
> out by looking at the C code for these operators.  But I'm not sure
> how to do better.

One day we will have to accept the fact that months and seconds must not
be mixed, period.  You can have year/month intervals or
day/hour/minute/second intervals, not a combination.  An interval of '5
years 3 minutes' has no meaning with the natural calendar rules.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Re: Data type confusion

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> One day we will have to accept the fact that months and seconds must not
> be mixed, period.  You can have year/month intervals or
> day/hour/minute/second intervals, not a combination.  An interval of '5
> years 3 minutes' has no meaning with the natural calendar rules.

I don't agree --- five years and three minutes is perfectly meaningful.
There are only certain things you can validly do with it, however, and
scaling by a floating-point number isn't one of them, because fractional
months aren't well-defined.  But you can, for example, add it to or
subtract it from a timestamp to produce a well-defined result timestamp.

The real bogosity in the interval type is that months and seconds are
not sufficient: it should be months, days, and seconds.  As we get
reminded twice a year by the regression tests, "1 day" and "24 hours"
are not the same thing.
        regards, tom lane


Re: Re: Data type confusion

От
Peter Eisentraut
Дата:
Tom Lane writes:

> I don't agree --- five years and three minutes is perfectly meaningful.
> There are only certain things you can validly do with it, however, and
> scaling by a floating-point number isn't one of them, because fractional
> months aren't well-defined.  But you can, for example, add it to or
> subtract it from a timestamp to produce a well-defined result timestamp.

Maybe.  Or maybe not.  Take 1 year and 3 seconds.  E.g.,

'2001-08-06 03:03:03' - '1 year 3 seconds' = '2000-08-06 03:03:00'

'2000-08-06 03:03:03' - '1 year 3 seconds' = '1999-08-06 03:03:00'

but

'2001-08-06 03:03:03' - '2000-08-06 03:03:00' = '365 days 3 seconds'

'2000-08-06 03:03:03' - '1999-08-06 03:03:00' = '366 days 3 seconds'

This means either

a) A value such as '1 year 3 seconds' varies depending on context, which
is not how our system is intended to work, or

b) The normal rules of arithmetic do not hold.  I doubt the following is
was good idea:

select timestamp '2000-08-06 03:03:03' - ( timestamp '2000-08-06 03:03:03' - interval '1 year 3 seconds' );
?column?
-------------------366 days 00:00:03

select timestamp '2000-08-06 03:03:03' - timestamp '2000-08-06 03:03:03' + interval '1 year 3 seconds' ;   ?column?
-----------------1 year 00:00:03


On the other hand, in certain applications even fractional months may be
useful.  Banks sometimes organize a year as 360 days and months as 30
days, so talking about 0.5 months might make sense.  However, in this case
again, years/months and days/seconds must not be mixed.

Another interesting tidbit here:

select interval '1 year 00:00:03' = interval '360 days 00:00:03' ;?column?
----------t

> The real bogosity in the interval type is that months and seconds are
> not sufficient: it should be months, days, and seconds.  As we get
> reminded twice a year by the regression tests, "1 day" and "24 hours"
> are not the same thing.

Agreed.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter



Re: Re: Data type confusion

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> but

> '2001-08-06 03:03:03' - '2000-08-06 03:03:00' = '365 days 3 seconds'

> '2000-08-06 03:03:03' - '1999-08-06 03:03:00' = '366 days 3 seconds'

What I said was that timestamp plus or minus interval is well-defined
(when "interval" is a multi-part symbolic interval).  It's quite obvious
that timestamp minus timestamp yielding interval is not uniquely
defined: in the above examples one could express the result either as
you show or as '1 year 3 seconds', which I would argue is preferable.

For a 3-part (month/day/second) interval, I think the preferable rule
for timestamp subtraction is to use the largest symbolic component
possible, ie, use the largest number of months/years you can, then
use the largest number of days fitting in the remainder, then express
what's left as seconds.  This is an arbitrary choice among the many
possible 3-part representations of a given interval, but it seems like
the most natural one for many applications.

> a) A value such as '1 year 3 seconds' varies depending on context, which
> is not how our system is intended to work, or

Isn't it?  The relationship between years, days, and seconds is
*inherently* context dependent in the common calendar.  It might not be
too sensible, but sensibleness has never held sway in calendars, at
least not since the Romans.

I think that the actually useful operations for symbolic intervals
have to do with adding them to (or subtracting them from) timestamps.
For example, I know exactly what I think should happen when I write
now() + '1 day'::interval, and that two days out of the year this
should yield a different result from now() + '24 hours'::interval.
Whatever else we do with intervals has to mesh with that as best
we can make it happen.

I'm not sure your notion of fractional months really holds water,
at least not for this particular operation.  When is 25 Feb 2000
plus 0.95 month?  Is the 0.95 measured with respect to the length
of February, or of March?  Does it matter that 2000 is a leap year?
There may be some other operations that have sensible interpretations
for such a datatype, however.
        regards, tom lane


Re: Re: Data type confusion

От
"Josh Berkus"
Дата:
Folks,

Wow.  Talk about asking dangerous questions ...

> For a 3-part (month/day/second) interval, I think the preferable rule
> for timestamp subtraction is to use the largest symbolic component
> possible, ie, use the largest number of months/years you can, then
> use the largest number of days fitting in the remainder, then express
> what's left as seconds.  This is an arbitrary choice among the many
> possible 3-part representations of a given interval, but it seems
> like
> the most natural one for many applications.

Sure, that makes sense.   In the meantime, I'll add a note to the FAQ
which says "Adding and subtracting wildly disparate time values (e.g. '1
year'::INTERVAL - '3 seconds'::INTERVAL) may cause the database to make
unusual interval value choices which could impair accuracy.  Please test
extensively before relying on operations of this sort."

> Isn't it?  The relationship between years, days, and seconds is
> *inherently* context dependent in the common calendar.  It might not
> be
> too sensible, but sensibleness has never held sway in calendars, at
> least not since the Romans.

Peter is absolutely correct here.  '1 year'::INTERVAL - '1
day'::INTERVAL is '364 days'::INTERVAL most of the time.  However, on
leap years it is '365 days'.

> I'm not sure your notion of fractional months really holds water,
> at least not for this particular operation.  When is 25 Feb 2000
> plus 0.95 month?  Is the 0.95 measured with respect to the length
> of February, or of March?  Does it matter that 2000 is a leap year?
> There may be some other operations that have sensible interpretations
> for such a datatype, however.

One way to simplify this would be not to allow any division operations
on INTERVALS that result in a modulo of a smaller increment than the
INTERVAL value expressed.  Thus, one could "'3 months'::INTERVAL / 3"
but would not be allowed to "'2 months::INTERVAL / 3".   However, this
seems kind of unfair to hour, minute, and second values whose fractions
are well-defined and easily manipulated.

Or, to put it another way, 95% of the time users just want to do simple
things.  Like we want to know how many weeks an employee has been with
us for:  '2 years 3 months'::INTERVAL / '1 week'::INTERVAL  (and we
don't care about the fractional week left over).
Thus we don't want to hold up simple and obvious date multiplication and
division just to deal with the wierdo cases.

-Josh Berkus


______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

Вложения

Re: Re: Data type confusion

От
Tom Lane
Дата:
"Josh Berkus" <josh@agliodbs.com> writes:
> Peter is absolutely correct here.  '1 year'::INTERVAL - '1
> day'::INTERVAL is '364 days'::INTERVAL most of the time.  However, on
> leap years it is '365 days'.

Au contraire, it is always '1 year - 1 day'::INTERVAL.  That is a
two-part interval value and is not reduced further.  When you add it
to a date or timestamp, *then* you find out how many days are meant.

> One way to simplify this would be not to allow any division operations
> on INTERVALS that result in a modulo of a smaller increment than the
> INTERVAL value expressed.  Thus, one could "'3 months'::INTERVAL / 3"
> but would not be allowed to "'2 months::INTERVAL / 3".   However, this
> seems kind of unfair to hour, minute, and second values whose fractions
> are well-defined and easily manipulated.

I was toying with the notion of allowing scalings whose results didn't
introduce any fractional part to the "months" field.  For example

'2 months + 1 day' / 2.0  =  '1 month + 12hrs'

'3 months + 1 day' / 2.0  =  error (can't have a half month)

'61 days' / 2.0          =  '30 days 12hrs'

However, I fear that this would make no sense to anyone who hadn't
thought about the issues as carefully as we have in this thread.

> Or, to put it another way, 95% of the time users just want to do simple
> things.  Like we want to know how many weeks an employee has been with
> us for:  '2 years 3 months'::INTERVAL / '1 week'::INTERVAL  (and we
> don't care about the fractional week left over).

Good point.  Ugly as the "30 day" convention is, it is probably close
enough for that sort of thing.
        regards, tom lane


Re: Data type confusion

От
Allan Engelhardt
Дата:
Josh Berkus wrote:

> Or, to put it another way, 95% of the time users just want to do simple
> things.  Like we want to know how many weeks an employee has been with
> us for:  '2 years 3 months'::INTERVAL / '1 week'::INTERVAL  (and we
> don't care about the fractional week left over).
> Thus we don't want to hold up simple and obvious date multiplication and
> division just to deal with the wierdo cases.

I see now what you are trying to do.  It sort of makes sense, but I'm still really reluctant to give (semantic or
otherwise)meaning to "yesterday divided by tomorrow" .....
 

Would it be a better solution if there was a conversion function a la
   convert(text, interval) RETURNS double precision

where text in ('day','week','hour', ....).  The function would convert the interval to the specified unit, with some
considerabledegree of fuzziness as discussed in this thread.
 

Then your query would be simply
   convert('week', '2 years 3 months') / 1

and would return something in the vicinity of 117.0  :-)


It seems to me that such a function would be more generally useful than the division of intervals.  What you really
wantto do is not to divide intervals, but to express them in different time units.  Or am I missing something (again)?
 


Comments?


Allan.



Re: Re: Data type confusion

От
"Josh Berkus"
Дата:
Allan,

> I see now what you are trying to do.  It sort of makes sense, but I'm
> still really reluctant to give (semantic or otherwise) meaning to
> "yesterday divided by tomorrow" .....

I don't agree.  Consider, for example, this statement:

'30 weeks ago'::INTERVAL / '2 weeks'::INTERVAL = -15
Just as 
-30 / 2 = -15

To phrase the equation above:  "How many two week periods is thirty
weeks ago?  Minus fifteen, or fifteen ago." 

This makes perfect sense to me.  

> It seems to me that such a function would be more generally useful
> than the division of intervals.  What you really want to do is not to
> divide intervals, but to express them in different time units.  Or am
> I missing something (again)?

From my pespective?  Yes, you are. (For one thing, the CONVERT function
in Postgres converts between unicode character sets, not data-types).

Look, if I'm designing a payroll application for a company with
bi-weekly payroll, I will want a report that shows how many payroll
periods for which an employee has been employed.  Thus I will want to:

periods_employed := (current_timestamp - date_hired) / '2
weeks'::INTERVAL

I don't want to go through a bunch of non-ANSI SQL-compliant conversion
functions to do it.  Especially not as this is just what the ANSI SQL
data type and operator specs are designed to support.

-Josh


______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: Re: Data type confusion

От
Tom Lane
Дата:
"Josh Berkus" <josh@agliodbs.com> writes:
> I don't want to go through a bunch of non-ANSI SQL-compliant conversion
> functions to do it.  Especially not as this is just what the ANSI SQL
> data type and operator specs are designed to support.

Curiously enough, ANSI doesn't define an INTERVAL-divided-by-INTERVAL
function either.  Also, it rather looks like ANSI adopted the position
Peter E. expressed:
        Year-month intervals are mutually comparable only with other year-        month intervals. [...]
Day-timeintervals are mutually comparable only with other day-        time intervals. [...]        Operations involving
itemsof type datetime require that the date-        time items be mutually comparable. Operations involving items of
   type interval require that the interval items be mutually compara-        ble.
 
        regards, tom lane


Re: Re: Data type confusion

От
"Josh Berkus"
Дата:
Tom,

> Curiously enough, ANSI doesn't define an INTERVAL-divided-by-INTERVAL
> function either.  Also, it rather looks like ANSI adopted the
> position
> Peter E. expressed:
> 
>          Year-month intervals are mutually comparable only with other
> year-
>          month intervals. [...]
>          Day-time intervals are mutually comparable only with other
> day-
>          time intervals. [...]
>          Operations involving items of type datetime require that the
> date-
>          time items be mutually comparable. Operations involving
> items of
>          type interval require that the interval items be mutually
> compara-
>          ble.

Hmmm ... does this mean that I couldn't divide '1 year' by '1 week'?  I
can certaily see not allowing division of '1 year' by '28 seconds' as it
spares us a whole bunch of calendar-generated fuzziness.

It seems to me that:

years,months,weeks,days / years,months,weeks,days is OK, and
days,hours,minutes,seconds / days,hours,minutes,seconds is also easy,
but
years,months,weeks / hours,minutes,seconds is where we get in trouble.

So I propose that we suppot the first two and disallow the third.

Thus I think that we can adhere to the spec, while still providing the
functionality developers want and avoiding a whole lot of '5 months 11
minutes' type headaches.

-Josh



______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


Re: Re: Data type confusion

От
Tom Lane
Дата:
"Josh Berkus" <josh@agliodbs.com> writes:
> Hmmm ... does this mean that I couldn't divide '1 year' by '1 week'?

That's exactly what it says.
        regards, tom lane


Re: Re: Data type confusion

От
"Josh Berkus"
Дата:
Tom,

> > Hmmm ... does this mean that I couldn't divide '1 year' by '1
> week'?
> 
> That's exactly what it says.

If that's the case, we'd need to create some sort of function to specify
the time unit to output timestamp operation into:

to_weeks(current_timestamp - hire_date) / '2 weeks'

... otherwise division and multiplication operators for time values
don't do us much good, as we'd be forced to integer-ize all intervals
before we can perform any operations on them at all.

-Josh Berkus


______AGLIO DATABASE SOLUTIONS___________________________                                      Josh Berkus Complete
informationtechnology      josh@agliodbs.com  and data management solutions       (415) 565-7293 for law firms, small
businesses       fax 621-2533   and non-profit organizations.      San Francisco
 


[warning: largely off-topic] Re: Data type confusion

От
Allan Engelhardt
Дата:
Josh,

Thanks for your explanation.  I'd like to get hold of a copy of SQL99/PKG001 to see what they have actually defined.


I think the INTERVAL type sux :-)  Long rant follows - consider hitting the delete button now.


In this area, there are a number of different concepts that it would make sense to separate.

Let's call one UTIME.  It is time defined as (the time-coordinate of) an event in the history of the Universe.  [We'll
ignoreEinstein for the following.]  It has physical meaning.
 

Let call another LDATE.  It is what you and I normally call a date.  Specifically, it is a legal (or social or
religious)representation of a UTIME.  You can make a contract (legal with man, or religiously with God) using LDATEs.
 

Let's invent a CALENDAR.  It translates between UTIME and LDATE.  It changes all the time ;-(  Well, maybe not *all*
thetime but it is sufficient unstable to be a problem for some applications.   There is not only the problem of Julian
vsGregorian vs some other calendar.   Consider the issue of changing the rules governing summer-time.  It is not long
agothat Britain changed the rule for ending summer time from being the fourth Sunday in October to being the last
Sundayin October.  Some countries seem to decide on a year to year basis if they want to have summer time.   Finally,
theinternational committee that governs the CALENDAR only have to give eight weeks notice when it introduces a
leap-second. (Leap seconds are the reason that struct tm.tm_sec has a range of 0,...,61.)
 

This means that the CALENDAR is, strictly speaking, only known eight weeks in advance.  It also highlights the
differencebetween LDATE and UTIME: if we have a contract for me to start a machine on a specific LDATE and a leap
secondis introduced between now and then, then I'd better change that sleep(n) statement in my control program to
sleep(n+1)or the factory will start too early.
 

I once spent an unhappy week debugging a problem related to this :-(  It really would have started the factory one hour
toolate.
 

Now of course you can define deltas.  Unix systems kind of keep UTIME using a delta: a variable of type time_t holds
thenumber of seconds since a specific event (defined as a given LDATE for a fixed CALENDAR).  Let's call them DUTIME
andDLDATE.
 

They are conceptually different: One day of DLDATE may be 23, 24, or 25 hours of DUTIME, depending on summer time
rules. It may be 86,400 or 86,401 (rarely: 86,402) seconds depending on leap seconds.
 

Important: There is no meaningful translation between DUTIME and DLDATE except for a fixed (start- or end-) UTIME (or
DLDATE).

(No, really!!)

Let's introduce a final concept: a unit of UTIME.  Maybe we call it TIMEU.  You need to measure UTIME is something,
maybeseconds is the fundamental unit (as in SI) and you have other, derived units.
 

We don't need a similar concept for LDATE - it is effectively provided by the CALENDAR.  It is it that which defines
years,months, or whatever the Incas used to use...
 


Now BACK ON-TOPIC:

What's an INTERVAL supposed to be?  DUTIME, DLDATE, TIMEU or something that really belongs to the CALENDAR?

All of the above?


Methinks SQL99 (or PostgreSQL's implementation) is going for the latter option.  Methinks it is confusing.  But then,
I'mthick. :-)
 



Apologies for the interruption -- We now continue the regular scheduled program....



Allan.