Обсуждение: Interval arithmetic should emit interval in canonical format

Поиск
Список
Период
Сортировка

Interval arithmetic should emit interval in canonical format

От
Gurjeet Singh
Дата:
It's hard to argue that the current behaviour is wrong, but it's worth a try.

First I'd appreciate the "official" reasons why Postgres prefers to
keep interval values in non-canonical form, like '1 day -23:37:00'
instead of '00:23:00'. I understand it has something to do with a
year/month/day not being exactly 365-days/30-days/24-hours, and/or
operations involving interval and 'timestamp with time zone'. But
since it's not explicitly spelled out in docs or in code (at least I
didn't find it in the obvious places), seeking explanation here. I
understand that the answers may obviate any change in behaviour I am
requesting below.

The interval arithmetic operations may also yield non-canonical
values, and IMHO the 'interval op interval' or 'interval op scalar'
expressions should yield an interval in canonical form. For eg.

postgres=# select '6 days 00:16:00'::interval - '5 days
23:53:00'::interval as result;    result
-----------------1 day -23:37:00

postgres=# select '6 days 00:16:00'::interval + '5 days
23:53:00'::interval as result;     result
------------------11 days 24:09:00

I cannot think of a use case where the above results are any better
than emitting '00:23:00' and '12 days 00:09:00', respectively.

We may not be able to turn every interval datum into canonical form,
but at least the intervals produced as a result of interval operators
can be converted to canonical form to reduce surprises for users. I
may even go as far as proposing rounding up 24-hours into a day, but
not round up days into months or months into years.

I was surprised by the presence of non-canonical form of interval in a
sorted-by-interval result set. The intervals were computed within the
query, using 'timestamp without time zone' values in a table.

# select ...
result
--------
...
00:23:00
00:23:00
1 day -23:37:00
00:23:00
00:22:00
...

The ordering above demonstrates that Postgres _does_ consider '1 day
-23:37:00' == '00:23:00', then it seems pointless to confuse the user
by showing two different representations of the same datum. This also
increases the code complexity required in applications/ORMs to parse
interval data's text representation.

Best regards,
-- 
Gurjeet Singh http://gurjeet.singh.im/

EDB : www.EnterpriseDB.com : The Enterprise PostgreSQL Company



Re: Interval arithmetic should emit interval in canonical format

От
Tom Lane
Дата:
Gurjeet Singh <gurjeet@singh.im> writes:
> The interval arithmetic operations may also yield non-canonical
> values, and IMHO the 'interval op interval' or 'interval op scalar'
> expressions should yield an interval in canonical form.

You're mistaken.

> postgres=# select '6 days 00:16:00'::interval - '5 days
> 23:53:00'::interval as result;
>      result
> -----------------
>  1 day -23:37:00

> postgres=# select '6 days 00:16:00'::interval + '5 days
> 23:53:00'::interval as result;
>       result
> ------------------
>  11 days 24:09:00

> I cannot think of a use case where the above results are any better
> than emitting '00:23:00' and '12 days 00:09:00', respectively.

If that's what you want, use the justify_hours function.  But that's
discarding information, so we're not going to force users to only
be able to get that form.

The reason why Postgres distinguishes '1 day' from '24 hours' is
pretty much the same as the reason it distinguishes '1 month' from
'30 days': adding those expressions to datetime values can produce
different results.

For example, since 2014-03-09 was a daylight-savings transition day
in my zone (US/Eastern),

regression=# select '2014-03-08 00:00'::timestamptz + '2 days'::interval;       ?column?        
------------------------2014-03-10 00:00:00-04
(1 row)

regression=# select '2014-03-08 00:00'::timestamptz + '48 hours'::interval;       ?column?        
------------------------2014-03-10 01:00:00-04
(1 row)

As for months vs. days:

regression=# select '2014-07-01'::date + '1 month'::interval;     ?column?       
---------------------2014-08-01 00:00:00
(1 row)

regression=# select '2014-07-01'::date + '30 days'::interval;     ?column?       
---------------------2014-07-31 00:00:00
(1 row)

> The ordering above demonstrates that Postgres _does_ consider '1 day
> -23:37:00' == '00:23:00', then it seems pointless to confuse the user
> by showing two different representations of the same datum.

Intervals are really three separate scalar values internally (months,
days, seconds).  There isn't any way to handle that fully in a linear
sort order, so the comparison operators fall back to assuming 1 day
is equal to 24 hours (and 1 month is equal to 30 days).  But that
doesn't make them the same for all purposes.
        regards, tom lane