Обсуждение: Undocumented AT TIME ZONE INTERVAL syntax

Поиск
Список
Период
Сортировка

Undocumented AT TIME ZONE INTERVAL syntax

От
Corey Huinker
Дата:

In reviewing Paul's application period patch, I noticed some very curious syntax in the test cases. I learned that Paul is equally confused by it, and has asked about it in his PgCon 2020 presentation

> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;
      timezone      
---------------------
 2018-03-04 05:02:00
(1 row)

Searching around, I found several instances of this syntax being used [1][2][3], but with one important clarifying difference: the expected syntax was

> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2:00' HOUR TO MINUTE;
      timezone      
---------------------
 2018-03-04 07:00:00
(1 row)

Now I understand that the user probably meant to do this:

# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR;
      timezone      
---------------------
 2018-03-04 07:00:00
(1 row)


But none of this is in our own documentation.

Before I write a patch to add this to the documentation, I'm curious what level of sloppiness we should tolerate in the interval calculation. Should we enforce the time string to actually conform to the format laid out in the X TO Y spec? If we don't require that, is it correct to say that the values will be filled from order of least significance to greatest?

Re: Undocumented AT TIME ZONE INTERVAL syntax

От
Tom Lane
Дата:
Corey Huinker <corey.huinker@gmail.com> writes:
>> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;

> ... But none of this is in our own documentation.

That's not entirely true.  [1] says

    When writing an interval constant with a fields specification, or when
    assigning a string to an interval column that was defined with a
    fields specification, the interpretation of unmarked quantities
    depends on the fields. For example INTERVAL '1' YEAR is read as 1
    year, whereas INTERVAL '1' means 1 second. Also, field values “to the
    right” of the least significant field allowed by the fields
    specification are silently discarded. For example, writing INTERVAL '1
    day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but
    not the day field.

But I'd certainly agree that a couple of examples are not a specification.
Looking at DecodeInterval, it looks like the rule is that unmarked or
ambiguous fields are matched to the lowest field mentioned by the typmod
restriction.  Thus

regression=# SELECT INTERVAL '4:2' HOUR TO MINUTE;
 interval
----------
 04:02:00
(1 row)

regression=# SELECT INTERVAL '4:2' MINUTE TO SECOND;
 interval
----------
 00:04:02
(1 row)

If you wanted to improve this para it'd be cool with me.

> Before I write a patch to add this to the documentation, I'm curious what
> level of sloppiness we should tolerate in the interval calculation. Should
> we enforce the time string to actually conform to the format laid out in
> the X TO Y spec?

We have never thrown away high-order fields:

regression=# SELECT INTERVAL '1 day 4:2' MINUTE TO SECOND;
    interval
----------------
 1 day 00:04:02
(1 row)

AFAICS we consider that the typmod provides a rounding rule, not a
license to transform the value to something entirely different.

I'm not sure what the SQL spec says here, but I'd be real hesitant to
change the behavior of cases that we've accepted for twenty-plus
years, unless they're just obviously insane.  Which these aren't IMO.

            regards, tom lane

[1] https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-INTERVAL-INPUT



Re: Undocumented AT TIME ZONE INTERVAL syntax

От
Corey Huinker
Дата:


On Sun, Sep 19, 2021 at 10:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Corey Huinker <corey.huinker@gmail.com> writes:
>> SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;

> ... But none of this is in our own documentation.

That's not entirely true.  [1] says

    When writing an interval constant with a fields specification, or when
    assigning a string to an interval column that was defined with a
    fields specification, the interpretation of unmarked quantities
    depends on the fields. For example INTERVAL '1' YEAR is read as 1
    year, whereas INTERVAL '1' means 1 second. Also, field values “to the
    right” of the least significant field allowed by the fields
    specification are silently discarded. For example, writing INTERVAL '1
    day 2:03:04' HOUR TO MINUTE results in dropping the seconds field, but
    not the day field.

That text addresses the case of the unadorned string (seconds) and the overflow
case (more string values than places to put them), but doesn't really address
the underflow.
 

But I'd certainly agree that a couple of examples are not a specification.
Looking at DecodeInterval, it looks like the rule is that unmarked or
ambiguous fields are matched to the lowest field mentioned by the typmod
restriction.  Thus

regression=# SELECT INTERVAL '4:2' HOUR TO MINUTE;
 interval
----------
 04:02:00
(1 row)

regression=# SELECT INTERVAL '4:2' MINUTE TO SECOND;
 interval
----------
 00:04:02
(1 row)

# SELECT INTERVAL '04:02' HOUR TO SECOND;

 interval 

----------

 04:02:00


This result was a bit unexpected, and the existing documentation doesn't address underflow cases like this.

So, restating all this to get ready to document it, the rule seems to be:


1. Integer strings with no spaces or colons will always apply to the rightmost end of the restriction given, lack of a restriction means seconds.


Example:


# SELECT INTERVAL '2' HOUR TO SECOND, INTERVAL '2' HOUR TO MINUTE, INTERVAL '2';
 interval | interval | interval
----------+----------+----------
 00:00:02 | 00:02:00 | 00:00:02
(1 row)



2. Strings with time context (space separator for days, : for everything else) will apply starting with the leftmost part of the spec that fits, continuing to the right until string values are exhausted.


Examples:


# SELECT INTERVAL '4:2' HOUR TO SECOND, INTERVAL '4:2' DAY TO SECOND;
 interval | interval
----------+----------
 04:02:00 | 04:02:00

(1 row)

 

If you wanted to improve this para it'd be cool with me.

I think people's eyes are naturally drawn to the example tables, and because the rules for handling string underflow are subtle, I think a few concrete examples are the way to go.

 

> Before I write a patch to add this to the documentation, I'm curious what
> level of sloppiness we should tolerate in the interval calculation. Should
> we enforce the time string to actually conform to the format laid out in
> the X TO Y spec?

We have never thrown away high-order fields:

And with the above I'm now clear that we're fine with the existing behavior for underflow.
 

I'm not sure what the SQL spec says here, but I'd be real hesitant to
change the behavior of cases that we've accepted for twenty-plus
years, unless they're just obviously insane.  Which these aren't IMO.

Yeah, I really didn't expect to change the behavior, but wanted to make sure that the existing behavior was understood. I'll whip up a patch.

Re: Undocumented AT TIME ZONE INTERVAL syntax

От
Corey Huinker
Дата:

Yeah, I really didn't expect to change the behavior, but wanted to make sure that the existing behavior was understood. I'll whip up a patch.

Attached is an attempt at an explanation of the edge cases I was encountering, as well as some examples. If nothing else, the examples will draw eyes and searches to the explanations that were already there.
 
Вложения

Re: Undocumented AT TIME ZONE INTERVAL syntax

От
Tom Lane
Дата:
Corey Huinker <corey.huinker@gmail.com> writes:
> Attached is an attempt at an explanation of the edge cases I was
> encountering, as well as some examples. If nothing else, the examples will
> draw eyes and searches to the explanations that were already there.

I looked this over and have a few thoughts:

* I don't think your explanation of the behavior of colon-separated
times is quite correct; for example, it doesn't correctly describe
this:

regression=# select INTERVAL '2:03:04' minute to second;
 interval
----------
 02:03:04
(1 row)

I think the actual rule is that hh:mm:ss is always interpreted that
way regardless of the typmod (though we may then drop low-order
fields if the typmod says to).  Two colon-separated numbers are
interpreted as hh:mm by default, but as mm:ss if the typmod is
exactly "minute to second".  (This might work better in a separate
para; the one you've modified here is mostly about what we do with
unmarked quantities, but the use of colons makes these numbers
not unmarked.)

* I'm not sure I would bother with examples for half-broken formats
like "2:".  People who really want to know about that can experiment,
while for the rest of us it seems like adding confusion.

* In the same vein, I think your 0002 example adds way more confusion
than illumination.  Maybe better would be a less contrived offset,
say

# SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40+00' AT TIME ZONE INTERVAL '3:00:00';
      timezone
---------------------
 2001-02-16 23:38:40
(1 row)

which could be put after the second example and glossed as "The third
example rotates a timestamp specified in UTC to the zone three hours
east of Greenwich, using a constant interval as the time zone
specification".

            regards, tom lane