Обсуждение: BUG #16797: EXTRACT(EPOCH FROM timestamp) is not using local timezone
BUG #16797: EXTRACT(EPOCH FROM timestamp) is not using local timezone
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 16797 Logged by: Dana Burd Email address: djburd@gmail.com PostgreSQL version: 12.5 Operating system: Ubuntu 20.04.1 LTS Description: EXTRACT(EPOCH FROM timestamp) should be using the local timezone - which can be set in several ways, see documentation "8.5.3. Time Zones". Here I use SET TIME ZONE to set the local timezone for the client session. -- Expected results (seen from PostgreSQL 9.1.11): # SET TIME ZONE 'EST5EDT'; select extract(epoch from ('01/01/1970 00:00:00'::timestamp)); SET date_part ----------- 18000 (1 row) -- Results from PostgreSQL 12.5: # SET TIME ZONE 'EST5EDT'; select extract(epoch from ('01/01/1970 00:00:00'::timestamp)); SET date_part ----------- 0 (1 row) -- Additional ambiguity -- Expected results (seem from PostgreSQL 9.1.11): # SET TIME ZONE 'EST5EDT'; select extract(epoch from ('01/01/1970 00:00:00'::timestamp)), extract(epoch from ('01/01/1970 00:00:00'::timestamptz)) where '01/01/1970 00:00:00'::timestamp = '01/01/1970 00:00:00'::timestamptz; SET date_part | date_part -----------+----------- 18000 | 18000 (1 row) -- Ambiguous results from PostgreSQL 12.5: # SET TIME ZONE 'EST5EDT'; select extract(epoch from ('01/01/1970 00:00:00'::timestamp)), extract(epoch from ('01/01/1970 00:00:00'::timestamptz)) where '01/01/1970 00:00:00'::timestamp = '01/01/1970 00:00:00'::timestamptz; SET date_part | date_part -----------+----------- 0 | 18000 (1 row) -- Documentation https://www.postgresql.org/docs/12/datatype-datetime.html 8.5.1.3. Time Stamps "Conversions between timestamp without time zone and timestamp with time zone normally assume that the timestamp without time zone value should be taken or given as timezone local time." https://www.postgresql.org/docs/7.4/release-7-4.html E.202. Release 7.4 E.202.3.7. Data Type and Function Changes "Change EXTRACT(EPOCH FROM timestamp) so timestamp without time zone is assumed to be in local time, not GMT (Tom)" -- Server details -- OS timezone (though this should not matter since local timezone is set in the session): OS timezone of server with PostgreSQL 12.5: $ date +"%Z %z" UTC +0000 -- PostgreSQL version # SELECT version(); version ------------------------------------------------------------------------------------------------------------------------------------- PostgreSQL 12.5 (Ubuntu 12.5-0ubuntu0.20.04.1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit (1 row) -- PostgreSQL installed from standard Ubuntu focal repos: $ lsb_release -a No LSB modules are available. Distributor ID: Ubuntu Description: Ubuntu 20.04.1 LTS Release: 20.04 Codename: focal $ apt list --installed |grep -i postgres postgresql-12/focal-updates,focal-security,now 12.5-0ubuntu0.20.04.1 amd64 [installed,automatic] postgresql-client-12/focal-updates,focal-security,now 12.5-0ubuntu0.20.04.1 amd64 [installed,automatic] postgresql-client-common/focal-updates,focal-security,now 214ubuntu0.1 all [installed,automatic] postgresql-common/focal-updates,focal-security,now 214ubuntu0.1 all [installed,automatic] postgresql/focal-updates,focal-security,now 12+214ubuntu0.1 all [installed]
PG Bug reporting form <noreply@postgresql.org> writes: > EXTRACT(EPOCH FROM timestamp) should be using the local timezone No, type timestamp is explicitly *not* timezone aware. If you use timestamptz (a/k/a timestamp with time zone) you will get the answer you want. > -- Expected results (seen from PostgreSQL 9.1.11): > # SET TIME ZONE 'EST5EDT'; select extract(epoch from ('01/01/1970 > 00:00:00'::timestamp)); This was a bug, cf 9.2.0 release notes [1]: Make EXTRACT(EPOCH FROM timestamp without time zone) measure the epoch from local midnight, not UTC midnight (Tom Lane) This change reverts an ill-considered change made in release 7.3. Measuring from UTC midnight was inconsistent because it made the result dependent on the timezone setting, which computations for timestamp without time zone should not be. The previous behavior remains available by casting the input value to timestamp with time zone. regards, tom lane [1] https://www.postgresql.org/docs/release/9.2.0/
Wondering then, when local timezone is set to anything other than UTC, why does:
'01/01/1970 00:00:00'::timestamp =
'01/01/1970 00:00:00'::timestamptz
'01/01/1970 00:00:00'::timestamptz
To compare these datetime values, postgres is making an implicit cast of some kind - and if they are equal then their epoch values should be equal as well. Thus, to be consistent with the extract epoch from timestamp method chosen in 9.2, these should not be equal or perhaps a type-mismatch error.
Personally, I prefered the previous behavior with the implicit cast to timestamptz when asked to convert timestamp for extracting epoch or other with timezone related purposes - just seems more consistent and expected to me. And yes - I agree being type explicit is the better route here, vs relying on implicit behaviours that could change - I'll do that, this one just bit me from some two decade old code being moved to a new postgres instance.
kind regards,
-dana
On Wed, Dec 30, 2020 at 2:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> EXTRACT(EPOCH FROM timestamp) should be using the local timezone
No, type timestamp is explicitly *not* timezone aware. If you use
timestamptz (a/k/a timestamp with time zone) you will get the
answer you want.
> -- Expected results (seen from PostgreSQL 9.1.11):
> # SET TIME ZONE 'EST5EDT'; select extract(epoch from ('01/01/1970
> 00:00:00'::timestamp));
This was a bug, cf 9.2.0 release notes [1]:
Make EXTRACT(EPOCH FROM timestamp without time zone) measure the epoch
from local midnight, not UTC midnight (Tom Lane)
This change reverts an ill-considered change made in release 7.3.
Measuring from UTC midnight was inconsistent because it made the
result dependent on the timezone setting, which computations for
timestamp without time zone should not be. The previous behavior
remains available by casting the input value to timestamp with time
zone.
regards, tom lane
[1] https://www.postgresql.org/docs/release/9.2.0/
Dana Burd <djburd@gmail.com> writes: > Wondering then, when local timezone is set to anything other than UTC, why > does: > '01/01/1970 00:00:00'::timestamp = > '01/01/1970 00:00:00'::timestamptz > To compare these datetime values, postgres is making an implicit cast of > some kind - and if they are equal then their epoch values should be equal > as well. For comparison purposes, the timestamp value is taken as being in your local zone (the one specified by the timezone GUC). The timestamptz value is just an absolute UTC instant. The above example is a bit confusing since '01/01/1970 00:00:00'::timestamptz is *also* read as being in your local zone --- but that happens when the literal constant is parsed, rather than during execution of the comparison. Presuming EST5EDT zone, '01/01/1970 00:00:00'::timestamptz really means '1970-01-01 00:00:00-05'::timestamptz which is equivalent to '1970-01-01 05:00:00+00'::timestamptz, and then we have to convert the timezone at runtime to do a meaningful comparison. I'd thought this was adequately documented already, but perhaps not. There are a couple of passing references to timestamp<->timestamptz conversions in section 8.5, but really section 9.9 ought to cover datetime comparison behavior, and it doesn't say anything about this. regards, tom lane
I found the original thread that led to the change in 9.2, which comes down to maintaining immutability when executing the extract epoch function - timestamp_part() is marked immutable, yet the input 'timestamp' was changing based on the local timezone setting. Your notes above alluded to that, but detail from the thread was helpful.
Can I suggest a slight alteration in the (9.9.1. EXTRACT) epoch documentation to help others:
"For timestamp with time zone values, the number of seconds since 1970-01-01 00:00:00 UTC (can be negative); for date and timestamp values, the number of seconds since 1970-01-01 00:00:00 UTC (date and timestamp will assume UTC regardless of local timezone in order to maintain immutability - one may explicitly cast timestamp to timestamptz to assume a different timezone); for interval values, the total number of seconds in the interval"
And include in the second position of the example code box:
SELECT EXTRACT(EPOCH FROM '2001-02-16 20:38:40.12'::TIMESTAMP AT TIME ZONE 'PST8PDT');
Result: 982384720.12
Result: 982384720.12
Thanks for the information and maintaining communication history
-dana
On Wed, Dec 30, 2020 at 4:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dana Burd <djburd@gmail.com> writes:
> Wondering then, when local timezone is set to anything other than UTC, why
> does:
> '01/01/1970 00:00:00'::timestamp =
> '01/01/1970 00:00:00'::timestamptz
> To compare these datetime values, postgres is making an implicit cast of
> some kind - and if they are equal then their epoch values should be equal
> as well.
For comparison purposes, the timestamp value is taken as being in your
local zone (the one specified by the timezone GUC). The timestamptz
value is just an absolute UTC instant. The above example is a bit
confusing since '01/01/1970 00:00:00'::timestamptz is *also* read as
being in your local zone --- but that happens when the literal constant
is parsed, rather than during execution of the comparison. Presuming
EST5EDT zone, '01/01/1970 00:00:00'::timestamptz really means
'1970-01-01 00:00:00-05'::timestamptz which is equivalent to
'1970-01-01 05:00:00+00'::timestamptz, and then we have to convert
the timezone at runtime to do a meaningful comparison.
I'd thought this was adequately documented already, but perhaps not.
There are a couple of passing references to timestamp<->timestamptz
conversions in section 8.5, but really section 9.9 ought to cover
datetime comparison behavior, and it doesn't say anything about this.
regards, tom lane
Dana Burd <djburd@gmail.com> writes: > Can I suggest a slight alteration in the (9.9.1. EXTRACT) epoch > documentation to help others: > "For timestamp with time zone values, the number of seconds since > 1970-01-01 00:00:00 UTC (can be negative); for date and timestamp values, > the number of seconds since 1970-01-01 00:00:00 UTC (date and timestamp > will assume UTC regardless of local timezone in order to maintain > immutability - one may explicitly cast timestamp to timestamptz to assume a > different timezone); for interval values, the total number of seconds in > the interval" Hmm, that's not really right either; it appears to imply that the epoch calculation is timezone-aware, which it specifically isn't for date and timestamp cases. An example (presuming US DST rules): regression=# select extract(epoch from date '2020-03-09') - extract(epoch from date '2020-03-08'); ?column? ---------- 86400 (1 row) regression=# select extract(epoch from timestamp '2020-03-09') - extract(epoch from timestamp '2020-03-08'); ?column? ---------- 86400 (1 row) regression=# select extract(epoch from timestamptz '2020-03-09') - extract(epoch from timestamptz '2020-03-08'); ?column? ---------- 82800 (1 row) The last case knows that there was a DST transition in between, the first two don't take that into account. (You could argue that this is more a property of the types' input conversion routines than of extract() itself, but I think the point is valid anyway.) Perhaps a better phrasing is "for date and timestamp values, the nominal number of seconds since 1970-01-01 00:00:00, without regard to timezone or daylight-savings rules". regards, tom lane
That phrasing makes sense.
You're correct, it is a nominal value, even if there might be a perception that the value aligns with assuming UTC.
It's definitely tricky. Prior to this thread, I would have expected the following to be the equivalent. Now I'd prefer that the first wasn't even an allowed operation without an explicit cast - but that ship has likely sailed long ago.
# SET TIME ZONE 'EST5EDT'; select extract (epoch from '2020-03-09 00:00:00'::timestamp - '2020-03-08 0:00:00'::timestamptz);
date_part
-----------
82800
(1 row)
# SET TIME ZONE 'EST5EDT'; select extract (epoch from '2020-03-09 00:00:00'::timestamp) - extract (epoch from '2020-03-08 0:00:00'::timestamptz);
?column?
----------
68400
date_part
-----------
82800
(1 row)
# SET TIME ZONE 'EST5EDT'; select extract (epoch from '2020-03-09 00:00:00'::timestamp) - extract (epoch from '2020-03-08 0:00:00'::timestamptz);
?column?
----------
68400
regards,
-dana
On Fri, Jan 1, 2021 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dana Burd <djburd@gmail.com> writes:
> Can I suggest a slight alteration in the (9.9.1. EXTRACT) epoch
> documentation to help others:
> "For timestamp with time zone values, the number of seconds since
> 1970-01-01 00:00:00 UTC (can be negative); for date and timestamp values,
> the number of seconds since 1970-01-01 00:00:00 UTC (date and timestamp
> will assume UTC regardless of local timezone in order to maintain
> immutability - one may explicitly cast timestamp to timestamptz to assume a
> different timezone); for interval values, the total number of seconds in
> the interval"
Hmm, that's not really right either; it appears to imply that the epoch
calculation is timezone-aware, which it specifically isn't for date and
timestamp cases. An example (presuming US DST rules):
regression=# select extract(epoch from date '2020-03-09') - extract(epoch from date '2020-03-08');
?column?
----------
86400
(1 row)
regression=# select extract(epoch from timestamp '2020-03-09') - extract(epoch from timestamp '2020-03-08');
?column?
----------
86400
(1 row)
regression=# select extract(epoch from timestamptz '2020-03-09') - extract(epoch from timestamptz '2020-03-08');
?column?
----------
82800
(1 row)
The last case knows that there was a DST transition in between, the first
two don't take that into account. (You could argue that this is more a
property of the types' input conversion routines than of extract() itself,
but I think the point is valid anyway.)
Perhaps a better phrasing is "for date and timestamp values, the nominal
number of seconds since 1970-01-01 00:00:00, without regard to timezone
or daylight-savings rules".
regards, tom lane
Dana Burd <djburd@gmail.com> writes: > On Fri, Jan 1, 2021 at 12:18 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Perhaps a better phrasing is "for date and timestamp values, the nominal >> number of seconds since 1970-01-01 00:00:00, without regard to timezone >> or daylight-savings rules". > That phrasing makes sense. Sounds good, done here: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=4d3f03f42227bb351c2021a9ccea2fff9c023cfc Also see https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=319f4d54e82d15d4a0c3f4cc1328c40dba024b5c regards, tom lane