Обсуждение: Strange behavior in generate_series(date, date, interval) with DST
I've noticed a strange behavior in the generate_series functions.
I'm trying to get all days between a start and an end date including the bounds. So naturally I've tried something like the query below
.
The real query uses generate_series to join other tables and is much more complicated, but for the sake of brevity, I think that this query is good enough to show the problem.
:postgres=# SELECT generate_series('2014-10-10'::DATE,
'2014-10-15'::DATE, '1 DAY'::INTERVAL); generate_series
------------------------
2014-10-10 00:00:00-03
2014-10-11 00:00:00-03
2014-10-12 00:00:00-03
2014-10-13 00:00:00-03
2014-10-14 00:00:00-03
2014-10-15 00:00:00-03
(6 rows)
Please note that the upper bound
'2014-10-15' is included in the resulting rows.Now if I try this same query with slightly different dates I get:
postgres=# SELECT generate_series('2014-10-15'::DATE, '2014-10-20'::DATE, '1 DAY'::INTERVAL);
generate_series
------------------------
2014-10-15 00:00:00-03
2014-10-16 00:00:00-03
2014-10-17 00:00:00-03
2014-10-18 00:00:00-03
2014-10-19 01:00:00-02
(5 rows)
-----------------------------------------------
The upper bound is not included in the results!
-----------------------------------------------
Here, in Brazil our DST started on Oct 19. So if I had to guess I would say that this strange behavior is due to the DST, but I'm having a hard time to understand why this is happening!
Is this expected behavior?
I know that I can achieve the results I expect with the following query:
postgres=# WITH RECURSIVE days(d) AS (
SELECT '2014-10-15'::DATE
UNION ALL
SELECT d+1 FROM days WHERE d < '2014-10-20'::DATE
)
SELECT * FROM days;
d
------------
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(6 rows)
But using that instead of generate_series, just feels wrong so I would like to understand what's happening and if there is a way to overcome that, before changing my queries.
Thank you!
Sérgio Saquetim
On 12/07/2014 12:11 PM, Sérgio Saquetim wrote: > I've noticed a strange behavior in the generate_series functions. > > I'm trying to get all days between a start and an end date including the > bounds. So naturally I've tried something like the query below > . > > The real query uses generate_series to join other tables and is much > more complicated, but for the sake of brevity, I think that this query > is good enough to show the problem. > : > > postgres=# SELECT generate_series('2014-10-10'::DATE, > > '2014-10-15'::DATE, '1 DAY'::INTERVAL); > generate_series > ------------------------ > 2014-10-10 00:00:00-03 > 2014-10-11 00:00:00-03 > 2014-10-12 00:00:00-03 > 2014-10-13 00:00:00-03 > 2014-10-14 00:00:00-03 > 2014-10-15 00:00:00-03 > > (6 rows) > > Please note that the upper bound > > '2014-10-15' is included in the resulting rows. > > Now if I try this same query with slightly different dates I get: > > postgres=# SELECT generate_series('2014-10-15'::DATE, > '2014-10-20'::DATE, '1 DAY'::INTERVAL); > generate_series > ------------------------ > 2014-10-15 00:00:00-03 > 2014-10-16 00:00:00-03 > 2014-10-17 00:00:00-03 > 2014-10-18 00:00:00-03 > 2014-10-19 01:00:00-02 > (5 rows) > > ----------------------------------------------- > The upper bound is not included in the results! > ----------------------------------------------- > > Here, in Brazil our DST started on Oct 19. So if I had to guess I would > say that this strange behavior is due to the DST, but I'm having a hard > time to understand why this is happening! > > Is this expected behavior? > > I know that I can achieve the results I expect with the following query: > > postgres=# WITH RECURSIVE days(d) AS ( > SELECT '2014-10-15'::DATE > UNION ALL > SELECT d+1 FROM days WHERE d < '2014-10-20'::DATE > ) > SELECT * FROM days; > d > ------------ > 2014-10-15 > 2014-10-16 > 2014-10-17 > 2014-10-18 > 2014-10-19 > 2014-10-20 > (6 rows) > > But using that instead of generate_series, just feels wrong so I would > like to understand what's happening and if there is a way to overcome > that, before changing my queries. First generate_series is really looking for a timestamp on input and returns timestamps: http://www.postgresql.org/docs/9.3/interactive/functions-srf.html generate_series(start, stop, step interval) timestamp or timestamp with time zone setof timestamp or setof timestamp with time zone (same as argument type) Generate a series of values, from start to stop with a step size of step So: test=# set timezone='Brazil/East'; SET test=# SELECT generate_series('2014-10-10'::TIMESTAMP, '2014-10-20'::TIMESTAMP, '1 DAY'::INTERVAL); generate_series --------------------- 2014-10-10 00:00:00 2014-10-11 00:00:00 2014-10-12 00:00:00 2014-10-13 00:00:00 2014-10-14 00:00:00 2014-10-15 00:00:00 2014-10-16 00:00:00 2014-10-17 00:00:00 2014-10-18 00:00:00 2014-10-19 00:00:00 2014-10-20 00:00:00 test=# SELECT generate_series('2014-10-10'::TIMESTAMP WITH TIME ZONE, '2014-10-20'::TIMESTAMP WITH TIME ZONE, '1 DAY'::INTERVAL); generate_series ------------------------ 2014-10-10 00:00:00-03 2014-10-11 00:00:00-03 2014-10-12 00:00:00-03 2014-10-13 00:00:00-03 2014-10-14 00:00:00-03 2014-10-15 00:00:00-03 2014-10-16 00:00:00-03 2014-10-17 00:00:00-03 2014-10-18 00:00:00-03 2014-10-19 01:00:00-02 Though this part I do not understand: test=# select '2014-10-19'::timestamp with time zone; timestamptz ------------------------ 2014-10-19 01:00:00-02 (1 row) test=# select '2014-10-20'::timestamp with time zone; timestamptz ------------------------ 2014-10-20 00:00:00-02 (1 row) I thought interval understood 23/25 'day' across DST/ST boundaries. I will have to think more on this. > > Thank you! > > Sérgio Saquetim > -- Adrian Klaver adrian.klaver@aklaver.com
On 12/07/2014 12:11 PM, Sérgio Saquetim wrote: > I've noticed a strange behavior in the generate_series functions. > > I'm trying to get all days between a start and an end date including the > bounds. So naturally I've tried something like the query below > . > > The real query uses generate_series to join other tables and is much > more complicated, but for the sake of brevity, I think that this query > is good enough to show the problem. > > But using that instead of generate_series, just feels wrong so I would > like to understand what's happening and if there is a way to overcome > that, before changing my queries. To follow up, it looks to be a Midnight issue. I live on the US West Coast so: test=# show timezone; TimeZone ------------ US/Pacific (1 row) Our Spring change happened March 9th at 2:00 AM: test=# select '2014-03-09 01:00'::timestamp with time zone ; timestamptz ------------------------ 2014-03-09 01:00:00-08 (1 row) test=# select '2014-03-09 02:00'::timestamp with time zone ; timestamptz ------------------------ 2014-03-09 03:00:00-07 (1 row) When I do a similar generate_series: test=# SELECT generate_series('2014-03-01'::DATE , '2014-03-10'::DATE , '1 DAY'::INTERVAL); generate_series ------------------------ 2014-03-01 00:00:00-08 2014-03-02 00:00:00-08 2014-03-03 00:00:00-08 2014-03-04 00:00:00-08 2014-03-05 00:00:00-08 2014-03-06 00:00:00-08 2014-03-07 00:00:00-08 2014-03-08 00:00:00-08 2014-03-09 00:00:00-08 2014-03-10 00:00:00-07 (10 rows) it works. So it seems there is some confusion which Midnight is being used for the DATE to timestamp with time zone conversion. > > Thank you! > > Sérgio Saquetim > -- Adrian Klaver adrian.klaver@aklaver.com
You've nailed it, thank you!
Finally I'm understanding what's going on.
I wasn't paying attention to the fact that generate_series really expects for timezone inputs. So when I was passing the upper bound as '2014-10-20'::DATE, the value was being cast to 2014-10-20 00:00:00-02.
postgres=# SELECT '2014-10-20'::TIMESTAMPTZ;
timestamptz
------------------------
2014-10-20 00:00:00-02
(1 row)
But after the DST change the generate_series changes the hour in the generated values as in 2014-10-20 01:00:00-02, which is bigger than 2014-10-20 00:00:00-02 and because of that it's not returned.
Using a larger upper bound solved my problem.
postgres=# SELECT generate_series('2014-10-15 00:00:00'::TIMESTAMPTZ, '2014-10-20 23:59:59'::TIMESTAMPTZ, '1 DAY'::INTERVAL);
generate_series
------------------------
2014-10-15 00:00:00-03
2014-10-16 00:00:00-03
2014-10-17 00:00:00-03
2014-10-18 00:00:00-03
2014-10-19 01:00:00-02
2014-10-20 01:00:00-02
(6 rows)
Thank you again!
Sérgio Saquetim
2014-12-07 20:04 GMT-02:00 Adrian Klaver <adrian.klaver@aklaver.com>:
On 12/07/2014 12:11 PM, Sérgio Saquetim wrote:I've noticed a strange behavior in the generate_series functions.
I'm trying to get all days between a start and an end date including the
bounds. So naturally I've tried something like the query below
.
The real query uses generate_series to join other tables and is much
more complicated, but for the sake of brevity, I think that this query
is good enough to show the problem.
But using that instead of generate_series, just feels wrong so I would
like to understand what's happening and if there is a way to overcome
that, before changing my queries.
To follow up, it looks to be a Midnight issue. I live on the US West Coast so:
test=# show timezone;
TimeZone
------------
US/Pacific
(1 row)
Our Spring change happened March 9th at 2:00 AM:
test=# select '2014-03-09 01:00'::timestamp with time zone ;
timestamptz
------------------------
2014-03-09 01:00:00-08
(1 row)
test=# select '2014-03-09 02:00'::timestamp with time zone ;
timestamptz
------------------------
2014-03-09 03:00:00-07
(1 row)
When I do a similar generate_series:
test=# SELECT generate_series('2014-03-01'::DATE , '2014-03-10'::DATE , '1 DAY'::INTERVAL);
generate_series
------------------------
2014-03-01 00:00:00-08
2014-03-02 00:00:00-08
2014-03-03 00:00:00-08
2014-03-04 00:00:00-08
2014-03-05 00:00:00-08
2014-03-06 00:00:00-08
2014-03-07 00:00:00-08
2014-03-08 00:00:00-08
2014-03-09 00:00:00-08
2014-03-10 00:00:00-07
(10 rows)
it works.
So it seems there is some confusion which Midnight is being used for the DATE to timestamp with time zone conversion.
On Sun, Dec 07, 2014 at 08:25:48PM -0200, Sérgio Saquetim wrote: > > I wasn't paying attention to the fact that generate_series really expects > for timezone inputs. So when I was passing the upper bound > as '2014-10-20'::DATE, the value was being cast to 2014-10-20 00:00:00-02. > > postgres=# SELECT '2014-10-20'::TIMESTAMPTZ; > timestamptz > ------------------------ > 2014-10-20 00:00:00-02 > (1 row) […] > Using a larger upper bound solved my problem. As a more general lesson, I'd suggest that when you're working with dates your best bet is to do it with your time zone as UTC. If you then want to format the output in the local time zone, you can do that in the outer SELECT with AT TIME ZONE. This isn't because Postgres is going to get this wrong, but because it's far too easy to confuse yourself with those time changes. It makes debugging easier, particularly because the time change only happens twice a year so nobody _ever_ thinks of it when troubleshooting. Best regards, A -- Andrew Sullivan ajs@crankycanuck.ca
Hi Sérgio:
On Sun, Dec 7, 2014 at 9:11 PM, Sérgio Saquetim <sergiosaquetim@gmail.com> wrote:
RegardsI've noticed a strange behavior in the generate_series functions.I'm trying to get all days between a start and an end date including the bounds. So naturally I've tried something like the query below.
As both your examples and previous responses highlight your problem is that generate_series is not defined for dates, I'll avoid further comments on this. As you are using dates and your working example is using dates I would suggest rewriting your query around to avoid implicits cast problems. Date difference is integer, generate_series is defined on them, so:
=> select generate_series(0,'2014-10-20'::date - '2014-10-10'::date);
generate_series
-----------------
0
1
2
3
4
5
6
7
8
9
10
(11 rows)
cdrs=> select '2014-10-10'::date+ generate_series(0,'2014-10-20'::date - '2014-10-10'::date);
?column?
------------
2014-10-10
2014-10-11
2014-10-12
2014-10-13
2014-10-14
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(11 rows)
=> select generate_series(0,'2014-10-20'::date - '2014-10-10'::date);
generate_series
-----------------
0
1
2
3
4
5
6
7
8
9
10
(11 rows)
cdrs=> select '2014-10-10'::date+ generate_series(0,'2014-10-20'::date - '2014-10-10'::date);
?column?
------------
2014-10-10
2014-10-11
2014-10-12
2014-10-13
2014-10-14
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(11 rows)
Or even
=> select ini + generate_series(0, fin-ini) from (select '2014-10-10'::date as ini, '2014-10-20'::date as fin) data ;
?column?
------------
2014-10-10
2014-10-11
2014-10-12
2014-10-13
2014-10-14
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(11 rows)
=> select ini + generate_series(0, fin-ini) from (select '2014-10-10'::date as ini, '2014-10-20'::date as fin) data ;
?column?
------------
2014-10-10
2014-10-11
2014-10-12
2014-10-13
2014-10-14
2014-10-15
2014-10-16
2014-10-17
2014-10-18
2014-10-19
2014-10-20
(11 rows)
Francisco Olarte.