Обсуждение: Group by range in hour of day

Поиск

Список

Период

Сортировка

Group by range in hour of day

От

Israel Brewster

Дата:

17 марта 2015 г., 00:57:18

I have a table with two timestamp columns for the start time and end time of each record (call them start and end).I'm trying to figure out if there is a way to group these records by "hour of day", that is the record should be included in the group if the hour of the day for the group falls anywhere in the range [start,end]. Obviously each record may well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the total number of "active" records for that hour, and b) what is the total "active" time for those records during the hour, with an ultimate goal of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end

-----------------------------------------------------

2015-03-15 08:15 | 2015-03-15 10:45

2015-03-15 09:30 | 2015-03-15 10:15

2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum

-----------------------------

8 | 1 | 0.75

9 | 2 | 1.5

10 | 3 | 1.5

11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the average, which would of course always be less than or equal to 1. Is this doable in postgress? Or would it be a better idea to simply pull the raw data and post-process in code? Thanks.

-----------------------------------------------

Israel Brewster

Systems Analyst II

Ravn Alaska

5245 Airport Industrial Rd

Fairbanks, AK 99709

(907) 450-7293

-----------------------------------------------

Вложения

Israel Brewster.vcf

Re: Group by range in hour of day

От

Adrian Klaver

Дата:

17 марта 2015 г., 01:13:18

On 03/16/2015 02:57 PM, Israel Brewster wrote:
> I have a table with two timestamp columns for the start time and end
> time of each record (call them start and end).I'm trying to figure out
> if there is a way to group these records by "hour of day", that is the
> record should be included in the group if the hour of the day for the
> group falls anywhere in the range [start,end]. Obviously each record may
> well fall into multiple groups under this scenario.
>
> The goal here is to figure out, for each hour of the day, a) what is the
> total number of "active" records for that hour, and b) what is the total
> "active" time for those records during the hour, with an ultimate goal
> of figuring out the average active time per record per hour.
>
> So, for simplified example, if the table contained three records:
>
>           start              |               end
> -----------------------------------------------------
> 2015-03-15 08:15  |  2015-03-15 10:45
> 2015-03-15 09:30  |  2015-03-15 10:15
> 2015-03-15 10:30  |  2015-03-15 11:30
>
>
> Then the results should break out something like this:
>
> hour  |  count  |  sum
> -----------------------------
> 8       |    1       |   0.75
> 9       |    2       |   1.5
> 10     |    3       |   1.5
> 11     |    1       |   0.5
>
> I can then easily manipulate these values to get my ultimate goal of the
> average, which would of course always be less than or equal to 1. Is
> this doable in postgress? Or would it be a better idea to simply pull
> the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

>
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
>
>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Group by range in hour of day

От

Paul Jungwirth

Дата:

17 марта 2015 г., 01:16:56

> I have a table with two timestamp columns for the start time and end
> time of each record (call them start and end).I'm trying to figure out
> if there is a way to group these records by "hour of day",

I think you can do this by selecting `FROM generate_series(0, 23) s(h)`
and then joining to your table based on `h BETWEEN start AND end`.

Whenever I need to write a time-series aggregate query I reach for
generate_series. Mostly that's so I have output rows even when COUNT(*)
would be 0, but here it also means that a row from your data can feed
into multiple output rows.

I could probably write this out in more detail if you like, but that's
the short version. :-)

Good luck!

Paul

Re: Group by range in hour of day

От

"David G. Johnston"

Дата:

17 марта 2015 г., 01:22:45

On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:

On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

The following will give you endpoints for your bounds. Version is important since "range types" could be very useful in this situation - but you'd still need to generate the bounds info regardless.

SELECT *

FROM

(SELECT * FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s

CROSS JOIN

(SELECT end_ts + '1 hour'::interval AS end_ts FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) e (end_ts)) AS e

You would join this using an ON condition with an OR (start BETWEEN [...] OR end BETWEEN [...]) - range logic will be better and you may want to adjust the upper bound by negative 1 (nano-second?) to allow for easier "<=" logic if using BETWEEN.

David J.

Re: Group by range in hour of day

От

Israel Brewster

Дата:

17 марта 2015 г., 01:56:50

On Mar 16, 2015, at 2:13 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 03/16/2015 02:57 PM, Israel Brewster wrote:
>> I have a table with two timestamp columns for the start time and end
>> time of each record (call them start and end).I'm trying to figure out
>> if there is a way to group these records by "hour of day", that is the
>> record should be included in the group if the hour of the day for the
>> group falls anywhere in the range [start,end]. Obviously each record may
>> well fall into multiple groups under this scenario.
>>
>> The goal here is to figure out, for each hour of the day, a) what is the
>> total number of "active" records for that hour, and b) what is the total
>> "active" time for those records during the hour, with an ultimate goal
>> of figuring out the average active time per record per hour.
>>
>> So, for simplified example, if the table contained three records:
>>
>>          start              |               end
>> -----------------------------------------------------
>> 2015-03-15 08:15  |  2015-03-15 10:45
>> 2015-03-15 09:30  |  2015-03-15 10:15
>> 2015-03-15 10:30  |  2015-03-15 11:30
>>
>>
>> Then the results should break out something like this:
>>
>> hour  |  count  |  sum
>> -----------------------------
>> 8       |    1       |   0.75
>> 9       |    2       |   1.5
>> 10     |    3       |   1.5
>> 11     |    1       |   0.5
>>
>> I can then easily manipulate these values to get my ultimate goal of the
>> average, which would of course always be less than or equal to 1. Is
>> this doable in postgress? Or would it be a better idea to simply pull
>> the raw data and post-process in code? Thanks.
>
> Do not have an answer for you, but a question:
>
> What version of Postgres are you on?
>
> This will help determine what tools are available to work with.

Oh, right. Of course. I'm on 9.4.0

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------


>
>>
>> -----------------------------------------------
>> Israel Brewster
>> Systems Analyst II
>> Ravn Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> -----------------------------------------------
>>
>>
>>
>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

Re: Group by range in hour of day

От

Israel Brewster

Дата:

17 марта 2015 г., 01:59:40

On Mar 16, 2015, at 2:16 PM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>
>> I have a table with two timestamp columns for the start time and end
>> time of each record (call them start and end).I'm trying to figure out
>> if there is a way to group these records by "hour of day",
>
> I think you can do this by selecting `FROM generate_series(0, 23) s(h)` and then joining to your table based on `h
BETWEENstart AND end`. 
>
> Whenever I need to write a time-series aggregate query I reach for generate_series. Mostly that's so I have output
rowseven when COUNT(*) would be 0, but here it also means that a row from your data can feed into multiple output rows. 
>
> I could probably write this out in more detail if you like, but that's the short version. :-)

I think I can work with that :-) Hadn't considered doing a join there, so that's a new approach I can investigate.
Thanks!
>
> Good luck!
>
> Paul
>
>
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: Group by range in hour of day

От

John W Higgins

Дата:

17 марта 2015 г., 02:02:58

Assuming 3 things

Table name - test

Column names - start_time, end_time

Added an id column (int) to distinguish each record in the table

You can go with this..... (my apologies for formatting issues)

with

slots as (

select *

from generate_series(0,1439) as s(slot)

),

slots_hours as (

select slot,

slot / 60 as hour

from slots

),

minutes as (

select id,

date_part('hour', start_time) * 60 + date_part('minute', start_time) as start_minute,

date_part('hour', end_time) * 60 + date_part('minute', end_time) as end_minute

from test

),

minute_slots as (

select id,

slot,

hour

from minutes

join slots_hours

on minutes.start_minute <= slots_hours.slot

and minutes.end_minute > slots_hours.slot

)

select hour,

count(*) / 60.0 as sum,

count(distinct id) as count

from minute_slots

group by hour

I'm certain there are more elegant solutions possible - but you can grasp each step this way.

John

On Mon, Mar 16, 2015 at 2:57 PM, Israel Brewster <israel@ravnalaska.net> wrote:

I have a table with two timestamp columns for the start time and end time of each record (call them start and end).I'm trying to figure out if there is a way to group these records by "hour of day", that is the record should be included in the group if the hour of the day for the group falls anywhere in the range [start,end]. Obviously each record may well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the total number of "active" records for that hour, and b) what is the total "active" time for those records during the hour, with an ultimate goal of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the average, which would of course always be less than or equal to 1. Is this doable in postgress? Or would it be a better idea to simply pull the raw data and post-process in code? Thanks.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

Re: Group by range in hour of day

От

Israel Brewster

Дата:

17 марта 2015 г., 02:16:27

On Mar 16, 2015, at 2:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

The following will give you endpoints for your bounds. Version is important since "range types" could be very useful in this situation - but you'd still need to generate the bounds info regardless.

SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) e (end_ts)) AS e

You would join this using an ON condition with an OR (start BETWEEN [...] OR end BETWEEN [...]) - range logic will be better and you may want to adjust the upper bound by negative 1 (nano-second?) to allow for easier "<=" logic if using BETWEEN.

Thanks, that is very helpful, but are you sure CROSS JOIN is what you wanted here? using that, I get a 625 row result set where each row from the first SELECT is paired up with EVERY row from the second select. I would think I would want the first row of the first SELECT paired up with only the first row of the second, second row of the first paired with the second row of the second, etc - i.e. 24 start and end bounds. Or am I missing something?

-----------------------------------------------

Israel Brewster

Systems Analyst II

Ravn Alaska

5245 Airport Industrial Rd

Fairbanks, AK 99709

(907) 450-7293

-----------------------------------------------

David J.

Re: Group by range in hour of day

От

Adrian Klaver

Дата:

17 марта 2015 г., 02:46:42

On 03/16/2015 04:16 PM, Israel Brewster wrote:
> On Mar 16, 2015, at 2:22 PM, David G. Johnston
> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
>>
>> On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver
>> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>wrote:
>>
>>     On 03/16/2015 02:57 PM, Israel Brewster wrote:
>>
>>         I have a table with two timestamp columns for the start time
>>         and end
>>         time of each record (call them start and end).I'm trying to
>>         figure out
>>         if there is a way to group these records by "hour of day",
>>         that is the
>>         record should be included in the group if the hour of the day
>>         for the
>>         group falls anywhere in the range [start,end]. Obviously each
>>         record may
>>         well fall into multiple groups under this scenario.
>>
>>         The goal here is to figure out, for each hour of the day, a)
>>         what is the
>>         total number of "active" records for that hour, and b) what is
>>         the total
>>         "active" time for those records during the hour, with an
>>         ultimate goal
>>         of figuring out the average active time per record per hour.
>>
>>         So, for simplified example, if the table contained three records:
>>
>>                   start              |               end
>>         ------------------------------__-----------------------
>>         2015-03-15 08:15  |  2015-03-15 10:45
>>         2015-03-15 09:30  |  2015-03-15 10:15
>>         2015-03-15 10:30  |  2015-03-15 11:30
>>
>>
>>         Then the results should break out something like this:
>>
>>         hour  |  count  |  sum
>>         -----------------------------
>>         8       |    1       |   0.75
>>         9       |    2       |   1.5
>>         10     |    3       |   1.5
>>         11     |    1       |   0.5
>>
>>         I can then easily manipulate these values to get my ultimate
>>         goal of the
>>         average, which would of course always be less than or equal to
>>         1. Is
>>         this doable in postgress? Or would it be a better idea to
>>         simply pull
>>         the raw data and post-process in code? Thanks.
>>
>>
>>     Do not have an answer for you, but a question:
>>
>>     What version of Postgres are you on?
>>
>>     This will help determine what tools are available to work with.
>>
>>
>> The following will give you endpoints for your bounds.  Version is
>> important since "range types" could be very useful in this situation -
>> but you'd still need to generate the bounds info regardless.
>>
>> 
>> SELECT *
>> FROM
>> (SELECT * FROM generate_series('2015-03-15'::timestamptz,
>> '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
>> CROSS JOIN
>> (SELECT end_ts + '1 hour'::interval AS end_ts FROM
>> generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
>> '1 hour'::interval) e (end_ts)) AS e
>>
>> You would join this using an ON condition with an OR (start BETWEEN
>> [...] OR end BETWEEN [...]) - range logic will be better and you may
>> want to adjust the upper bound by negative 1 (nano-second?) to allow
>> for easier "<=" logic if using BETWEEN.
>>
>
> Thanks, that is very helpful, but are you sure CROSS JOIN is what you
> wanted here? using that, I get a 625 row result set where each row from
> the first SELECT is paired up with EVERY row from the second select. I
> would think I would want the first row of the first SELECT paired up
> with only the first row of the second, second row of the first paired
> with the second row of the second, etc - i.e. 24 start and end bounds.
> Or am I missing something?

Given this:

test=> select * from start_end ;
  id |       start_time       |        end_time
----+------------------------+------------------------
   1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
   2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
   3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07

using Pauls hints I got:

test=> select *  from start_end, generate_series(0, 23) as s(h) where h
between extract(hour from start_time) and extract(hour from end_time) ;

  id |       start_time       |        end_time        | h
----+------------------------+------------------------+----
   1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  8
   3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  8
   1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  9
   2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 |  9
   3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  9
   2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
   3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
   2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
   3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11


test=> select h, count(*) from start_end, generate_series(0, 23) as s(h)
where h between extract(hour from start_time) and extract(hour from
end_time) group by h order by h;

  h  | count
----+-------
   8 |     2
   9 |     3
  10 |     2
  11 |     2

>
>
>
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
>
>> David J.
>>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Group by range in hour of day

От

"David G. Johnston"

Дата:

17 марта 2015 г., 02:57:11

On Mon, Mar 16, 2015 at 4:16 PM, Israel Brewster <israel@ravnalaska.net> wrote:

On Mar 16, 2015, at 2:22 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 03/16/2015 02:57 PM, Israel Brewster wrote:
I have a table with two timestamp columns for the start time and end
time of each record (call them start and end).I'm trying to figure out
if there is a way to group these records by "hour of day", that is the
record should be included in the group if the hour of the day for the
group falls anywhere in the range [start,end]. Obviously each record may
well fall into multiple groups under this scenario.

The goal here is to figure out, for each hour of the day, a) what is the
total number of "active" records for that hour, and b) what is the total
"active" time for those records during the hour, with an ultimate goal
of figuring out the average active time per record per hour.

So, for simplified example, if the table contained three records:

start | end
-----------------------------------------------------
2015-03-15 08:15 | 2015-03-15 10:45
2015-03-15 09:30 | 2015-03-15 10:15
2015-03-15 10:30 | 2015-03-15 11:30

Then the results should break out something like this:

hour | count | sum
-----------------------------
8 | 1 | 0.75
9 | 2 | 1.5
10 | 3 | 1.5
11 | 1 | 0.5

I can then easily manipulate these values to get my ultimate goal of the
average, which would of course always be less than or equal to 1. Is
this doable in postgress? Or would it be a better idea to simply pull
the raw data and post-process in code? Thanks.

Do not have an answer for you, but a question:

What version of Postgres are you on?

This will help determine what tools are available to work with.

The following will give you endpoints for your bounds. Version is important since "range types" could be very useful in this situation - but you'd still need to generate the bounds info regardless.

SELECT *
FROM
(SELECT * FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
CROSS JOIN
(SELECT end_ts + '1 hour'::interval AS end_ts FROM generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, '1 hour'::interval) e (end_ts)) AS e

You would join this using an ON condition with an OR (start BETWEEN [...] OR end BETWEEN [...]) - range logic will be better and you may want to adjust the upper bound by negative 1 (nano-second?) to allow for easier "<=" logic if using BETWEEN.

Thanks, that is very helpful, but are you sure CROSS JOIN is what you wanted here? using that, I get a 625 row result set where each row from the first SELECT is paired up with EVERY row from the second select. I would think I would want the first row of the first SELECT paired up with only the first row of the second, second row of the first paired with the second row of the second, etc - i.e. 24 start and end bounds. Or am I missing something?

No, I rushed things...:( Sorry. My concept is good though but indeed you want to end up with a table having only 24 rows (for the sample).

LATERAL may work here but I haven't had a chance to play with it yet. A simple ordinal column to join on would be sufficient.

David J.

Re: Group by range in hour of day

От

Israel Brewster

Дата:

17 марта 2015 г., 18:52:49




> On Mar 16, 2015, at 3:46 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 03/16/2015 04:16 PM, Israel Brewster wrote:
>> On Mar 16, 2015, at 2:22 PM, David G. Johnston
>> <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote:
>>>
>>> On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver
>>> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>wrote:
>>>
>>>    On 03/16/2015 02:57 PM, Israel Brewster wrote:
>>>
>>>        I have a table with two timestamp columns for the start time
>>>        and end
>>>        time of each record (call them start and end).I'm trying to
>>>        figure out
>>>        if there is a way to group these records by "hour of day",
>>>        that is the
>>>        record should be included in the group if the hour of the day
>>>        for the
>>>        group falls anywhere in the range [start,end]. Obviously each
>>>        record may
>>>        well fall into multiple groups under this scenario.
>>>
>>>        The goal here is to figure out, for each hour of the day, a)
>>>        what is the
>>>        total number of "active" records for that hour, and b) what is
>>>        the total
>>>        "active" time for those records during the hour, with an
>>>        ultimate goal
>>>        of figuring out the average active time per record per hour.
>>>
>>>        So, for simplified example, if the table contained three records:
>>>
>>>                  start              |               end
>>>        ------------------------------__-----------------------
>>>        2015-03-15 08:15  |  2015-03-15 10:45
>>>        2015-03-15 09:30  |  2015-03-15 10:15
>>>        2015-03-15 10:30  |  2015-03-15 11:30
>>>
>>>
>>>        Then the results should break out something like this:
>>>
>>>        hour  |  count  |  sum
>>>        -----------------------------
>>>        8       |    1       |   0.75
>>>        9       |    2       |   1.5
>>>        10     |    3       |   1.5
>>>        11     |    1       |   0.5
>>>
>>>        I can then easily manipulate these values to get my ultimate
>>>        goal of the
>>>        average, which would of course always be less than or equal to
>>>        1. Is
>>>        this doable in postgress? Or would it be a better idea to
>>>        simply pull
>>>        the raw data and post-process in code? Thanks.
>>>
>>>
>>>    Do not have an answer for you, but a question:
>>>
>>>    What version of Postgres are you on?
>>>
>>>    This will help determine what tools are available to work with.
>>>
>>>
>>> The following will give you endpoints for your bounds.  Version is
>>> important since "range types" could be very useful in this situation -
>>> but you'd still need to generate the bounds info regardless.
>>>
>>> 
>>> SELECT *
>>> FROM
>>> (SELECT * FROM generate_series('2015-03-15'::timestamptz,
>>> '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
>>> CROSS JOIN
>>> (SELECT end_ts + '1 hour'::interval AS end_ts FROM
>>> generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
>>> '1 hour'::interval) e (end_ts)) AS e
>>>
>>> You would join this using an ON condition with an OR (start BETWEEN
>>> [...] OR end BETWEEN [...]) - range logic will be better and you may
>>> want to adjust the upper bound by negative 1 (nano-second?) to allow
>>> for easier "<=" logic if using BETWEEN.
>>>
>>
>> Thanks, that is very helpful, but are you sure CROSS JOIN is what you
>> wanted here? using that, I get a 625 row result set where each row from
>> the first SELECT is paired up with EVERY row from the second select. I
>> would think I would want the first row of the first SELECT paired up
>> with only the first row of the second, second row of the first paired
>> with the second row of the second, etc - i.e. 24 start and end bounds.
>> Or am I missing something?
>
> Given this:
>
> test=> select * from start_end ;
> id |       start_time       |        end_time
> ----+------------------------+------------------------
>  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
>  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07
>
> using Pauls hints I got:
>
> test=> select *  from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and
extract(hourfrom end_time) ; 
>
> id |       start_time       |        end_time        | h
> ----+------------------------+------------------------+----
>  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  8
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  8
>  1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  9
>  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 |  9
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  9
>  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
>  2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
>  3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11
>
>
> test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from
start_time)and extract(hour from end_time) group by h order by h; 
>
> h  | count
> ----+-------
>  8 |     2
>  9 |     3
> 10 |     2
> 11 |     2
>

Awesome! That looks perfect. I still need to incorporate it into the bigger picture, of course, but I'm sure I can
handlethat. Thanks! 
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

>>
>>
>>
>> -----------------------------------------------
>> Israel Brewster
>> Systems Analyst II
>> Ravn Alaska
>> 5245 Airport Industrial Rd
>> Fairbanks, AK 99709
>> (907) 450-7293
>> -----------------------------------------------
>>
>>> David J.
>>>
>>
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

Вложения

Israel Brewster.vcf

Re: Group by range in hour of day

От

Paul Jungwirth

Дата:

17 марта 2015 г., 19:09:46

>> test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from
start_time)and extract(hour from end_time) group by h order by h; 
>>
>> h  | count
>> ----+-------
>>   8 |     2
>>   9 |     3
>> 10 |     2
>> 11 |     2

Note if you always want all 24 rows with a count of 0 when appropriate
(which seems common in reports with tables or plots), you can just tweak
the above query to use a left join: FROM generate_series(0, 23) AS s(h)
LEFT OUTER JOIN start_end ON h BETWEEN ...

Paul

Re: Group by range in hour of day

От

Israel Brewster

Дата:

17 марта 2015 г., 19:52:38


> On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>
>>> test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from
start_time)and extract(hour from end_time) group by h order by h; 
>>>
>>> h  | count
>>> ----+-------
>>>  8 |     2
>>>  9 |     3
>>> 10 |     2
>>> 11 |     2
>
> Note if you always want all 24 rows with a count of 0 when appropriate (which seems common in reports with tables or
plots),you can just tweak the above query to use a left join: FROM generate_series(0, 23) AS s(h) LEFT OUTER JOIN
start_endON h BETWEEN ... 
>
> Paul

Right, thanks. That makes sense. So next question: how do I get the "active" time per hour from this? To use the same
examplethat came up with this result set: 

Given this:

test=> select * from start_end ;
id |       start_time       |        end_time
----+------------------------+------------------------
 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07
 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07
 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07

The specified query returns this:

h  | count
----+-------
 8 |     2
 9 |     3
10 |     2
11 |     2

Which is an excellent start, but I also need one more column, which is the total "active" time per hour. So given the
intermediateresult of this: 

id |       start_time       |        end_time        | h
----+------------------------+------------------------+----
 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  8
 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  8
 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 |  9
 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 |  9
 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 |  9
 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10
 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10
 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11
 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11

I'd want a final result of this:

h  | count  |  sum
----+---------------
 8 |     2    | 1.75 (or 1:45:00 or whatever)
 9 |     3    | 2.33 (2:20:00)
10 |     2   | 2.00 (2:00:00)
11 |     2   | 0.83 (0:50:00)


Where the 1:45 in the 8 hour is based on 45 minutes from row id 1 [8:15-9:00) plus the full hour [08:00-9:00) from row
id3, the hour 9 value is based on the amount of rows 1,2 and 3 that fall within the 9 hour, etc. 
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: Group by range in hour of day

От

"David G. Johnston"

Дата:

17 марта 2015 г., 20:05:44

On Tuesday, March 17, 2015, Israel Brewster <israel@ravnalaska.net> wrote:

> On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>
>>> test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;
>>>
>>> h | count
>>> ----+-------
>>> 8 | 2
>>> 9 | 3
>>> 10 | 2
>>> 11 | 2
>
> Note if you always want all 24 rows with a count of 0 when appropriate (which seems common in reports with tables or plots), you can just tweak the above query to use a left join: FROM generate_series(0, 23) AS s(h) LEFT OUTER JOIN start_end ON h BETWEEN ...
>
> Paul

Right, thanks. That makes sense. So next question: how do I get the "active" time per hour from this? To use the same example that came up with this result set:

Which is why you do not (only?) want to convert your data to hour-of-day but want to create timestamp end points. Then you simply do timestamp subtraction to get durations which you can then sum together.

David J.

Re: Group by range in hour of day

От

Israel Brewster

Дата:

17 марта 2015 г., 20:22:56

On Mar 17, 2015, at 9:05 AM, David G. Johnston <david.g.johnston@gmail.com> wrote:

On Tuesday, March 17, 2015, Israel Brewster <israel@ravnalaska.net> wrote:

> On Mar 17, 2015, at 8:09 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>
>>> test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h;
>>>
>>> h | count
>>> ----+-------
>>> 8 | 2
>>> 9 | 3
>>> 10 | 2
>>> 11 | 2
>
> Note if you always want all 24 rows with a count of 0 when appropriate (which seems common in reports with tables or plots), you can just tweak the above query to use a left join: FROM generate_series(0, 23) AS s(h) LEFT OUTER JOIN start_end ON h BETWEEN ...
>
> Paul

Right, thanks. That makes sense. So next question: how do I get the "active" time per hour from this? To use the same example that came up with this result set:

Which is why you do not (only?) want to convert your data to hour-of-day but want to create timestamp end points. Then you simply do timestamp subtraction to get durations which you can then sum together.

Well, it's not QUITE that simple. For example, row id 3 which starts at 08:00:00 and ends at 11:45:00 in the example. If I have a timestamp endpoint of 10:00:00 for the 9 hour, and I just do simple timestamp subtraction, I'll get an interval of 2 (10:00:00 - 08:00:00), which is not correct since there can't be more than an hour in any given hour. Similarly for the 11 hour and either of the two matching rows - since they end during the hour in question (row 2 only contributes 5 minutes), I'd actually need to subtract the end_time from the start point in that case to get the time.

That said, the concept is sound, and I am fairly sure I can make it work using a case when statement to handle the various permutations of starting before and or ending after the hour in question. I'll work on that, but if there is a more elegant solution, I'm all ears :-)

David J.

Re: Group by range in hour of day

От

Paul Jungwirth

Дата:

17 марта 2015 г., 20:30:29

So next question: how do I get the "active" time per hour from this?

I think you just SUM() over the intersection between each hourly window
and each event, right? This might be easiest using tsrange, something
like this:

    SUM(extract(minutes from (tsrange(start_time, end_time) &&
tsrange(h, h + interval '1 hour'))::interval))

I think you'll have to implement ::interval yourself though, e.g. here:

http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange

Also as mentioned you'll have to convert h from an integer [0,23] to a
timestamp, but that seems pretty easy. Assuming start_time and end_time
are UTC that's just adding that many hours to UTC midnight of the same day.

Some weird edge cases to be careful about: activities that cross
midnight. Activities that last more than one full day, e.g. start 3/15
and end 3/17.

Paul

Re: Group by range in hour of day

От

Israel Brewster

Дата:

17 марта 2015 г., 20:57:47


> On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>
> So next question: how do I get the "active" time per hour from this?
>
> I think you just SUM() over the intersection between each hourly window and each event, right? This might be easiest
usingtsrange, something like this: 

Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.

>
>   SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
>
> I think you'll have to implement ::interval yourself though, e.g. here:
>
> http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange

Gotcha

>
> Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy.
Assumingstart_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day. 
>
> Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full
day,e.g. start 3/15 and end 3/17. 

Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.

-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------

>
> Paul
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: Group by range in hour of day

От

Adrian Klaver

Дата:

17 марта 2015 г., 21:30:13

On 03/17/2015 10:57 AM, Israel Brewster wrote:
>
>
>> On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>>
>> So next question: how do I get the "active" time per hour from this?
>>
>> I think you just SUM() over the intersection between each hourly window and each event, right? This might be easiest
usingtsrange, something like this: 
>
> Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.
>
>>
>>    SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
>>
>> I think you'll have to implement ::interval yourself though, e.g. here:
>>
>> http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
>
> Gotcha


My take on this is using CASE.

Rough sketch:


WHEN
     date_trunc('hour', end_time) < h
THEN
     end_time - start_time
ELSE
     (date_trunc('hour', start_time) + interval '1 hr') - start_time
as
     active_time

>
>>
>> Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy.
Assumingstart_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day. 
>>
>> Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full
day,e.g. start 3/15 and end 3/17. 
>
> Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.
>
> -----------------------------------------------
> Israel Brewster
> Systems Analyst II
> Ravn Alaska
> 5245 Airport Industrial Rd
> Fairbanks, AK 99709
> (907) 450-7293
> -----------------------------------------------
>

>


--
Adrian Klaver
adrian.klaver@aklaver.com

Re: Group by range in hour of day

От

Paul Jungwirth

Дата:

17 марта 2015 г., 21:44:51

>> Some weird edge cases to be careful about: activities that cross midnight.
 >> Activities that last more than one full day,
 >> e.g. start 3/15 and end 3/17.
> Right. And I will run into some of those (at least the crossing midnight),
 > so I'll keep an eye out.

If you are running the report on more than one day at a time, I think
David Johnston is right that you want to convert from integers [0, 23]
to timestamps as soon as possible, possibly even just generate a series
of timestamps rather than integers right from the beginning. Also beware
of extract(hour from foo). Probably you want tsrange intersection as
your join condition rather than BETWEEN.

Paul

Re: Group by range in hour of day

От

Adrian Klaver

Дата:

17 марта 2015 г., 22:40:08


--
  Adrian Klaver
  adrian.klaver@aklaver.com

On Tue, Mar 17, 2015, at 11:30 AM, Adrian Klaver wrote:
> On 03/17/2015 10:57 AM, Israel Brewster wrote:
> >
> >
> >> On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
> >>
> >> So next question: how do I get the "active" time per hour from this?
> >>
> >> I think you just SUM() over the intersection between each hourly window and each event, right? This might be
easiestusing tsrange, something like this: 
> >
> > Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.
> >
> >>
> >>    SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
> >>
> >> I think you'll have to implement ::interval yourself though, e.g. here:
> >>
> >> http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
> >
> > Gotcha
>
>
> My take on this is using CASE.
>
> Rough sketch:
>
>
> WHEN
>      date_trunc('hour', end_time) < h
> THEN
>      end_time - start_time
> ELSE
>      (date_trunc('hour', start_time) + interval '1 hr') - start_time
> as
>      active_time


Aah, should be

WHEN
      date_trunc('hour', end_time) < h + 1
 THEN
      end_time - start_time
 ELSE
      (date_trunc('hour', start_time) + interval '1 hr') - start_time
 as
     active_time
>
> >
> >>
> >> Also as mentioned you'll have to convert h from an integer [0,23] to a timestamp, but that seems pretty easy.
Assumingstart_time and end_time are UTC that's just adding that many hours to UTC midnight of the same day. 
> >>
> >> Some weird edge cases to be careful about: activities that cross midnight. Activities that last more than one full
day,e.g. start 3/15 and end 3/17. 
> >
> > Right. And I will run into some of those (at least the crossing midnight), so I'll keep an eye out.
> >
> > -----------------------------------------------
> > Israel Brewster
> > Systems Analyst II
> > Ravn Alaska
> > 5245 Airport Industrial Rd
> > Fairbanks, AK 99709
> > (907) 450-7293
> > -----------------------------------------------
> >
>
> >
>
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com

Re: Group by range in hour of day

От

Marc Mamin

Дата:

18 марта 2015 г., 00:03:12

>
>
>--
>  Adrian Klaver
>  adrian.klaver@aklaver.com
>
>On Tue, Mar 17, 2015, at 11:30 AM, Adrian Klaver wrote:
>> On 03/17/2015 10:57 AM, Israel Brewster wrote:
>> >
>> >
>> >> On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>> >>
>> >> So next question: how do I get the "active" time per hour from this?
>> >>
>> >> I think you just SUM() over the intersection between each hourly window and each event, right? This might be
easiestusing tsrange, something like this: 
>> >
>> > Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.
>> >
>> >>
>> >>    SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
>> >>
>> >> I think you'll have to implement ::interval yourself though, e.g. here:
>> >>
>> >> http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
>> >
>> > Gotcha
>>
>>
>> My take on this is using CASE.
>>
>> Rough sketch:
>>
>>
>> WHEN
>>      date_trunc('hour', end_time) < h
>> THEN
>>      end_time - start_time
>> ELSE
>>      (date_trunc('hour', start_time) + interval '1 hr') - start_time
>> as
>>      active_time
>
>
>Aah, should be
>
>WHEN
>      date_trunc('hour', end_time) < h + 1
> THEN
>      end_time - start_time
> ELSE
>      (date_trunc('hour', start_time) + interval '1 hr') - start_time
> as
>     active_time

Here another approach while building an hourly serie for each start/end pair, truncated to the hours:

create temp table t (s timestamptz, e timestamptz);

insert into t select '2015-03-16 08:15:00','2015-03-16 09:35:00';
insert into t select '2015-03-16 09:15:00','2015-03-16 11:05:00';
insert into t select '2015-03-16 08:00:00','2015-03-16 11:45:00';
insert into t select '2015-03-17 15:15:00','2015-03-18 11:45:00';
insert into t select '2015-03-17 20:15:00','2015-03-18 11:45:00';
insert into t select '2015-03-17 21:15:00','2015-03-18 10:10:00';
insert into t select '2015-03-18 23:30:00','2015-03-19 01:30:00';

SELECT ser, SUM(
  case when e - ser < interval '1 hour' then e-ser --end interval
  when s >= ser then interval '1 hour' - (s - ser) --start interval
  else interval '1 hour'
  end ) as time_tot
FROM
  (select e,s,
        generate_series(date_trunc('hour',s), date_trunc('hour',e), '1 hour') ser
  from t
  )foo
group by ser
order by 1

regards,
Marc Mamin

Re: Group by range in hour of day

От

Marc Mamin

Дата:

18 марта 2015 г., 00:41:31

>>On Tue, Mar 17, 2015, at 11:30 AM, Adrian Klaver wrote:
>>> On 03/17/2015 10:57 AM, Israel Brewster wrote:
>>> >
>>> >
>>> >> On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>>> >>
>>> >> So next question: how do I get the "active" time per hour from this?
>>> >>
>>> >> I think you just SUM() over the intersection between each hourly window and each event, right? This might be
easiestusing tsrange, something like this: 
>>> >
>>> > Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.
>>> >
>>> >>
>>> >>    SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
>>> >>
>>> >> I think you'll have to implement ::interval yourself though, e.g. here:
>>> >>
>>> >> http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
>>> >
>>> > Gotcha
>>>
>>>
>>> My take on this is using CASE.
>>>
>>> Rough sketch:
>>>
>>>
>>> WHEN
>>>      date_trunc('hour', end_time) < h
>>> THEN
>>>      end_time - start_time
>>> ELSE
>>>      (date_trunc('hour', start_time) + interval '1 hr') - start_time
>>> as
>>>      active_time
>>
>>
>>Aah, should be
>>
>>WHEN
>>      date_trunc('hour', end_time) < h + 1
>> THEN
>>      end_time - start_time
>> ELSE
>>      (date_trunc('hour', start_time) + interval '1 hr') - start_time
>> as
>>     active_time
>
>Here another approach while building an hourly serie for each start/end pair, truncated to the hours:
>
>create temp table t (s timestamptz, e timestamptz);
>
>insert into t select '2015-03-16 08:15:00','2015-03-16 09:35:00';
>insert into t select '2015-03-16 09:15:00','2015-03-16 11:05:00';
>insert into t select '2015-03-16 08:00:00','2015-03-16 11:45:00';
>insert into t select '2015-03-17 15:15:00','2015-03-18 11:45:00';
>insert into t select '2015-03-17 20:15:00','2015-03-18 11:45:00';
>insert into t select '2015-03-17 21:15:00','2015-03-18 10:10:00';
>insert into t select '2015-03-18 23:30:00','2015-03-19 01:30:00';
>
>SELECT ser, SUM(
>  case when e - ser < interval '1 hour' then e-ser --end interval
>  when s >= ser then interval '1 hour' - (s - ser) --start interval
>  else interval '1 hour'
>  end ) as time_tot
>FROM
>  (select e,s,
>        generate_series(date_trunc('hour',s), date_trunc('hour',e), '1 hour') ser
>  from t
>  )foo
>group by ser
>order by 1
>
>regards,
>Marc Mamin

I missed the case when the start and end points are in the same hour:

SELECT ser, SUM(
  case when e - ser < interval '1 hour' then e - greatest(ser,s) --end interval or s&e in same hour
  when s >= ser then interval '1 hour' - (s - ser) --start interval
  else interval '1 hour'
  end ) as time_tot
FROM
  (select e,s,
        generate_series(date_trunc('hour',s), date_trunc('hour',e), '1 hour') ser
  from t
  )foo
group by ser
order by 1

Marc

Re: Group by range in hour of day

От

Israel Brewster

Дата:

18 марта 2015 г., 01:34:59

> On Mar 17, 2015, at 1:41 PM, Marc Mamin <M.Mamin@intershop.de> wrote:
>
>
>>> On Tue, Mar 17, 2015, at 11:30 AM, Adrian Klaver wrote:
>>>> On 03/17/2015 10:57 AM, Israel Brewster wrote:
>>>>>
>>>>>
>>>>>> On Mar 17, 2015, at 9:30 AM, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>>>>>>
>>>>>> So next question: how do I get the "active" time per hour from this?
>>>>>>
>>>>>> I think you just SUM() over the intersection between each hourly window and each event, right? This might be
easiestusing tsrange, something like this: 
>>>>>
>>>>> Sounds reasonable. I've never worked with range values before, but it does seem appropriate here.
>>>>>
>>>>>>
>>>>>>   SUM(extract(minutes from (tsrange(start_time, end_time) && tsrange(h, h + interval '1 hour'))::interval))
>>>>>>
>>>>>> I think you'll have to implement ::interval yourself though, e.g. here:
>>>>>>
>>>>>> http://dba.stackexchange.com/questions/52153/postgresql-9-2-number-of-days-in-a-tstzrange
>>>>>
>>>>> Gotcha
>>>>
>>>>
>>>> My take on this is using CASE.
>>>>
>>>> Rough sketch:
>>>>
>>>>
>>>> WHEN
>>>>     date_trunc('hour', end_time) < h
>>>> THEN
>>>>     end_time - start_time
>>>> ELSE
>>>>     (date_trunc('hour', start_time) + interval '1 hr') - start_time
>>>> as
>>>>     active_time
>>>
>>>
>>> Aah, should be
>>>
>>> WHEN
>>>     date_trunc('hour', end_time) < h + 1
>>> THEN
>>>     end_time - start_time
>>> ELSE
>>>     (date_trunc('hour', start_time) + interval '1 hr') - start_time
>>> as
>>>    active_time
>>
>> Here another approach while building an hourly serie for each start/end pair, truncated to the hours:
>>
>> create temp table t (s timestamptz, e timestamptz);
>>
>> insert into t select '2015-03-16 08:15:00','2015-03-16 09:35:00';
>> insert into t select '2015-03-16 09:15:00','2015-03-16 11:05:00';
>> insert into t select '2015-03-16 08:00:00','2015-03-16 11:45:00';
>> insert into t select '2015-03-17 15:15:00','2015-03-18 11:45:00';
>> insert into t select '2015-03-17 20:15:00','2015-03-18 11:45:00';
>> insert into t select '2015-03-17 21:15:00','2015-03-18 10:10:00';
>> insert into t select '2015-03-18 23:30:00','2015-03-19 01:30:00';
>>
>> SELECT ser, SUM(
>> case when e - ser < interval '1 hour' then e-ser --end interval
>> when s >= ser then interval '1 hour' - (s - ser) --start interval
>> else interval '1 hour'
>> end ) as time_tot
>> FROM
>> (select e,s,
>>       generate_series(date_trunc('hour',s), date_trunc('hour',e), '1 hour') ser
>> from t
>> )foo
>> group by ser
>> order by 1
>>
>> regards,
>> Marc Mamin
>
> I missed the case when the start and end points are in the same hour:
>
> SELECT ser, SUM(
>  case when e - ser < interval '1 hour' then e - greatest(ser,s) --end interval or s&e in same hour
>  when s >= ser then interval '1 hour' - (s - ser) --start interval
>  else interval '1 hour'
>  end ) as time_tot
> FROM
>  (select e,s,
>        generate_series(date_trunc('hour',s), date_trunc('hour',e), '1 hour') ser
>  from t
>  )foo
> group by ser
> order by 1
>
> Marc

That you all for the suggestions. I think I have it working now, using CASE statements similar to these. I'll have to
spendsome time playing around with the tsrange suggestions as well, since I think it could end up being cleaner and
safer(especially, as mentioned, for any cases where there may be date changes involved), but at least I now have a
functioningquery I can tweak. Thanks again! 
-----------------------------------------------
Israel Brewster
Systems Analyst II
Ravn Alaska
5245 Airport Industrial Rd
Fairbanks, AK 99709
(907) 450-7293
-----------------------------------------------


>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general