Discussion: [GENERAL] time series data


[GENERAL] time series data

From
Khalil Khamlichi
Date:
Hi everyone,

I have a data stream from a call center application coming in to Postgres in this format:

user_name, user_status, event_time

'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...

So, as you see, each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one, so it should be used to calculate the duration of that previous event.

What is the best way to get user_status statistics like total duration, frequency, average, etc.? Does anybody have experience with this sort of data stream?


Thanks in advance.

Re: [GENERAL] time series data

From
Melvin Davidson
Date:


On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:
Hi everyone,

I have a data stream of a call center application coming in  to postgres in this format :

user_name, user_status, event_time

'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...

so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.

What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?


Thanks in advance.

Just a suggestion, but here is what I would do.
First, create your tables similar to the following:

CREATE TABLE status
(
 call_status    varchar(10) NOT NULL,
 CONSTRAINT status_pk PRIMARY KEY (call_status)
);

INSERT INTO status
(call_status)
VALUES
('ready'),
('talking'),
('after_call');

CREATE TABLE user_sessions
(
 username        name        NOT NULL,
 session_id        bigint        NOT NULL,
 call_status    varchar(10) NOT NULL,
 call_time        timestamp NOT NULL,
 CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
 CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
    REFERENCES status(call_status)
);

Next, you will need to generate a unique session_id for each
user, but only for when call_status is 'ready'. So probably
a table of the form:

CREATE TABLE current_session
(
username        name        NOT NULL,
session_id        serial        NOT NULL,
CONSTRAINT current_session_pk PRIMARY KEY (username)
);

Then all you need to do is:
1. Update current_session and get the new session_id each time a user connects (call_status = 'ready').
   Probably best to use a BEFORE trigger to do this, but you will need to code it yourself (a rough sketch follows after step 2).

2. You can then do

SELECT username,
       age ( (SELECT call_time FROM user_sessions
               WHERE username = 'actual_user_name' AND session_id = actual_session_id AND call_status = 'after_call'),
             (SELECT call_time FROM user_sessions
               WHERE username = 'actual_user_name' AND session_id = actual_session_id AND call_status = 'talking')
           ) as duration
  FROM user_sessions
 WHERE username = 'actual_user_name'
   AND session_id = actual_session_id
   AND call_status = 'ready';

You can use similar queries for avg and frequency.
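A rough sketch of the trigger from step 1, assuming the tables above and PostgreSQL 9.5+ for ON CONFLICT; the function, trigger and sequence names are illustrative (the sequence is the one created implicitly for current_session.session_id):

-- Bump the per-user session counter whenever a 'ready' event arrives,
-- then stamp the row being inserted with that user's current session_id.
-- Assumes every session starts with a 'ready' event.
CREATE OR REPLACE FUNCTION assign_session_id() RETURNS trigger AS $$
BEGIN
    IF NEW.call_status = 'ready' THEN
        INSERT INTO current_session (username)
        VALUES (NEW.username)
        ON CONFLICT (username)
        DO UPDATE SET session_id = nextval('current_session_session_id_seq');
    END IF;

    SELECT session_id INTO NEW.session_id
      FROM current_session
     WHERE username = NEW.username;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER user_sessions_assign_session
BEFORE INSERT ON user_sessions
FOR EACH ROW EXECUTE PROCEDURE assign_session_id();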

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: [GENERAL] time series data

From
Clifford Snow
Date:
I have a stream that updates every minute with a trigger that updates another table with information from the stream. That way I'm constantly updated with no need to run a script to update before I want a report.

Clifford

On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925@gmail.com> wrote:


On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:
Hi everyone,

I have a data stream of a call center application coming in  to postgres in this format :

user_name, user_status, event_time

'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...

so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.

What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?


Thanks in advance.

Just a suggestion, but here is what I would do.
First, create your tables similar to as follows

CREATE TABLE status
(
 call_status    varchar(10) NOT NULL,
 CONSTRAINT status_pk PRIMARY KEY (call_status)
);

INSERT INTO status
(call_status)
VALUES
('ready'),
('talking'),
('after_call');

CREATE TABLE user_sessions
(
 username        name        NOT NULL,
 session_id        bigint        NOT NULL,
 call_status    varchar(10) NOT NULL,
 call_time        timestamp NOT NULL,
 CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
 CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
    REFERENCES status(call_status)
);

Next, you will need to generate a unique session_id for each
user, but only for when call_status is 'ready'. So probably
a table of the form:

CREATE TABLE current_session
(
username        name        NOT NULL,
session_id        serial        NOT NULL,
CONSTRAINT current_session_pk PRIMARY KEY (username)
);

Then all you need to do is:
1. Update current_session and get the new session_id each time a user connects (call_status = 'ready'.
   Probably best to use a BEFORE trigger to do this, but you will need to code it yourself.

2. You can then do

SELECT username,
       age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'),
              ( SELECT call_time FROM current_session WHERE call_status = 'after_call')
           ) as duration
  FROM user_sessions
 WHERE username = 'actual_user_name'
   AND session_id =  actual_session_id;  

You can use similar queries for avg and frequency.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.




--
@osm_seattle
OpenStreetMap: Maps with a human touch

Re: [GENERAL] time series data

From
Melvin Davidson
Date:


On Sun, Oct 1, 2017 at 6:20 PM, Clifford Snow <clifford@snowandsnow.us> wrote:
I have a stream that updates every minute with a trigger that updates another table with information from the stream. That way I'm constantly updated with no need to run a script to update before I want a report.

Clifford

On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925@gmail.com> wrote:


On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:
Hi everyone,

I have a data stream of a call center application coming in  to postgres in this format :

user_name, user_status, event_time

'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...

so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.

What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?


Thanks in advance.

Just a suggestion, but here is what I would do.
First, create your tables similar to as follows

CREATE TABLE status
(
 call_status    varchar(10) NOT NULL,
 CONSTRAINT status_pk PRIMARY KEY (call_status)
);

INSERT INTO status
(call_status)
VALUES
('ready'),
('talking'),
('after_call');

CREATE TABLE user_sessions
(
 username        name        NOT NULL,
 session_id        bigint        NOT NULL,
 call_status     varchar(10) NOT NULL,
 call_time        timestamp NOT NULL,
 CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
 CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
    REFERENCES status(call_status)
);

Next, you will need to generate a unique session_id for each
user, but only for when call_status is 'ready'. So probably
a table of the form:

CREATE TABLE current_session
(
username        name        NOT NULL,
session_id        serial        NOT NULL,
CONSTRAINT current_session_pk PRIMARY KEY (username)
);

Then all you need to do is:
1. Update current_session and get the new session_id each time a user connects (call_status = 'ready'.
   Probably best to use a BEFORE trigger to do this, but you will need to code it yourself.

2. You can then do

SELECT username,
       age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'),
              ( SELECT call_time FROM current_session WHERE call_status = 'after_call')
           ) as duration
  FROM user_sessions
 WHERE username = 'actual_user_name'
   AND session_id =  actual_session_id;  

You can use similar queries for avg and frequency.

--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.




--
@osm_seattle
OpenStreetMap: Maps with a human touch

I thought about the table design for user_sessions and came up
with a better one:

CREATE TABLE user_sessions
(
 username      name          NOT NULL,
 session_id     bigint           NOT NULL,
 call_status     varchar(10) NOT NULL,
 call_ready      timestamp   NOT NULL,
 call_talking     timestamp,
 call_after_call timestamp,
 call_duration   interval,
 CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id),
 CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
    REFERENCES status(call_status)
);

So in essence, when the call starts, just do:

INSERT INTO user_sessions
(username, call_ready)
VALUES
('actual_user_name', now() );

Then
SELECT max(session_id) AS current_session
  FROM user_sessions
 WHERE username = 'actual_user_name';
   
When talking starts:
UPDATE user_sessions
   SET call_status = 'talking',
       call_talking = now()
 WHERE username = 'actual_user_name'
   AND session_id = current_session;

When call ends:
UPDATE user_sessions
   SET call_status = 'after_call',
       call_after_call = now()
 WHERE username = 'actual_user_name'
   AND session_id = current_session;
   
Now all you have to do to get call length is:

SELECT username,
       age ( call_after_call, call_talking ) as duration
  FROM user_sessions
 WHERE username = 'actual_user_name'
   AND session_id =  current_session;
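As written, the INSERT above still needs defaults for session_id and call_status to satisfy their NOT NULL constraints; a sketch with assumed defaults (names illustrative), plus the aggregate query for the per-user statistics the original question asked about:

-- Assumed adjustments so the INSERT above succeeds:
CREATE SEQUENCE user_sessions_session_id_seq;
ALTER TABLE user_sessions
    ALTER COLUMN session_id SET DEFAULT nextval('user_sessions_session_id_seq'),
    ALTER COLUMN call_status SET DEFAULT 'ready';

-- Talking statistics per user: total time, average time and number of calls.
SELECT username,
       sum(age(call_after_call, call_talking)) AS total_talking,
       avg(age(call_after_call, call_talking)) AS avg_talking,
       count(call_talking) AS calls_handled
  FROM user_sessions
 GROUP BY username;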

Re: [GENERAL] time series data

From
Scott Marlowe
Date:
On Sun, Oct 1, 2017 at 2:17 AM, Khalil Khamlichi
<khamlichi.khalil@gmail.com> wrote:
> Hi everyone,
>
> I have a data stream of a call center application coming in  to postgres in
> this format :
>
> user_name, user_status, event_time
>
> 'user1', 'ready', '2017-01-01 10:00:00'
> 'user1', 'talking', '2017-01-01 10:02:00'
> 'user1', 'after_call', '2017-01-01 10:07:00'
> 'user1', 'ready', '2017-01-01 10:08:00'
> 'user1', 'talking', '2017-01-01 10:10:00'
> 'user1', 'after_call', '2017-01-01 10:15:00'
> 'user1', 'paused', '2017-01-01 10:20:00'
> ...
> ...
>
> so as you see each new insert of an "event" is in fact the start_time of
> that event and also the end_time of the previous one so should be used to
> calculate the duration of this previous one.
>
> What is the best way to get user_status statistics like total duration,
> frequency, avg ...etc , does any body have an experience with this sort of
> data streams ?

Have you looked at the temporal_tables extension? It seems custom-made for
what you're trying to do.

http://clarkdave.net/2015/02/historical-records-with-postgresql-and-temporal-tables-and-sql-2011/
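A hedged sketch of what that looks like for this use case, based on the extension's documented versioning() trigger; the table and column names here are illustrative:

-- One row per agent holds the *current* status; every change is an UPDATE,
-- and the extension copies the superseded row into the history table along
-- with the period during which it was valid.
CREATE EXTENSION IF NOT EXISTS temporal_tables;

CREATE TABLE cc_agent_state (
    user_name    text PRIMARY KEY,
    agent_status text NOT NULL,
    sys_period   tstzrange NOT NULL DEFAULT tstzrange(now(), null)
);

CREATE TABLE cc_agent_state_history (LIKE cc_agent_state);

CREATE TRIGGER cc_agent_state_versioning
BEFORE INSERT OR UPDATE OR DELETE ON cc_agent_state
FOR EACH ROW EXECUTE PROCEDURE versioning('sys_period', 'cc_agent_state_history', true);

-- Durations then fall out of the recorded periods, e.g.:
SELECT user_name, agent_status,
       upper(sys_period) - lower(sys_period) AS duration
  FROM cc_agent_state_history;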



Re: [GENERAL] time series data

From
Khalil Khamlichi
Date:
interesting proposition, I am reading the docs.

On Mon, Oct 2, 2017 at 6:08 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
On Sun, Oct 1, 2017 at 2:17 AM, Khalil Khamlichi
<khamlichi.khalil@gmail.com> wrote:
> Hi everyone,
>
> I have a data stream of a call center application coming in  to postgres in
> this format :
>
> user_name, user_status, event_time
>
> 'user1', 'ready', '2017-01-01 10:00:00'
> 'user1', 'talking', '2017-01-01 10:02:00'
> 'user1', 'after_call', '2017-01-01 10:07:00'
> 'user1', 'ready', '2017-01-01 10:08:00'
> 'user1', 'talking', '2017-01-01 10:10:00'
> 'user1', 'after_call', '2017-01-01 10:15:00'
> 'user1', 'paused', '2017-01-01 10:20:00'
> ...
> ...
>
> so as you see each new insert of an "event" is in fact the start_time of
> that event and also the end_time of the previous one so should be used to
> calculate the duration of this previous one.
>
> What is the best way to get user_status statistics like total duration,
> frequency, avg ...etc , does any body have an experience with this sort of
> data streams ?

Have you looked at temporal_tables extension? It seems custom made for
what you're trying to do.

http://clarkdave.net/2015/02/historical-records-with-postgresql-and-temporal-tables-and-sql-2011/

Re: [GENERAL] time series data

From
Khalil Khamlichi
Date:
Hi Melvin,

Thanks a lot for your help,

let me explain to you my problem.

we have records like this

ccdb1=# select user_name, agent_status, event_time from cc_events ;
 user_name | agent_status |     event_time
-----------+--------------+---------------------
 user1     | ready        | 2017-01-01 10:00:00
 user1     | talking      | 2017-01-01 10:02:00
 user1     | after_call   | 2017-01-01 10:08:00
 user1     | ready        | 2017-01-01 10:10:00
 user1     | talking      | 2017-01-01 10:12:00
 user1     | after_call   | 2017-01-01 10:15:00
 user1     | paused       | 2017-01-01 10:17:00
 user1     | ready        | 2017-01-01 10:25:00
 user1     | talking      | 2017-01-01 10:26:00
(9 rows)


So user1 was READY at 2017-01-01 10:00:00, then he received a call
that he answered at 2017-01-01 10:02:00, and so on.
So user1 was ready for 2 minutes, then he was talking for 6 minutes,
then he was in after_call (doing after-call work) for 2 minutes, and
this is the kind of information we want to query.

My solution so far is that in my table I have one more field: end_time.
So when an event comes in, before the insert I do:

update cc_events set end_time = current_timestamp where user_name =
'user_of_event' and end_time is null;

then I insert the new event leaving end_time as null, so that the next
event will update it, and so on.

It's working fine, I have the start and end times for each event, and it's
not too painful to query (sum(end - start) while grouping by user_name,
agent_status), but it's one more update on the table and also limited
in what you can query about.
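Spelled out as a minimal sketch (column names from the example above; the status value and user name are placeholders):

-- Close the open event for this user, then record the new one, in one transaction.
BEGIN;
UPDATE cc_events
   SET end_time = current_timestamp
 WHERE user_name = 'user_of_event'
   AND end_time IS NULL;
INSERT INTO cc_events (user_name, agent_status, event_time)
VALUES ('user_of_event', 'talking', current_timestamp);
COMMIT;

-- The statistics then come from the closed events:
SELECT user_name,
       agent_status,
       sum(end_time - event_time) AS total_duration,
       avg(end_time - event_time) AS avg_duration,
       count(*) AS frequency
  FROM cc_events
 WHERE end_time IS NOT NULL
 GROUP BY user_name, agent_status;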

I know this must be a common problem in every piece of software that deals with
events, so I suppose something is already built into Postgres to deal
with it.

I looked at your solution, it's very clever and we use something
similar but on another module where we manage live calls and route
them to available agents.

kkh


On Mon, Oct 2, 2017 at 4:06 PM, Melvin Davidson <melvin6925@gmail.com> wrote:
>
>
>
> On Sun, Oct 1, 2017 at 6:20 PM, Clifford Snow <clifford@snowandsnow.us> wrote:
>>
>> I have a stream that updates every minute with a trigger that updates another table with information from the
>> stream. That way I'm constantly updated with no need to run a script to update before I want a report.
>>
>> Clifford
>>
>> On Sun, Oct 1, 2017 at 10:08 AM, Melvin Davidson <melvin6925@gmail.com> wrote:
>>>
>>>
>>>
>>> On Sun, Oct 1, 2017 at 4:17 AM, Khalil Khamlichi <khamlichi.khalil@gmail.com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I have a data stream of a call center application coming in  to postgres in this format :
>>>>
>>>> user_name, user_status, event_time
>>>>
>>>> 'user1', 'ready', '2017-01-01 10:00:00'
>>>> 'user1', 'talking', '2017-01-01 10:02:00'
>>>> 'user1', 'after_call', '2017-01-01 10:07:00'
>>>> 'user1', 'ready', '2017-01-01 10:08:00'
>>>> 'user1', 'talking', '2017-01-01 10:10:00'
>>>> 'user1', 'after_call', '2017-01-01 10:15:00'
>>>> 'user1', 'paused', '2017-01-01 10:20:00'
>>>> ...
>>>> ...
>>>>
>>>> so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the
>>>> previous one so should be used to calculate the duration of this previous one.
>>>>
>>>> What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have
>>>> an experience with this sort of data streams ?
>>>>
>>>>
>>>> Thanks in advance.
>>>
>>>
>>> Just a suggestion, but here is what I would do.
>>> First, create your tables similar to as follows
>>>
>>> CREATE TABLE status
>>> (
>>>  call_status    varchar(10) NOT NULL,
>>>  CONSTRAINT status_pk PRIMARY KEY (call_status)
>>> );
>>>
>>> INSERT INTO status
>>> (call_status)
>>> VALUES
>>> ('ready'),
>>> ('talking'),
>>> ('after_call');
>>>
>>> CREATE TABLE user_sessions
>>> (
>>>  username        name        NOT NULL,
>>>  session_id        bigint        NOT NULL,
>>>  call_status     varchar(10) NOT NULL,
>>>  call_time        timestamp NOT NULL,
>>>  CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id,call_status),
>>>  CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
>>>     REFERENCES status(call_status)
>>> );
>>>
>>> Next, you will need to generate a unique session_id for each
>>> user, but only for when call_status is 'ready'. So probably
>>> a table of the form:
>>>
>>> CREATE TABLE current_session
>>> (
>>> username        name        NOT NULL,
>>> session_id        serial        NOT NULL,
>>> CONSTRAINT current_session_pk PRIMARY KEY (username)
>>> );
>>>
>>> Then all you need to do is:
>>> 1. Update current_session and get the new session_id each time a user connects (call_status = 'ready'.
>>>    Probably best to use a BEFORE trigger to do this, but you will need to code it yourself.
>>>
>>> 2. You can then do
>>>
>>> SELECT username,
>>>        age ( (SELECT call_time FROM current_session WHERE call_status = 'talking'),
>>>               ( SELECT call_time FROM current_session WHERE call_status = 'after_call')
>>>            ) as duration
>>>   FROM user_sessions
>>>  WHERE username = 'actual_user_name'
>>>    AND session_id =  actual_session_id;
>>>
>>> You can use similar queries for avg and frequency.
>>>
>>> --
>>> Melvin Davidson
>>> I reserve the right to fantasize.  Whether or not you
>>> wish to share my fantasy is entirely up to you.
>>
>>
>>
>>
>> --
>> @osm_seattle
>> osm_seattle.snowandsnow.us
>> OpenStreetMap: Maps with a human touch
>
>
> I thought about the table design for user_sessions and came up
> with a better one:
>
> CREATE TABLE user_sessions
> (
>  username      name          NOT NULL,
>  session_id     bigint           NOT NULL,
>  call_status     varchar(10) NOT NULL,
>  call_ready      timestamp   NOT NULL,
>  call_talking     timestamp,
>  call_after_call timestamp,
>  call_duration   interval,
>  CONSTRAINT user_sessions_pk PRIMARY KEY (username, session_id),
>  CONSTRAINT user_sessions_fk_status FOREIGN KEY (call_status)
>     REFERENCES status(call_status)
> );
>
> So in essence, when the call starts, just do:
>
> INSERT INTO user_sessions
> (username, call_ready)
> VALUES
> ('actual_user_name', now() );
>
> Then
> SELECT max(session_id) AS current_session
>   FROM user_sessions
>  WHERE username = 'actual_user_name';
>
> When talking starts:
> UPDATE user_sessions
>    SET call_status = 'talking',
>        call_talking = now()
>  WHERE username = 'actual_user_name'
>    AND session_id = current_session;
>
> When call ends:
> UPDATE user_sessions
>    SET call_status = 'after_call',
>        call_after_call = now()
>  WHERE username = 'actual_user_name'
>    AND session_id = current_session;
>
> Now all you have to do to get call length is:
>
> SELECT username,
>        age ( call_after_call, call_talking ) as duration
>   FROM user_sessions
>  WHERE username = 'actual_user_name'
>    AND session_id =  current_session;
>



Re: [GENERAL] time series data

From
Nico Williams
Date:
You have these choices:

- Turn events into INSERTs and UPDATEs on a table that represents a single call.
  You might have an events VIEW with INSTEAD OF insert/update triggers so you can insert events as the interface for updating calls.

- Store the events and have a VIEW on the events table that gives you rows that summarize each call.

- Both: store the events and the summaries of the calls.
  You might have an events table with AFTER INSERT triggers to insert or update the corresponding rows in the calls table (a rough sketch of this follows below).
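A minimal sketch of that last option, with illustrative names, assuming the cc_events layout used elsewhere in this thread (user_name, agent_status, event_time):

-- Raw events stay append-only; an AFTER INSERT trigger maintains a summary
-- table of explicit intervals (one row per status period, with start and end).
CREATE TABLE cc_intervals (
    user_name    text,
    agent_status text,
    start_time   timestamp,
    end_time     timestamp
);

CREATE OR REPLACE FUNCTION record_interval() RETURNS trigger AS $$
BEGIN
    -- the new event closes the user's currently open interval ...
    UPDATE cc_intervals
       SET end_time = NEW.event_time
     WHERE user_name = NEW.user_name
       AND end_time IS NULL;
    -- ... and opens a new one
    INSERT INTO cc_intervals (user_name, agent_status, start_time)
    VALUES (NEW.user_name, NEW.agent_status, NEW.event_time);
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER cc_events_record_interval
AFTER INSERT ON cc_events
FOR EACH ROW EXECUTE PROCEDURE record_interval();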

Nico
-- 



Re: [GENERAL] time series data

From
Schneider
Date:
On Mon, Oct 2, 2017 at 10:27 AM, Khalil Khamlichi
<khamlichi.khalil@gmail.com> wrote:
> we have records like this
>
> ccdb1=# select user_name, agent_status, event_time from cc_events ;
>
>  user_name | agent_status |     event_time
> -----------+--------------+---------------------
>  user1     | ready        | 2017-01-01 10:00:00
>  user1     | talking      | 2017-01-01 10:02:00
>  user1     | after_call   | 2017-01-01 10:08:00
>  user1     | ready        | 2017-01-01 10:10:00
>  user1     | talking      | 2017-01-01 10:12:00
>  user1     | after_call   | 2017-01-01 10:15:00
>  user1     | paused       | 2017-01-01 10:17:00
>  user1     | ready        | 2017-01-01 10:25:00
>  user1     | talking      | 2017-01-01 10:26:00
> (9 rows)
>
>
> so user1 was READY at  2017-01-01 10:00:00 then he received a call
> that he attended at 2017-01-01 10:02:00 and so on ...
> so user1 was ready for 2 minutes, then he was talking for 6 minutes
> then he was in after_call (doing after call work)  for 2 minutes and
> this is the kind of information we want to query.
>
> my solution so far that I came with, is in my table I have 1 more
> field : end_time
> so when an event comes in and before the insert I do :
> update cc_events set end_time = current_timestamp  where user_name =
> 'user_of_event' and end_time is null;
>
> then I insert new event leaving the end_time as null so that next
> event will update it and so on.
>
> its working fine, I have the start and end times for each event, its
> not too painful to query (sum(end-start) while grouping by user_name,
> agent_status), but its one more update on the table and also limited
> in what you can query about,
>
> I know this must be a common problem in every software that deals with
> events, so I suppose something is already built-in in postgres to deal
> with it.

Khalil, changing your schema is one solution with certain benefits -
but it's definitely not necessary when you have the power of
PostgreSQL at your fingertips.  You can solve your problem without
changing anything at all.  :)

All you need is a window function:
https://www.postgresql.org/docs/9.6/static/tutorial-window.html

Here's an example which I just now tested on 9.6.3, 9.5.7, 9.4.12 and
9.3.17 (all the latest minors currently supported on RDS).  You can
try these queries on your own system; they should work anywhere.

First, I added a second user to your data set to make sure we were
handling that case correctly.

==========
create table cc_events (user_name varchar(8), agent_status
varchar(20), event_time timestamp);

insert into cc_events values
('user1', 'ready', '2017-01-01 10:00:00'),
('user1', 'talking', '2017-01-01 10:02:00'),
('user2', 'ready', '2017-01-01 10:04:00'),
('user2', 'talking', '2017-01-01 10:05:00'),
('user1', 'after_call', '2017-01-01 10:07:00'),
('user1', 'ready', '2017-01-01 10:08:00'),
('user1', 'talking', '2017-01-01 10:10:00'),
('user1', 'after_call', '2017-01-01 10:15:00'),
('user2', 'after_call', '2017-01-01 10:18:00'),
('user1', 'paused', '2017-01-01 10:20:00'),
('user2', 'paused', '2017-01-01 10:21:00');

select * from cc_events order by user_name, event_time;

==========
here's a basic window function in action:

==========
select user_name, agent_status, event_time,
  lead(event_time) over (partition by user_name order by event_time)
next_event_time
from cc_events order by event_time;

 user_name | agent_status |     event_time      |   next_event_time
-----------+--------------+---------------------+---------------------
 user1     | ready        | 2017-01-01 10:00:00 | 2017-01-01 10:02:00
 user1     | talking      | 2017-01-01 10:02:00 | 2017-01-01 10:07:00
 user2     | ready        | 2017-01-01 10:04:00 | 2017-01-01 10:05:00
 user2     | talking      | 2017-01-01 10:05:00 | 2017-01-01 10:18:00
 user1     | after_call   | 2017-01-01 10:07:00 | 2017-01-01 10:08:00
 user1     | ready        | 2017-01-01 10:08:00 | 2017-01-01 10:10:00
 user1     | talking      | 2017-01-01 10:10:00 | 2017-01-01 10:15:00
 user1     | after_call   | 2017-01-01 10:15:00 | 2017-01-01 10:20:00
 user2     | after_call   | 2017-01-01 10:18:00 | 2017-01-01 10:21:00
 user1     | paused       | 2017-01-01 10:20:00 |
 user2     | paused       | 2017-01-01 10:21:00 |

==========
and now we just add one more column which does the subtraction to
calculate the duration:

==========
select user_name, agent_status, event_time,
  lead(event_time) over (partition by user_name order by event_time)
next_event_time,
  (lead(event_time) over (partition by user_name order by event_time))
- event_time as duration
from cc_events order by event_time;

 user_name | agent_status |     event_time      |   next_event_time   | duration
-----------+--------------+---------------------+---------------------+----------
 user1     | ready        | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 | 00:02:00
 user1     | talking      | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00
 user2     | ready        | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00
 user2     | talking      | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 | 00:13:00
 user1     | after_call   | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 | 00:01:00
 user1     | ready        | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00
 user1     | talking      | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 | 00:05:00
 user1     | after_call   | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 | 00:05:00
 user2     | after_call   | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00
 user1     | paused       | 2017-01-01 10:20:00 |                     |
 user2     | paused       | 2017-01-01 10:21:00 |                     |

==========
it might also be convenient to wrap the window function in a common
table expression

https://www.postgresql.org/docs/9.6/static/queries-with.html

==========
with calculate_next_events as (
  select user_name, agent_status, event_time,
    lead(event_time) over (partition by user_name order by event_time)
next_event_time
  from cc_events order by event_time
)
select user_name, agent_status, next_event_time-event_time duration
from calculate_next_events order by event_time;

==========
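As a small follow-on sketch in the same style, the per-user, per-status numbers from the original question — total duration, average and frequency — are one GROUP BY away from that CTE:

==========
with calculate_next_events as (
  select user_name, agent_status, event_time,
    lead(event_time) over (partition by user_name order by event_time)
next_event_time
  from cc_events
)
select user_name, agent_status,
  sum(next_event_time - event_time) as total_duration,
  avg(next_event_time - event_time) as avg_duration,
  count(next_event_time) as frequency  -- still-open last events are skipped
from calculate_next_events
group by user_name, agent_status
order by user_name, agent_status;

==========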
Finally, if you really want to supercharge this and power-up even
more, besides temporal databases you might check out this recent blog
post about implementing a state machine in postgresql... it's really
interesting and closely related to what you're solving.

https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html

Hope this is helpful. Great to see that you're working on PostgreSQL -
it's a powerful engine to build with!

-Jeremy

-- 
http://about.me/jeremy_schneider



Re: [GENERAL] time series data

From
"Joshua D. Drake"
Date:
On 10/01/2017 01:17 AM, Khalil Khamlichi wrote:
> Hi everyone,

Take a look at TimescaleDB; they have an extension to Postgres that makes
this awesome (and yes, it's free and open source).
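A minimal sketch of that route, assuming the cc_events table from earlier in the thread and that the extension is installed and preloaded:

-- Convert the events table into a hypertable partitioned by event_time
-- (by default the table must be empty; migrate_data => true handles existing rows).
CREATE EXTENSION IF NOT EXISTS timescaledb;
SELECT create_hypertable('cc_events', 'event_time');

The window-function and aggregate queries shown earlier in the thread then run unchanged against the hypertable.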

jD

> 
> I have a data stream of a call center application coming in  to postgres 
> in this format :
> 
> user_name, user_status, event_time
> 
> 'user1', 'ready', '2017-01-01 10:00:00'
> 'user1', 'talking', '2017-01-01 10:02:00'
> 'user1', 'after_call', '2017-01-01 10:07:00'
> 'user1', 'ready', '2017-01-01 10:08:00'
> 'user1', 'talking', '2017-01-01 10:10:00'
> 'user1', 'after_call', '2017-01-01 10:15:00'
> 'user1', 'paused', '2017-01-01 10:20:00'
> ...
> ...
> 
> so as you see each new insert of an "event" is in fact the start_time of 
> that event and also the end_time of the previous one so should be used 
> to calculate the duration of this previous one.
> 
> What is the best way to get user_status statistics like total duration, 
> frequency, avg ...etc , does any body have an experience with this sort 
> of data streams ?
> 
> 
> Thanks in advance.


-- 
Command Prompt, Inc.                  http://the.postgres.company/
                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.



Re: [GENERAL] time series data

From
Khalil Khamlichi
Date:
Thanks, I'll check it out.

Sent via mobile, please forgive typos and brevity

On Oct 14, 2017 3:23 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
On 10/01/2017 01:17 AM, Khalil Khamlichi wrote:
Hi everyone,

Take a look at TimescaleDB they have an extension to Postgres that makes this awesome (and yes its free and open source).

jD


I have a data stream of a call center application coming in  to postgres in this format :

user_name, user_status, event_time

'user1', 'ready', '2017-01-01 10:00:00'
'user1', 'talking', '2017-01-01 10:02:00'
'user1', 'after_call', '2017-01-01 10:07:00'
'user1', 'ready', '2017-01-01 10:08:00'
'user1', 'talking', '2017-01-01 10:10:00'
'user1', 'after_call', '2017-01-01 10:15:00'
'user1', 'paused', '2017-01-01 10:20:00'
...
...

so as you see each new insert of an "event" is in fact the start_time of that event and also the end_time of the previous one so should be used to calculate the duration of this previous one.

What is the best way to get user_status statistics like total duration, frequency, avg ...etc , does any body have an experience with this sort of data streams ?


Thanks in advance.


--
Command Prompt, Inc.                  http://the.postgres.company/
                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.

Re: [GENERAL] time series data

From
Khalil Khamlichi
Date:
Thanks a lot Jeremy, we ended up integrating the code you provided
into our software (just before you patent it) :)

Best regards,

Kkh

On Tue, Oct 3, 2017 at 7:58 PM, Schneider <schneider@ardentperf.com> wrote:
> On Mon, Oct 2, 2017 at 10:27 AM, Khalil Khamlichi
> <khamlichi.khalil@gmail.com> wrote:
>> we have records like this
>>
>> ccdb1=# select user_name, agent_status, event_time from cc_events ;
>>
>>  user_name | agent_status |     event_time
>> -----------+--------------+---------------------
>>  user1     | ready        | 2017-01-01 10:00:00
>>  user1     | talking      | 2017-01-01 10:02:00
>>  user1     | after_call   | 2017-01-01 10:08:00
>>  user1     | ready        | 2017-01-01 10:10:00
>>  user1     | talking      | 2017-01-01 10:12:00
>>  user1     | after_call   | 2017-01-01 10:15:00
>>  user1     | paused       | 2017-01-01 10:17:00
>>  user1     | ready        | 2017-01-01 10:25:00
>>  user1     | talking      | 2017-01-01 10:26:00
>> (9 rows)
>>
>>
>> so user1 was READY at  2017-01-01 10:00:00 then he received a call
>> that he attended at 2017-01-01 10:02:00 and so on ...
>> so user1 was ready for 2 minutes, then he was talking for 6 minutes
>> then he was in after_call (doing after call work)  for 2 minutes and
>> this is the kind of information we want to query.
>>
>> my solution so far that I came with, is in my table I have 1 more
>> field : end_time
>> so when an event comes in and before the insert I do :
>> update cc_events set end_time = current_timestamp  where user_name =
>> 'user_of_event' and end_time is null;
>>
>> then I insert new event leaving the end_time as null so that next
>> event will update it and so on.
>>
>> its working fine, I have the start and end times for each event, its
>> not too painful to query (sum(end-start) while grouping by user_name,
>> agent_status), but its one more update on the table and also limited
>> in what you can query about,
>>
>> I know this must be a common problem in every software that deals with
>> events, so I suppose something is already built-in in postgres to deal
>> with it.
>
> Khalil, changing your schema is one solution with certain benefits -
> but it's definitely not necessary when you have the power of
> PostgreSQL at your fingertips.  You can solve your problem without
> changing anything at all.  :)
>
> All you need is a window function:
> https://www.postgresql.org/docs/9.6/static/tutorial-window.html
>
> Here's an example which I just now tested on 9.6.3, 9.5.7, 9.4.12 and
> 9.3.17 (all the latest minors currently supported on RDS).  You can
> try these queries on your own system; they should work anywhere.
>
> First, I added a second user to your data set to make sure we were
> handling that case correctly.
>
> ==========
> create table cc_events (user_name varchar(8), agent_status
> varchar(20), event_time timestamp);
>
> insert into cc_events values
> ('user1', 'ready', '2017-01-01 10:00:00'),
> ('user1', 'talking', '2017-01-01 10:02:00'),
> ('user2', 'ready', '2017-01-01 10:04:00'),
> ('user2', 'talking', '2017-01-01 10:05:00'),
> ('user1', 'after_call', '2017-01-01 10:07:00'),
> ('user1', 'ready', '2017-01-01 10:08:00'),
> ('user1', 'talking', '2017-01-01 10:10:00'),
> ('user1', 'after_call', '2017-01-01 10:15:00'),
> ('user2', 'after_call', '2017-01-01 10:18:00'),
> ('user1', 'paused', '2017-01-01 10:20:00'),
> ('user2', 'paused', '2017-01-01 10:21:00');
>
> select * from cc_events order by user_name, event_time;
>
> ==========
> here's a basic window function in action:
>
> ==========
> select user_name, agent_status, event_time,
>   lead(event_time) over (partition by user_name order by event_time)
> next_event_time
> from cc_events order by event_time;
>
>  user_name | agent_status |     event_time      |   next_event_time
> -----------+--------------+---------------------+---------------------
>  user1     | ready        | 2017-01-01 10:00:00 | 2017-01-01 10:02:00
>  user1     | talking      | 2017-01-01 10:02:00 | 2017-01-01 10:07:00
>  user2     | ready        | 2017-01-01 10:04:00 | 2017-01-01 10:05:00
>  user2     | talking      | 2017-01-01 10:05:00 | 2017-01-01 10:18:00
>  user1     | after_call   | 2017-01-01 10:07:00 | 2017-01-01 10:08:00
>  user1     | ready        | 2017-01-01 10:08:00 | 2017-01-01 10:10:00
>  user1     | talking      | 2017-01-01 10:10:00 | 2017-01-01 10:15:00
>  user1     | after_call   | 2017-01-01 10:15:00 | 2017-01-01 10:20:00
>  user2     | after_call   | 2017-01-01 10:18:00 | 2017-01-01 10:21:00
>  user1     | paused       | 2017-01-01 10:20:00 |
>  user2     | paused       | 2017-01-01 10:21:00 |
>
> ==========
> and now we just add one more column which does the subtraction to
> calculate the duration:
>
> ==========
> select user_name, agent_status, event_time,
>   lead(event_time) over (partition by user_name order by event_time)
> next_event_time,
>   (lead(event_time) over (partition by user_name order by event_time))
> - event_time as duration
> from cc_events order by event_time;
>
> user_name | agent_status |     event_time      |   next_event_time   | duration
> -----------+--------------+---------------------+---------------------+----------
>  user1     | ready        | 2017-01-01 10:00:00 | 2017-01-01 10:02:00 | 00:02:00
>  user1     | talking      | 2017-01-01 10:02:00 | 2017-01-01 10:07:00 | 00:05:00
>  user2     | ready        | 2017-01-01 10:04:00 | 2017-01-01 10:05:00 | 00:01:00
>  user2     | talking      | 2017-01-01 10:05:00 | 2017-01-01 10:18:00 | 00:13:00
>  user1     | after_call   | 2017-01-01 10:07:00 | 2017-01-01 10:08:00 | 00:01:00
>  user1     | ready        | 2017-01-01 10:08:00 | 2017-01-01 10:10:00 | 00:02:00
>  user1     | talking      | 2017-01-01 10:10:00 | 2017-01-01 10:15:00 | 00:05:00
>  user1     | after_call   | 2017-01-01 10:15:00 | 2017-01-01 10:20:00 | 00:05:00
>  user2     | after_call   | 2017-01-01 10:18:00 | 2017-01-01 10:21:00 | 00:03:00
>  user1     | paused       | 2017-01-01 10:20:00 |                     |
>  user2     | paused       | 2017-01-01 10:21:00 |                     |
>
> ==========
> it might also be convenient to wrap the window function in a common
> table expression
>
> https://www.postgresql.org/docs/9.6/static/queries-with.html
>
> ==========
> with calculate_next_events as (
>   select user_name, agent_status, event_time,
>     lead(event_time) over (partition by user_name order by event_time)
> next_event_time
>   from cc_events order by event_time
> )
> select user_name, agent_status, next_event_time-event_time duration
> from calculate_next_events order by event_time;
>
> ==========
> Finally, if you really want to supercharge this and power-up even
> more, besides temporal databases you might check out this recent blog
> post about implementing a state machine in postgresql... it's really
> interesting and closely related to what you're solving.
>
> https://felixge.de/2017/07/27/implementing-state-machines-in-postgresql.html
>
> Hope this is helpful. Great to see that you're working on PostgreSQL -
> it's a powerful engine to build with!
>
> -Jeremy
>
> --
> http://about.me/jeremy_schneider

