Discussion: Re: Given a set of daterange, finding the continuous range that includes a particular date (aggregates)

On Fri, Feb 23, 2018 at 6:10 AM, mariusz <marius@mtvk.pl> wrote: 

i guess, you can easily get max continuous range for each row with
something like this:

CREATE OR REPLACE FUNCTION
        append_daterange(d1 daterange, d2 daterange)
        RETURNS daterange
        LANGUAGE sql
        AS
$$
        SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;

CREATE AGGREGATE agg_daterange (daterange) (
        sfunc = append_daterange,
        stype = daterange
);

SELECT dr,
       lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
       upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
  FROM ...

above example is simplified to selecting only daterange column "dr" for
readability, which in your case should be something like

daterange(staff_assign_date,staff_assign_date_end,'[)')

please note that daterange would be normalized to [) format so upper()
above returns exactly your max "staff_assign_date_end" for each
continuous range when dateranges are created with '[)' format.

the key points are the ... ELSE d2 in append_daterange(), which starts with a new
value each time that new value is discontinuous with the agg's state value,
and the order in which rows are processed (ASC for lower of daterange, DESC
for upper of daterange).

unfortunately this involves reading all rows for "client_id" and an
additional sort for each window.
i recall reading that you already pay the price of reading all rows for
client_id anyway, so the only question is the cost of two additional
sorts (maybe extracting the dateranges to a subset on which to do the windows
and rejoining the resulting continuous ranges to the original set would help
to lower the cost).
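
a rough, untested sketch of that subset-and-rejoin idea ("staff_assign" is a
made-up table name here, standing in for your real table):

-- compute continuous ranges over the distinct dateranges only, then join back
WITH ranges AS (
    SELECT DISTINCT client_id,
           daterange(staff_assign_date, staff_assign_date_end, '[)') AS dr
      FROM staff_assign
), cont AS (
    SELECT client_id, dr,
           daterange(lower(agg_daterange(dr) OVER (PARTITION BY client_id ORDER BY dr ASC)),
                     upper(agg_daterange(dr) OVER (PARTITION BY client_id ORDER BY dr DESC)),
                     '[)') AS cont_range
      FROM ranges
)
SELECT sa.*, c.cont_range
  FROM staff_assign sa
  JOIN cont c
    ON c.client_id = sa.client_id
   AND c.dr = daterange(sa.staff_assign_date, sa.staff_assign_date_end, '[)');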


Thank you, and I wanted to follow up on this.  I couldn't quite get your example working as described, but I also ended up trying something very similar that got me very close but not quite there.  Basically, I can see that it is correctly calculating the ranges (in the notices), but it is only returning the last range for each client.  (Because I said PARTITION BY client_id).

So I'm not sure if I should be calling this differently, or if the function needs to work differently, or if this just isn't possible.  Do I need to partition by something else, and if so what?  I don't see what I could specify that would indicate a new episode.

Also, it's not clear to me how an aggregate might define/return different values within a partition.  Although this must be possible, since functions like rank() and row_number() seem to do it.

Hoping there is something easy that can be tweaked here.  See below for copy/pastable test stuff.  It includes output from both functions.  Both look to be returning the same results, which makes me wonder if my passing in a start date was a waste of time, though it seems to me it would be necessary.

Cheers,
Ken

BEGIN;
CREATE TEMP TABLE sample_data (
client_id INTEGER,
start_date DATE,
end_date DATE,
episode INTEGER -- Just a label, for clarity
);

INSERT INTO sample_data VALUES
(1,'1990-01-01','1990-12-31',1),
(1,'1991-01-01','1991-12-31',1),

(1,'2000-01-01','2000-12-31',2),
(1,'2001-01-01','2001-12-31',2),
(1,'2002-01-01','2002-12-31',2),
(1,'2003-01-01','2003-12-31',2),
(1,'2004-01-01','2004-12-31',2),
(1,'2005-01-01','2005-12-31',2),
(1,'2006-01-01','2006-12-31',2),

(1,'2014-01-01','2014-12-31',3),
(1,'2015-01-01','2015-12-31',3),
(1,'2017-06-30','2017-12-31',4),
(1,'2018-01-01',NULL,4),

(2,'2014-02-01','2015-01-31',1),
(2,'2015-02-01','2015-12-31',1),
(2,'2017-09-30','2018-01-31',2),
(2,'2018-02-01','2018-02-14',2)
;

CREATE OR REPLACE FUNCTION
        append_daterange(d1 daterange, d2 daterange)
        RETURNS daterange
        LANGUAGE sql
        AS
$$
        SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END;
$$;

CREATE AGGREGATE agg_daterange (daterange) (
        sfunc = append_daterange,
        stype = daterange
);

CREATE OR REPLACE FUNCTION range_continuous_merge( daterange, daterange, date ) RETURNS daterange AS $$

DECLARE
    res daterange;

BEGIN
res:=  CASE
    WHEN $1 IS NULL AND NOT $2 @> $3 THEN NULL
    WHEN $1 IS NULL AND $2 @> $3 THEN $2
    WHEN ($1 && $2) OR ($1 -|- $2) THEN $1 + $2
    WHEN NOT $1 @> $3 THEN $2
    ELSE $1
END;
RAISE NOTICE 'Inputs: %,%,%.  Returning %',$1::text,$2::text,$3::text,res;
RETURN res;
END;
$$ LANGUAGE plpgsql STABLE;

CREATE AGGREGATE range_continuous( daterange, date ) (

    sfunc =  range_continuous_merge,
    stype = daterange
--  initcond = '{0,0,0}'

);

SELECT client_id,episode,start_date,end_date,range_continuous(daterange(start_date,end_date,'[]'),start_date) OVER (PARTITION by client_id) FROM sample_data ;

SELECT client_id,episode,start_date,end_date,agg_daterange(daterange(start_date,end_date,'[]')) OVER (PARTITION by client_id) FROM sample_data ;



-- RANGE_CONTINUOUS_MERGE

NOTICE:  Inputs: <NULL>,[1990-01-01,1991-01-01),1990-01-01.  Returning [1990-01-01,1991-01-01)
NOTICE:  Inputs: [1990-01-01,1991-01-01),[1991-01-01,1992-01-01),1991-01-01.  Returning [1990-01-01,1992-01-01)
NOTICE:  Inputs: [1990-01-01,1992-01-01),[2000-01-01,2001-01-01),2000-01-01.  Returning [2000-01-01,2001-01-01)
NOTICE:  Inputs: [2000-01-01,2001-01-01),[2001-01-01,2002-01-01),2001-01-01.  Returning [2000-01-01,2002-01-01)
NOTICE:  Inputs: [2000-01-01,2002-01-01),[2002-01-01,2003-01-01),2002-01-01.  Returning [2000-01-01,2003-01-01)
NOTICE:  Inputs: [2000-01-01,2003-01-01),[2003-01-01,2004-01-01),2003-01-01.  Returning [2000-01-01,2004-01-01)
NOTICE:  Inputs: [2000-01-01,2004-01-01),[2004-01-01,2005-01-01),2004-01-01.  Returning [2000-01-01,2005-01-01)
NOTICE:  Inputs: [2000-01-01,2005-01-01),[2005-01-01,2006-01-01),2005-01-01.  Returning [2000-01-01,2006-01-01)
NOTICE:  Inputs: [2000-01-01,2006-01-01),[2006-01-01,2007-01-01),2006-01-01.  Returning [2000-01-01,2007-01-01)
NOTICE:  Inputs: [2000-01-01,2007-01-01),[2014-01-01,2015-01-01),2014-01-01.  Returning [2014-01-01,2015-01-01)
NOTICE:  Inputs: [2014-01-01,2015-01-01),[2015-01-01,2016-01-01),2015-01-01.  Returning [2014-01-01,2016-01-01)
NOTICE:  Inputs: [2014-01-01,2016-01-01),[2017-06-30,2018-01-01),2017-06-30.  Returning [2017-06-30,2018-01-01)
NOTICE:  Inputs: [2017-06-30,2018-01-01),[2018-01-01,),2018-01-01.  Returning [2017-06-30,)
NOTICE:  Inputs: <NULL>,[2014-02-01,2015-02-01),2014-02-01.  Returning [2014-02-01,2015-02-01)
NOTICE:  Inputs: [2014-02-01,2015-02-01),[2015-02-01,2016-01-01),2015-02-01.  Returning [2014-02-01,2016-01-01)
NOTICE:  Inputs: [2014-02-01,2016-01-01),[2017-09-30,2018-02-01),2017-09-30.  Returning [2017-09-30,2018-02-01)
NOTICE:  Inputs: [2017-09-30,2018-02-01),[2018-02-01,2018-02-15),2018-02-01.  Returning [2017-09-30,2018-02-15)

 client_id | episode | start_date |  end_date  |    range_continuous
-----------+---------+------------+------------+-------------------------
         1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
         1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
         1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
         1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
         1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
         1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
         1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
         1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
         1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
         1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
         1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
         1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
         1 |       4 | 2018-01-01 |            | [2017-06-30,)
         2 |       1 | 2014-02-01 | 2015-01-31 | [2017-09-30,2018-02-15)
         2 |       1 | 2015-02-01 | 2015-12-31 | [2017-09-30,2018-02-15)
         2 |       2 | 2017-09-30 | 2018-01-31 | [2017-09-30,2018-02-15)
         2 |       2 | 2018-02-01 | 2018-02-14 | [2017-09-30,2018-02-15)

(17 rows)


-- AGG_DATERANGE

 client_id | episode | start_date |  end_date  |      agg_daterange      
-----------+---------+------------+------------+-------------------------
         1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
         1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
         1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
         1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
         1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
         1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
         1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
         1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
         1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
         1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
         1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
         1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
         1 |       4 | 2018-01-01 |            | [2017-06-30,)
         2 |       1 | 2014-02-01 | 2015-01-31 | [2017-09-30,2018-02-15)
         2 |       1 | 2015-02-01 | 2015-12-31 | [2017-09-30,2018-02-15)
         2 |       2 | 2017-09-30 | 2018-01-31 | [2017-09-30,2018-02-15)
         2 |       2 | 2018-02-01 | 2018-02-14 | [2017-09-30,2018-02-15)
(17 rows)



--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.
On Fri, 2018-02-23 at 18:11 -0800, Ken Tanzer wrote:
>         On Fri, Feb 23, 2018 at 6:10 AM,
>         mariusz <marius@mtvk.pl> wrote: 
>         
>         
>         i guess, you can easily get max continuous range for each row
>         with
>         something like this:
>         
>         CREATE OR REPLACE FUNCTION
>                 append_daterange(d1 daterange, d2 daterange)
>                 RETURNS daterange
>                 LANGUAGE sql
>                 AS
>         $$
>                 SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2
>         ELSE d2 END;
>         $$;
>         
>         CREATE AGGREGATE agg_daterange (daterange) (
>                 sfunc = append_daterange,
>                 stype = daterange
>         );
>         
>         SELECT dr,
>                lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
>                upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
>           FROM ...
>         
>         above example is simplified to selecting only daterange column
>         "dr" for
>         readability, which in your case should be something like
>         
>         daterange(staff_assign_date,staff_assign_date_end,'[)')
>         
>         please note that daterange would be normalized to [) format so
>         upper()
>         above returns exactly your max "staff_assign_date_end" for
>         each
>         continuous range when dateranges are created with '[)' format.
>         
>         the key point is ... ELSE d2 in append_daterange() which
>         starts with new
>         value each time that new value is discontinuous with agg's
>         state value
>         and order in which rows are processed (ASC for lower of
>         daterange, DESC
>         for upper of daterange).
>         
>         unfortunately this involves reading all rows for "client_id"
>         and
>         additional sorting for each window.
>         i recall reading that you already pay the price of reading all
>         rows for
>         client_id anyway, so the only question is the cost of two
>         additional
>         sorts (maybe extracting dateranges to subset on which to do
>         windows and
>         rejoining result of continuous ranges to original set would
>         help to
>         lower the cost).
>         
> 
> Thank you, and I wanted to follow up on this.  I couldn't quite get
> your example working as described, but I also ended up trying
> something very similar that got me very close but not quite there.
> Basically, I can see that it is correctly calculating the ranges (in
> the notices), but it is only returning the last range for each client.
> (Because I said PARTITION BY client_id).
> 
> 
sorry for the late reply, i was offline from sat to mon inclusive.
i may have previously added some confusion, so i'll try to explain what
i had in mind. see below for my suggested query.


> So I'm not sure if I should be calling this differently, or if the
> function needs to work differently, or if this just isn't possible.
> Do I need to partition by something else, and if so what?  I don't see
> what I could specify that would indicate a new episode.
> 
definitely you want to partition by client_id if you are calculating
this for multiple client_ids, but what matters here for each client_id
is order of dateranges in over()

> 
> Also, it's not clear to me how an aggregate might define/return
> different values within a partition.  Although this must be possible,
> since functions like rank() and row_number() seem to do it.
> 
that is because the function is defined like an agg, but what happens is that
its current state value (the return value of the agg function) is returned for
every row within the given window without grouping the resultset
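
for illustration only (a toy query on your sample_data, not part of the
solution): any plain aggregate behaves like this as a window function: with an
ORDER BY inside OVER() it returns its running state for each row; without one,
the whole-partition result is repeated on every row, which is why your
PARTITION BY-only call showed only the final range

SELECT client_id, start_date,
       count(*) OVER (PARTITION BY client_id ORDER BY start_date) AS running_count,
       count(*) OVER (PARTITION BY client_id)                     AS partition_count
  FROM sample_data;
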
> 
> Hoping there is something easy that can be tweaked here.  See below
> for copy/pastable test stuff.  It includes output from both functions.
> Both look to be returning the same results, which makes me wonder if
> my passing in a start date was a waste of time, though it seems to me
> it would be necessary.
> 
> Cheers,
> Ken
> 
> 
> BEGIN;
> CREATE TEMP TABLE sample_data (
> client_id INTEGER,
> start_date DATE,
> end_date DATE,
> episode INTEGER -- Just a label, for clarity
> );
> 
> INSERT INTO sample_data VALUES
> (1,'1990-01-01','1990-12-31',1),
> (1,'1991-01-01','1991-12-31',1),
> 
> (1,'2000-01-01','2000-12-31',2),
> (1,'2001-01-01','2001-12-31',2),
> (1,'2002-01-01','2002-12-31',2),
> (1,'2003-01-01','2003-12-31',2),
> (1,'2004-01-01','2004-12-31',2),
> (1,'2005-01-01','2005-12-31',2),
> (1,'2006-01-01','2006-12-31',2),
> 
> (1,'2014-01-01','2014-12-31',3),
> (1,'2015-01-01','2015-12-31',3),
> (1,'2017-06-30','2017-12-31',4),
> (1,'2018-01-01',NULL,4),
> 
> (2,'2014-02-01','2015-01-31',1),
> (2,'2015-02-01','2015-12-31',1),
> (2,'2017-09-30','2018-01-31',2),
> (2,'2018-02-01','2018-02-14',2)
> ;
> 
> CREATE OR REPLACE FUNCTION
>         append_daterange(d1 daterange, d2 daterange)
>         RETURNS daterange
>         LANGUAGE sql
>         AS
> $$
>         SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2
> END;
> $$;
> 
> CREATE AGGREGATE agg_daterange (daterange) (
>         sfunc = append_daterange,
>         stype = daterange
> );
> 


based on your sample data see the query:

select client_id, d,
       daterange(lower(test_agg_daterange(d) 
                          over (partition by client_id
                                order by d)),               
                 upper(test_agg_daterange(d)
                         over (partition by client_id
                               order by d desc)),
                 '[)'
                ) as cont_range
  from (select *, daterange(start_date,end_date,'[]') as d
        from sample_data
       ) q
order by 1, 2;

 client_id |            d            |       cont_range        
-----------+-------------------------+-------------------------
         1 | [1990-01-01,1991-01-01) | [1990-01-01,1992-01-01)
         1 | [1991-01-01,1992-01-01) | [1990-01-01,1992-01-01)
         1 | [2000-01-01,2001-01-01) | [2000-01-01,2007-01-01)
         1 | [2001-01-01,2002-01-01) | [2000-01-01,2007-01-01)
         1 | [2002-01-01,2003-01-01) | [2000-01-01,2007-01-01)
         1 | [2003-01-01,2004-01-01) | [2000-01-01,2007-01-01)
         1 | [2004-01-01,2005-01-01) | [2000-01-01,2007-01-01)
         1 | [2005-01-01,2006-01-01) | [2000-01-01,2007-01-01)
         1 | [2006-01-01,2007-01-01) | [2000-01-01,2007-01-01)
         1 | [2014-01-01,2015-01-01) | [2014-01-01,2016-01-01)
         1 | [2015-01-01,2016-01-01) | [2014-01-01,2016-01-01)
         1 | [2017-06-30,2018-01-01) | [2017-06-30,)
         1 | [2018-01-01,)           | [2017-06-30,)
         2 | [2014-02-01,2015-02-01) | [2014-02-01,2016-01-01)
         2 | [2015-02-01,2016-01-01) | [2014-02-01,2016-01-01)
         2 | [2017-09-30,2018-02-01) | [2017-09-30,2018-02-15)
         2 | [2018-02-01,2018-02-15) | [2017-09-30,2018-02-15)
(17 rows)
  
and time to explain the confusion i added previously regarding the daterange
input format (i'm working with something similar, but with dateranges in the
db, not ad hoc generated from start and end dates).
you need to generate daterange(start_date,end_date,'[]') for the -|-
operator to work correctly, but the output of daterange will be normalized
to '[)' format, so it's up to you to extract the end_date of the continuous
range (if you prefer dates over dateranges), which is going to be something like
(upper(cont_range)-'1 day'::interval)::date

the above example is just to show how it works; it returns the max continuous
range containing a given row. you probably want:
select distinct on (client_id,cont_range)
or something like that to extract the continuous ranges themselves
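
something along these lines, for example (an untested sketch, written with the
agg_daterange aggregate defined earlier):

SELECT DISTINCT ON (client_id, cont_range)
       client_id,
       lower(cont_range)                             AS cont_start,
       (upper(cont_range) - '1 day'::interval)::date AS cont_end   -- NULL for an open-ended range
  FROM (SELECT client_id,
               daterange(lower(agg_daterange(d) OVER (PARTITION BY client_id ORDER BY d)),
                         upper(agg_daterange(d) OVER (PARTITION BY client_id ORDER BY d DESC)),
                         '[)') AS cont_range
          FROM (SELECT *, daterange(start_date, end_date, '[]') AS d
                  FROM sample_data) q
       ) r
 ORDER BY client_id, cont_range;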

first window (within lower() func) extends its initial (earliest) range
to right for every next row continuous with current range (and jumps to
new start on discontinuity), thus over(order by d ASC)
second window (within upper() func) extends its initial (latest) range
to left for every next row continuous with current range (and jumps to
new end on discontinuity), thus over(order by d DESC)
partition by client_id within each over() is to not mix client_ids'
dateranges


i guess that is more or less what you wanted, at least as i understood
you originally; if not then sorry for even more confusion or for
unnecessarily explaining simple and obvious things you might already
know

regards,
mariusz jadczak


> CREATE OR REPLACE FUNCTION range_continuous_merge( daterange,
> daterange, date ) RETURNS daterange AS $$
> 
> DECLARE
>     res daterange;
> 
> BEGIN
> res:=  CASE
>     WHEN $1 IS NULL AND NOT $2 @> $3 THEN NULL
>     WHEN $1 IS NULL AND $2 @> $3 THEN $2
>     WHEN ($1 && $2) OR ($1 -|- $2) THEN $1 + $2
>     WHEN NOT $1 @> $3 THEN $2
>     ELSE $1
> END;
> RAISE NOTICE 'Inputs: %,%,%.  Returning %',$1::text,$2::text,
> $3::text,res;
> RETURN res;
> END;
> $$ LANGUAGE plpgsql STABLE;
> 
> CREATE AGGREGATE range_continuous( daterange, date ) (
> 
>     sfunc =  range_continuous_merge,
>     stype = daterange
> --  initcond = '{0,0,0}'
> 
> );
> 
> SELECT
> client_id,episode,start_date,end_date,range_continuous(daterange(start_date,end_date,'[]'),start_date) OVER
> (PARTITION by client_id) FROM sample_data ;
> 
> SELECT
> client_id,episode,start_date,end_date,agg_daterange(daterange(start_date,end_date,'[]')) OVER (PARTITION by
> client_id) FROM sample_data ;
> 
> 
> 
> -- RANGE_CONTINUOUS_MERGE
> 
> NOTICE:  Inputs: <NULL>,[1990-01-01,1991-01-01),1990-01-01.  Returning
> [1990-01-01,1991-01-01)
> NOTICE:  Inputs:
> [1990-01-01,1991-01-01),[1991-01-01,1992-01-01),1991-01-01.  Returning
> [1990-01-01,1992-01-01)
> NOTICE:  Inputs:
> [1990-01-01,1992-01-01),[2000-01-01,2001-01-01),2000-01-01.  Returning
> [2000-01-01,2001-01-01)
> NOTICE:  Inputs:
> [2000-01-01,2001-01-01),[2001-01-01,2002-01-01),2001-01-01.  Returning
> [2000-01-01,2002-01-01)
> NOTICE:  Inputs:
> [2000-01-01,2002-01-01),[2002-01-01,2003-01-01),2002-01-01.  Returning
> [2000-01-01,2003-01-01)
> NOTICE:  Inputs:
> [2000-01-01,2003-01-01),[2003-01-01,2004-01-01),2003-01-01.  Returning
> [2000-01-01,2004-01-01)
> NOTICE:  Inputs:
> [2000-01-01,2004-01-01),[2004-01-01,2005-01-01),2004-01-01.  Returning
> [2000-01-01,2005-01-01)
> NOTICE:  Inputs:
> [2000-01-01,2005-01-01),[2005-01-01,2006-01-01),2005-01-01.  Returning
> [2000-01-01,2006-01-01)
> NOTICE:  Inputs:
> [2000-01-01,2006-01-01),[2006-01-01,2007-01-01),2006-01-01.  Returning
> [2000-01-01,2007-01-01)
> NOTICE:  Inputs:
> [2000-01-01,2007-01-01),[2014-01-01,2015-01-01),2014-01-01.  Returning
> [2014-01-01,2015-01-01)
> NOTICE:  Inputs:
> [2014-01-01,2015-01-01),[2015-01-01,2016-01-01),2015-01-01.  Returning
> [2014-01-01,2016-01-01)
> NOTICE:  Inputs:
> [2014-01-01,2016-01-01),[2017-06-30,2018-01-01),2017-06-30.  Returning
> [2017-06-30,2018-01-01)
> NOTICE:  Inputs: [2017-06-30,2018-01-01),[2018-01-01,),2018-01-01.
> Returning [2017-06-30,)
> NOTICE:  Inputs: <NULL>,[2014-02-01,2015-02-01),2014-02-01.  Returning
> [2014-02-01,2015-02-01)
> NOTICE:  Inputs:
> [2014-02-01,2015-02-01),[2015-02-01,2016-01-01),2015-02-01.  Returning
> [2014-02-01,2016-01-01)
> NOTICE:  Inputs:
> [2014-02-01,2016-01-01),[2017-09-30,2018-02-01),2017-09-30.  Returning
> [2017-09-30,2018-02-01)
> NOTICE:  Inputs:
> [2017-09-30,2018-02-01),[2018-02-01,2018-02-15),2018-02-01.  Returning
> [2017-09-30,2018-02-15)
> 
>  client_id | episode | start_date |  end_date  |    range_continuous
> -----------+---------+------------+------------+-------------------------
>          1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
>          1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
>          1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
>          1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
>          1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
>          1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
>          1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
>          1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
>          1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
>          1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
>          1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
>          1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
>          1 |       4 | 2018-01-01 |            | [2017-06-30,)
>          2 |       1 | 2014-02-01 | 2015-01-31 |
> [2017-09-30,2018-02-15)
>          2 |       1 | 2015-02-01 | 2015-12-31 |
> [2017-09-30,2018-02-15)
>          2 |       2 | 2017-09-30 | 2018-01-31 |
> [2017-09-30,2018-02-15)
>          2 |       2 | 2018-02-01 | 2018-02-14 |
> [2017-09-30,2018-02-15)
> 
> (17 rows)
> 
> 
> -- AGG_DATERANGE
> 
>  client_id | episode | start_date |  end_date  |      agg_daterange
>    
> -----------+---------+------------+------------+-------------------------
>          1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
>          1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
>          1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
>          1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
>          1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
>          1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
>          1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
>          1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
>          1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
>          1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
>          1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
>          1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
>          1 |       4 | 2018-01-01 |            | [2017-06-30,)
>          2 |       1 | 2014-02-01 | 2015-01-31 |
> [2017-09-30,2018-02-15)
>          2 |       1 | 2015-02-01 | 2015-12-31 |
> [2017-09-30,2018-02-15)
>          2 |       2 | 2017-09-30 | 2018-01-31 |
> [2017-09-30,2018-02-15)
>          2 |       2 | 2018-02-01 | 2018-02-14 |
> [2017-09-30,2018-02-15)
> (17 rows)
> 
> 
> 
> 
> 
> 
> -- 
> 
> AGENCY Software  
> A Free Software data system
> By and for non-profits
> http://agency-software.org/
> https://demo.agency-software.org/client
> 
> ken.tanzer@agency-software.org
> (253) 245-3801
> 
> 
> Subscribe to the mailing list to
> learn more about AGENCY or
> follow the discussion.





On Tue, 2018-02-27 at 10:03 +0100, mariusz wrote:
> On Fri, 2018-02-23 at 18:11 -0800, Ken Tanzer wrote:
> >         On Fri, Feb 23, 2018 at 6:10 AM,
> >         mariusz <marius@mtvk.pl> wrote: 
> >         
> >         
> >         i guess, you can easily get max continuous range for each row
> >         with
> >         something like this:
> >         
> >         CREATE OR REPLACE FUNCTION
> >                 append_daterange(d1 daterange, d2 daterange)
> >                 RETURNS daterange
> >                 LANGUAGE sql
> >                 AS
> >         $$
> >                 SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2
> >         ELSE d2 END;
> >         $$;
> >         
> >         CREATE AGGREGATE agg_daterange (daterange) (
> >                 sfunc = append_daterange,
> >                 stype = daterange
> >         );
> >         
> >         SELECT dr,
> >                lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
> >                upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
> >           FROM ...
> >         
> >         above example is simplified to selecting only daterange column
> >         "dr" for
> >         readability, which in your case should be something like
> >         
> >         daterange(staff_assign_date,staff_assign_date_end,'[)')
> >         
> >         please note that daterange would be normalized to [) format so
> >         upper()
> >         above returns exactly your max "staff_assign_date_end" for
> >         each
> >         continuous range when dateranges are created with '[)' format.
> >         
> >         the key point is ... ELSE d2 in append_daterange() which
> >         starts with new
> >         value each time that new value is discontinuous with agg's
> >         state value
> >         and order in which rows are processed (ASC for lower of
> >         daterange, DESC
> >         for upper of daterange).
> >         
> >         unfortunately this involves reading all rows for "client_id"
> >         and
> >         additional sorting for each window.
> >         i recall reading that you already pay the price of reading all
> >         rows for
> >         client_id anyway, so the only question is the cost of two
> >         additional
> >         sorts (maybe extracting dateranges to subset on which to do
> >         windows and
> >         rejoining result of continuous ranges to original set would
> >         help to
> >         lower the cost).
> >         
> > 
> > Thank you, and I wanted to follow up on this.  I couldn't quite get
> > your example working as described, but I also ended up trying
> > something very similar that got me very close but not quite there.
> > Basically, I can see that it is correctly calculating the ranges (in
> > the notices), but it is only returning the last range for each client.
> > (Because I said PARTITION BY client_id).
> > 
> > 
> sorry for late replay, i was offline from sat to mon inclusive.
> i may have previously added some confusion, so i'll try to explain what
> i had in my mind. see below what my suggested query is.
> 
> 
> > So I'm not sure if I should be calling this differently, or if the
> > function needs to work differently, or if this just isn't possible.
> > Do I need to partition by something else, and if so what?  I don't see
> > what I could specify that would indicate a new episode.
> > 
> definitely you want to partition by client_id if you are calculating
> this for multiple client_ids, but what matters here for each client_id
> is order of dateranges in over()
> 
> > 
> > Also, it's not clear to me how an aggregate might define/return
> > different values within a partition.  Although this must be possible,
> > since functions like rank() and row_number() seem to do it.
> > 
> that is because function is defined like agg, but what happens is that
> its current state value (return value of agg function) is returned for
> every row within given window without grouping the resultset
> > 
> > Hoping there is something easy that can be tweaked here.  See below
> > for copy/pastable test stuff.  It includes output from both functions.
> > Both look to be returning the same results, which makes me wonder if
> > my passing in a start date was a waste of time, though it seems to me
> > it would be necessary.
> > 
> > Cheers,
> > Ken
> > 
> > 
> > BEGIN;
> > CREATE TEMP TABLE sample_data (
> > client_id INTEGER,
> > start_date DATE,
> > end_date DATE,
> > episode INTEGER -- Just a label, for clarity
> > );
> > 
> > INSERT INTO sample_data VALUES
> > (1,'1990-01-01','1990-12-31',1),
> > (1,'1991-01-01','1991-12-31',1),
> > 
> > (1,'2000-01-01','2000-12-31',2),
> > (1,'2001-01-01','2001-12-31',2),
> > (1,'2002-01-01','2002-12-31',2),
> > (1,'2003-01-01','2003-12-31',2),
> > (1,'2004-01-01','2004-12-31',2),
> > (1,'2005-01-01','2005-12-31',2),
> > (1,'2006-01-01','2006-12-31',2),
> > 
> > (1,'2014-01-01','2014-12-31',3),
> > (1,'2015-01-01','2015-12-31',3),
> > (1,'2017-06-30','2017-12-31',4),
> > (1,'2018-01-01',NULL,4),
> > 
> > (2,'2014-02-01','2015-01-31',1),
> > (2,'2015-02-01','2015-12-31',1),
> > (2,'2017-09-30','2018-01-31',2),
> > (2,'2018-02-01','2018-02-14',2)
> > ;
> > 
> > CREATE OR REPLACE FUNCTION
> >         append_daterange(d1 daterange, d2 daterange)
> >         RETURNS daterange
> >         LANGUAGE sql
> >         AS
> > $$
> >         SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2
> > END;
> > $$;
> > 
> > CREATE AGGREGATE agg_daterange (daterange) (
> >         sfunc = append_daterange,
> >         stype = daterange
> > );
> > 
> 
> 
> based on your sample data see the query:
> 
> select client_id, d,
>        daterange(lower(test_agg_daterange(d) 
>                           over (partition by client_id
>                                 order by d)),               
>                  upper(test_agg_daterange(d)
>                          over (partition by client_id
>                                order by d desc)),
>                  '[)'
>                 ) as cont_range
>   from (select *, daterange(start_date,end_date,'[]') as d
>         from sample_data
>        ) q
> order by 1, 2;
> 

that test_agg_daterange() is of course exactly the same as the
agg_daterange() agg above; sorry for not cleaning up my examples properly,
they were lazily edited and copied from my psql history

regards, mj


>  client_id |            d            |       cont_range        
> -----------+-------------------------+-------------------------
>          1 | [1990-01-01,1991-01-01) | [1990-01-01,1992-01-01)
>          1 | [1991-01-01,1992-01-01) | [1990-01-01,1992-01-01)
>          1 | [2000-01-01,2001-01-01) | [2000-01-01,2007-01-01)
>          1 | [2001-01-01,2002-01-01) | [2000-01-01,2007-01-01)
>          1 | [2002-01-01,2003-01-01) | [2000-01-01,2007-01-01)
>          1 | [2003-01-01,2004-01-01) | [2000-01-01,2007-01-01)
>          1 | [2004-01-01,2005-01-01) | [2000-01-01,2007-01-01)
>          1 | [2005-01-01,2006-01-01) | [2000-01-01,2007-01-01)
>          1 | [2006-01-01,2007-01-01) | [2000-01-01,2007-01-01)
>          1 | [2014-01-01,2015-01-01) | [2014-01-01,2016-01-01)
>          1 | [2015-01-01,2016-01-01) | [2014-01-01,2016-01-01)
>          1 | [2017-06-30,2018-01-01) | [2017-06-30,)
>          1 | [2018-01-01,)           | [2017-06-30,)
>          2 | [2014-02-01,2015-02-01) | [2014-02-01,2016-01-01)
>          2 | [2015-02-01,2016-01-01) | [2014-02-01,2016-01-01)
>          2 | [2017-09-30,2018-02-01) | [2017-09-30,2018-02-15)
>          2 | [2018-02-01,2018-02-15) | [2017-09-30,2018-02-15)
> (17 rows)
>   
> and time to explain confusion i added previously regarding daterange
> input format (i'm working with something similar, but with dateranges in
> db, not add hoc generated from start and end dates).
> you need to generate daterange(start_date,end_date,'[]') for -|-
> operator to work correctly, but output of daterange will be normalized
> to '[)' format so it's up to you to extract end_date of continuous range
> (if you prefer dates over dateranges) which gonna be something like
> (upper(cont_range)-'1 day'::interval)::date
> 
> above example is for showing how it works, returns max continuous range
> containing given row,
> you probably want:
> select distinct on (client_id,cont_range)
> or something like that to extract continuous ranges
> 
> first window (within lower() func) extends its initial (earliest) range
> to right for every next row continuous with current range (and jumps to
> new start on discontinuity), thus over(order by d ASC)
> second window (within upper() func) extends its initial (latest) range
> to left for every next row continuous with current range (and jumps to
> new end on discontinuity), thus over(order by d DESC)
> partition by client_id within each over() is to not mix client_ids'
> dateranges
> 
> 
> i guess that is more or less what you wanted, at least as i understood
> you originally, if not than sorry for even more confusion or
> unnecessarily explaining simple and obvious things you might already
> know
> 
> regards,
> mariusz jadczak
> 
> 
> > CREATE OR REPLACE FUNCTION range_continuous_merge( daterange,
> > daterange, date ) RETURNS daterange AS $$
> > 
> > DECLARE
> >     res daterange;
> > 
> > BEGIN
> > res:=  CASE
> >     WHEN $1 IS NULL AND NOT $2 @> $3 THEN NULL
> >     WHEN $1 IS NULL AND $2 @> $3 THEN $2
> >     WHEN ($1 && $2) OR ($1 -|- $2) THEN $1 + $2
> >     WHEN NOT $1 @> $3 THEN $2
> >     ELSE $1
> > END;
> > RAISE NOTICE 'Inputs: %,%,%.  Returning %',$1::text,$2::text,
> > $3::text,res;
> > RETURN res;
> > END;
> > $$ LANGUAGE plpgsql STABLE;
> > 
> > CREATE AGGREGATE range_continuous( daterange, date ) (
> > 
> >     sfunc =  range_continuous_merge,
> >     stype = daterange
> > --  initcond = '{0,0,0}'
> > 
> > );
> > 
> > SELECT
> > client_id,episode,start_date,end_date,range_continuous(daterange(start_date,end_date,'[]'),start_date) OVER
> > (PARTITION by client_id) FROM sample_data ;
> > 
> > SELECT
> > client_id,episode,start_date,end_date,agg_daterange(daterange(start_date,end_date,'[]')) OVER (PARTITION by
> > client_id) FROM sample_data ;
> > 
> > 
> > 
> > -- RANGE_CONTINUOUS_MERGE
> > 
> > NOTICE:  Inputs: <NULL>,[1990-01-01,1991-01-01),1990-01-01.  Returning
> > [1990-01-01,1991-01-01)
> > NOTICE:  Inputs:
> > [1990-01-01,1991-01-01),[1991-01-01,1992-01-01),1991-01-01.  Returning
> > [1990-01-01,1992-01-01)
> > NOTICE:  Inputs:
> > [1990-01-01,1992-01-01),[2000-01-01,2001-01-01),2000-01-01.  Returning
> > [2000-01-01,2001-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2001-01-01),[2001-01-01,2002-01-01),2001-01-01.  Returning
> > [2000-01-01,2002-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2002-01-01),[2002-01-01,2003-01-01),2002-01-01.  Returning
> > [2000-01-01,2003-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2003-01-01),[2003-01-01,2004-01-01),2003-01-01.  Returning
> > [2000-01-01,2004-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2004-01-01),[2004-01-01,2005-01-01),2004-01-01.  Returning
> > [2000-01-01,2005-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2005-01-01),[2005-01-01,2006-01-01),2005-01-01.  Returning
> > [2000-01-01,2006-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2006-01-01),[2006-01-01,2007-01-01),2006-01-01.  Returning
> > [2000-01-01,2007-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2007-01-01),[2014-01-01,2015-01-01),2014-01-01.  Returning
> > [2014-01-01,2015-01-01)
> > NOTICE:  Inputs:
> > [2014-01-01,2015-01-01),[2015-01-01,2016-01-01),2015-01-01.  Returning
> > [2014-01-01,2016-01-01)
> > NOTICE:  Inputs:
> > [2014-01-01,2016-01-01),[2017-06-30,2018-01-01),2017-06-30.  Returning
> > [2017-06-30,2018-01-01)
> > NOTICE:  Inputs: [2017-06-30,2018-01-01),[2018-01-01,),2018-01-01.
> > Returning [2017-06-30,)
> > NOTICE:  Inputs: <NULL>,[2014-02-01,2015-02-01),2014-02-01.  Returning
> > [2014-02-01,2015-02-01)
> > NOTICE:  Inputs:
> > [2014-02-01,2015-02-01),[2015-02-01,2016-01-01),2015-02-01.  Returning
> > [2014-02-01,2016-01-01)
> > NOTICE:  Inputs:
> > [2014-02-01,2016-01-01),[2017-09-30,2018-02-01),2017-09-30.  Returning
> > [2017-09-30,2018-02-01)
> > NOTICE:  Inputs:
> > [2017-09-30,2018-02-01),[2018-02-01,2018-02-15),2018-02-01.  Returning
> > [2017-09-30,2018-02-15)
> > 
> >  client_id | episode | start_date |  end_date  |    range_continuous
> > -----------+---------+------------+------------+-------------------------
> >          1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
> >          1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
> >          1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
> >          1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
> >          1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
> >          1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
> >          1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
> >          1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
> >          1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
> >          1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
> >          1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
> >          1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
> >          1 |       4 | 2018-01-01 |            | [2017-06-30,)
> >          2 |       1 | 2014-02-01 | 2015-01-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       1 | 2015-02-01 | 2015-12-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       2 | 2017-09-30 | 2018-01-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       2 | 2018-02-01 | 2018-02-14 |
> > [2017-09-30,2018-02-15)
> > 
> > (17 rows)
> > 
> > 
> > -- AGG_DATERANGE
> > 
> >  client_id | episode | start_date |  end_date  |      agg_daterange
> >    
> > -----------+---------+------------+------------+-------------------------
> >          1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
> >          1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
> >          1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
> >          1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
> >          1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
> >          1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
> >          1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
> >          1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
> >          1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
> >          1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
> >          1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
> >          1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
> >          1 |       4 | 2018-01-01 |            | [2017-06-30,)
> >          2 |       1 | 2014-02-01 | 2015-01-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       1 | 2015-02-01 | 2015-12-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       2 | 2017-09-30 | 2018-01-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       2 | 2018-02-01 | 2018-02-14 |
> > [2017-09-30,2018-02-15)
> > (17 rows)
> > 
> > 
> > 
> > 
> > 
> > 
> > -- 
> > 
> > AGENCY Software  
> > A Free Software data system
> > By and for non-profits
> > http://agency-software.org/
> > https://demo.agency-software.org/client
> > 
> > ken.tanzer@agency-software.org
> > (253) 245-3801
> > 
> > 
> > Subscribe to the mailing list to
> > learn more about AGENCY or
> > follow the discussion.
> 
> 
> 




hello,

one more fix, to not let someone get incorrect/incomplete ideas, see
below


On Tue, 2018-02-27 at 10:03 +0100, mariusz wrote:
> On Fri, 2018-02-23 at 18:11 -0800, Ken Tanzer wrote:
> >         On Fri, Feb 23, 2018 at 6:10 AM,
> >         mariusz <marius@mtvk.pl> wrote: 
> >         
> >         
> >         i guess, you can easily get max continuous range for each row
> >         with
> >         something like this:
> >         
> >         CREATE OR REPLACE FUNCTION
> >                 append_daterange(d1 daterange, d2 daterange)
> >                 RETURNS daterange
> >                 LANGUAGE sql
> >                 AS
> >         $$
> >                 SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2
> >         ELSE d2 END;
> >         $$;
> >         
> >         CREATE AGGREGATE agg_daterange (daterange) (
> >                 sfunc = append_daterange,
> >                 stype = daterange
> >         );
> >         
> >         SELECT dr,
> >                lower(agg_daterange(dr) OVER (ORDER BY dr ASC)),
> >                upper(agg_daterange(dr) OVER (ORDER BY dr DESC))
> >           FROM ...
> >         
> >         above example is simplified to selecting only daterange column
> >         "dr" for
> >         readability, which in your case should be something like
> >         
> >         daterange(staff_assign_date,staff_assign_date_end,'[)')
> >         
> >         please note that daterange would be normalized to [) format so
> >         upper()
> >         above returns exactly your max "staff_assign_date_end" for
> >         each
> >         continuous range when dateranges are created with '[)' format.
> >         
> >         the key point is ... ELSE d2 in append_daterange() which
> >         starts with new
> >         value each time that new value is discontinuous with agg's
> >         state value
> >         and order in which rows are processed (ASC for lower of
> >         daterange, DESC
> >         for upper of daterange).
> >         
> >         unfortunately this involves reading all rows for "client_id"
> >         and
> >         additional sorting for each window.
> >         i recall reading that you already pay the price of reading all
> >         rows for
> >         client_id anyway, so the only question is the cost of two
> >         additional
> >         sorts (maybe extracting dateranges to subset on which to do
> >         windows and
> >         rejoining result of continuous ranges to original set would
> >         help to
> >         lower the cost).
> >         
> > 
> > Thank you, and I wanted to follow up on this.  I couldn't quite get
> > your example working as described, but I also ended up trying
> > something very similar that got me very close but not quite there.
> > Basically, I can see that it is correctly calculating the ranges (in
> > the notices), but it is only returning the last range for each client.
> > (Because I said PARTITION BY client_id).
> > 
> > 
> sorry for late replay, i was offline from sat to mon inclusive.
> i may have previously added some confusion, so i'll try to explain what
> i had in my mind. see below what my suggested query is.
> 
> 
> > So I'm not sure if I should be calling this differently, or if the
> > function needs to work differently, or if this just isn't possible.
> > Do I need to partition by something else, and if so what?  I don't see
> > what I could specify that would indicate a new episode.
> > 
> definitely you want to partition by client_id if you are calculating
> this for multiple client_ids, but what matters here for each client_id
> is order of dateranges in over()
> 
> > 
> > Also, it's not clear to me how an aggregate might define/return
> > different values within a partition.  Although this must be possible,
> > since functions like rank() and row_number() seem to do it.
> > 
> that is because function is defined like agg, but what happens is that
> its current state value (return value of agg function) is returned for
> every row within given window without grouping the resultset
> > 
> > Hoping there is something easy that can be tweaked here.  See below
> > for copy/pastable test stuff.  It includes output from both functions.
> > Both look to be returning the same results, which makes me wonder if
> > my passing in a start date was a waste of time, though it seems to me
> > it would be necessary.
> > 
> > Cheers,
> > Ken
> > 
> > 
> > BEGIN;
> > CREATE TEMP TABLE sample_data (
> > client_id INTEGER,
> > start_date DATE,
> > end_date DATE,
> > episode INTEGER -- Just a label, for clarity
> > );
> > 
> > INSERT INTO sample_data VALUES
> > (1,'1990-01-01','1990-12-31',1),
> > (1,'1991-01-01','1991-12-31',1),
> > 
> > (1,'2000-01-01','2000-12-31',2),
> > (1,'2001-01-01','2001-12-31',2),
> > (1,'2002-01-01','2002-12-31',2),
> > (1,'2003-01-01','2003-12-31',2),
> > (1,'2004-01-01','2004-12-31',2),
> > (1,'2005-01-01','2005-12-31',2),
> > (1,'2006-01-01','2006-12-31',2),
> > 
> > (1,'2014-01-01','2014-12-31',3),
> > (1,'2015-01-01','2015-12-31',3),
> > (1,'2017-06-30','2017-12-31',4),
> > (1,'2018-01-01',NULL,4),
> > 
> > (2,'2014-02-01','2015-01-31',1),
> > (2,'2015-02-01','2015-12-31',1),
> > (2,'2017-09-30','2018-01-31',2),
> > (2,'2018-02-01','2018-02-14',2)
> > ;
> > 
> > CREATE OR REPLACE FUNCTION
> >         append_daterange(d1 daterange, d2 daterange)
> >         RETURNS daterange
> >         LANGUAGE sql
> >         AS
> > $$
> >         SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2
> > END;
> > $$;
> > 
> > CREATE AGGREGATE agg_daterange (daterange) (
> >         sfunc = append_daterange,
> >         stype = daterange
> > );
> > 
> 
> 
> based on your sample data see the query:
> 
> select client_id, d,
>        daterange(lower(test_agg_daterange(d) 
>                           over (partition by client_id
>                                 order by d)),               
>                  upper(test_agg_daterange(d)
>                          over (partition by client_id
>                                order by d desc)),
>                  '[)'
>                 ) as cont_range
>   from (select *, daterange(start_date,end_date,'[]') as d
>         from sample_data
>        ) q
> order by 1, 2;


that should work with your non-overlapping ranges, and should work with
overlapping ranges except for ranges strictly contained within another.
that might not apply to your case, but for safety (we can't say who is going
to read this and get ideas from it), i feel it's better to correct the
second window, the one within upper(), to:

OVER (PARTITION BY client_id ORDER BY end_date DESC)

likewise, the first window (within lower()) could be sorted by start_date
asc, but that's a cosmetic change, while for the upper bound of the range it
does matter
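
i.e. the whole query becomes something like (untested sketch):

select client_id, d,
       daterange(lower(agg_daterange(d) over (partition by client_id order by start_date asc)),
                 upper(agg_daterange(d) over (partition by client_id order by end_date desc)),
                 '[)'
                ) as cont_range
  from (select *, daterange(start_date,end_date,'[]') as d
        from sample_data
       ) q
order by 1, 2;

(note: a NULL end_date, i.e. an open-ended range, sorts first under DESC by
default, which is what we want here)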

regards, mj


>  client_id |            d            |       cont_range        
> -----------+-------------------------+-------------------------
>          1 | [1990-01-01,1991-01-01) | [1990-01-01,1992-01-01)
>          1 | [1991-01-01,1992-01-01) | [1990-01-01,1992-01-01)
>          1 | [2000-01-01,2001-01-01) | [2000-01-01,2007-01-01)
>          1 | [2001-01-01,2002-01-01) | [2000-01-01,2007-01-01)
>          1 | [2002-01-01,2003-01-01) | [2000-01-01,2007-01-01)
>          1 | [2003-01-01,2004-01-01) | [2000-01-01,2007-01-01)
>          1 | [2004-01-01,2005-01-01) | [2000-01-01,2007-01-01)
>          1 | [2005-01-01,2006-01-01) | [2000-01-01,2007-01-01)
>          1 | [2006-01-01,2007-01-01) | [2000-01-01,2007-01-01)
>          1 | [2014-01-01,2015-01-01) | [2014-01-01,2016-01-01)
>          1 | [2015-01-01,2016-01-01) | [2014-01-01,2016-01-01)
>          1 | [2017-06-30,2018-01-01) | [2017-06-30,)
>          1 | [2018-01-01,)           | [2017-06-30,)
>          2 | [2014-02-01,2015-02-01) | [2014-02-01,2016-01-01)
>          2 | [2015-02-01,2016-01-01) | [2014-02-01,2016-01-01)
>          2 | [2017-09-30,2018-02-01) | [2017-09-30,2018-02-15)
>          2 | [2018-02-01,2018-02-15) | [2017-09-30,2018-02-15)
> (17 rows)
>   
> and time to explain confusion i added previously regarding daterange
> input format (i'm working with something similar, but with dateranges in
> db, not add hoc generated from start and end dates).
> you need to generate daterange(start_date,end_date,'[]') for -|-
> operator to work correctly, but output of daterange will be normalized
> to '[)' format so it's up to you to extract end_date of continuous range
> (if you prefer dates over dateranges) which gonna be something like
> (upper(cont_range)-'1 day'::interval)::date
> 
> above example is for showing how it works, returns max continuous range
> containing given row,
> you probably want:
> select distinct on (client_id,cont_range)
> or something like that to extract continuous ranges
> 
> first window (within lower() func) extends its initial (earliest) range
> to right for every next row continuous with current range (and jumps to
> new start on discontinuity), thus over(order by d ASC)
> second window (within upper() func) extends its initial (latest) range
> to left for every next row continuous with current range (and jumps to
> new end on discontinuity), thus over(order by d DESC)
> partition by client_id within each over() is to not mix client_ids'
> dateranges
> 
> 
> i guess that is more or less what you wanted, at least as i understood
> you originally, if not than sorry for even more confusion or
> unnecessarily explaining simple and obvious things you might already
> know
> 
> regards,
> mariusz jadczak
> 
> 
> > CREATE OR REPLACE FUNCTION range_continuous_merge( daterange,
> > daterange, date ) RETURNS daterange AS $$
> > 
> > DECLARE
> >     res daterange;
> > 
> > BEGIN
> > res:=  CASE
> >     WHEN $1 IS NULL AND NOT $2 @> $3 THEN NULL
> >     WHEN $1 IS NULL AND $2 @> $3 THEN $2
> >     WHEN ($1 && $2) OR ($1 -|- $2) THEN $1 + $2
> >     WHEN NOT $1 @> $3 THEN $2
> >     ELSE $1
> > END;
> > RAISE NOTICE 'Inputs: %,%,%.  Returning %',$1::text,$2::text,
> > $3::text,res;
> > RETURN res;
> > END;
> > $$ LANGUAGE plpgsql STABLE;
> > 
> > CREATE AGGREGATE range_continuous( daterange, date ) (
> > 
> >     sfunc =  range_continuous_merge,
> >     stype = daterange
> > --  initcond = '{0,0,0}'
> > 
> > );
> > 
> > SELECT
> > client_id,episode,start_date,end_date,range_continuous(daterange(start_date,end_date,'[]'),start_date) OVER
> > (PARTITION by client_id) FROM sample_data ;
> > 
> > SELECT
> > client_id,episode,start_date,end_date,agg_daterange(daterange(start_date,end_date,'[]')) OVER (PARTITION by
> > client_id) FROM sample_data ;
> > 
> > 
> > 
> > -- RANGE_CONTINUOUS_MERGE
> > 
> > NOTICE:  Inputs: <NULL>,[1990-01-01,1991-01-01),1990-01-01.  Returning
> > [1990-01-01,1991-01-01)
> > NOTICE:  Inputs:
> > [1990-01-01,1991-01-01),[1991-01-01,1992-01-01),1991-01-01.  Returning
> > [1990-01-01,1992-01-01)
> > NOTICE:  Inputs:
> > [1990-01-01,1992-01-01),[2000-01-01,2001-01-01),2000-01-01.  Returning
> > [2000-01-01,2001-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2001-01-01),[2001-01-01,2002-01-01),2001-01-01.  Returning
> > [2000-01-01,2002-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2002-01-01),[2002-01-01,2003-01-01),2002-01-01.  Returning
> > [2000-01-01,2003-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2003-01-01),[2003-01-01,2004-01-01),2003-01-01.  Returning
> > [2000-01-01,2004-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2004-01-01),[2004-01-01,2005-01-01),2004-01-01.  Returning
> > [2000-01-01,2005-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2005-01-01),[2005-01-01,2006-01-01),2005-01-01.  Returning
> > [2000-01-01,2006-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2006-01-01),[2006-01-01,2007-01-01),2006-01-01.  Returning
> > [2000-01-01,2007-01-01)
> > NOTICE:  Inputs:
> > [2000-01-01,2007-01-01),[2014-01-01,2015-01-01),2014-01-01.  Returning
> > [2014-01-01,2015-01-01)
> > NOTICE:  Inputs:
> > [2014-01-01,2015-01-01),[2015-01-01,2016-01-01),2015-01-01.  Returning
> > [2014-01-01,2016-01-01)
> > NOTICE:  Inputs:
> > [2014-01-01,2016-01-01),[2017-06-30,2018-01-01),2017-06-30.  Returning
> > [2017-06-30,2018-01-01)
> > NOTICE:  Inputs: [2017-06-30,2018-01-01),[2018-01-01,),2018-01-01.
> > Returning [2017-06-30,)
> > NOTICE:  Inputs: <NULL>,[2014-02-01,2015-02-01),2014-02-01.  Returning
> > [2014-02-01,2015-02-01)
> > NOTICE:  Inputs:
> > [2014-02-01,2015-02-01),[2015-02-01,2016-01-01),2015-02-01.  Returning
> > [2014-02-01,2016-01-01)
> > NOTICE:  Inputs:
> > [2014-02-01,2016-01-01),[2017-09-30,2018-02-01),2017-09-30.  Returning
> > [2017-09-30,2018-02-01)
> > NOTICE:  Inputs:
> > [2017-09-30,2018-02-01),[2018-02-01,2018-02-15),2018-02-01.  Returning
> > [2017-09-30,2018-02-15)
> > 
> >  client_id | episode | start_date |  end_date  |    range_continuous
> > -----------+---------+------------+------------+-------------------------
> >          1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
> >          1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
> >          1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
> >          1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
> >          1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
> >          1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
> >          1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
> >          1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
> >          1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
> >          1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
> >          1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
> >          1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
> >          1 |       4 | 2018-01-01 |            | [2017-06-30,)
> >          2 |       1 | 2014-02-01 | 2015-01-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       1 | 2015-02-01 | 2015-12-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       2 | 2017-09-30 | 2018-01-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       2 | 2018-02-01 | 2018-02-14 |
> > [2017-09-30,2018-02-15)
> > 
> > (17 rows)
> > 
> > 
> > -- AGG_DATERANGE
> > 
> >  client_id | episode | start_date |  end_date  |      agg_daterange
> >    
> > -----------+---------+------------+------------+-------------------------
> >          1 |       1 | 1990-01-01 | 1990-12-31 | [2017-06-30,)
> >          1 |       1 | 1991-01-01 | 1991-12-31 | [2017-06-30,)
> >          1 |       2 | 2000-01-01 | 2000-12-31 | [2017-06-30,)
> >          1 |       2 | 2001-01-01 | 2001-12-31 | [2017-06-30,)
> >          1 |       2 | 2002-01-01 | 2002-12-31 | [2017-06-30,)
> >          1 |       2 | 2003-01-01 | 2003-12-31 | [2017-06-30,)
> >          1 |       2 | 2004-01-01 | 2004-12-31 | [2017-06-30,)
> >          1 |       2 | 2005-01-01 | 2005-12-31 | [2017-06-30,)
> >          1 |       2 | 2006-01-01 | 2006-12-31 | [2017-06-30,)
> >          1 |       3 | 2014-01-01 | 2014-12-31 | [2017-06-30,)
> >          1 |       3 | 2015-01-01 | 2015-12-31 | [2017-06-30,)
> >          1 |       4 | 2017-06-30 | 2017-12-31 | [2017-06-30,)
> >          1 |       4 | 2018-01-01 |            | [2017-06-30,)
> >          2 |       1 | 2014-02-01 | 2015-01-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       1 | 2015-02-01 | 2015-12-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       2 | 2017-09-30 | 2018-01-31 |
> > [2017-09-30,2018-02-15)
> >          2 |       2 | 2018-02-01 | 2018-02-14 |
> > [2017-09-30,2018-02-15)
> > (17 rows)
> > 
> > 
> > 
> > 
> > 
> > 
> > -- 
> > 
> > AGENCY Software  
> > A Free Software data system
> > By and for non-profits
> > http://agency-software.org/
> > https://demo.agency-software.org/client
> > 
> > ken.tanzer@agency-software.org
> > (253) 245-3801
> > 
> > 
> > Subscribe to the mailing list to
> > learn more about AGENCY or
> > follow the discussion.
> 
> 
> 




Hi.  Thanks so much for your assistance.  This is definitely getting the results I was looking for.  It is still syntactically more cumbersome than I might have hoped, but I can work with it.  So I've got two follow-up questions/issues:

1)  I can see there are many, more complex, options for aggregates, which I am trying to wrap my mind around.  I'm wondering if any of these (esp. partial aggregates/combine functions, final functions or moving aggregates) could be used to streamline this into a single function call, or if that is barking up a dead tree.

2)  I'm sure at this point I must be being dense, but after re-reading docs multiple times, I am still confused about one piece of this:

> first window (within lower() func) extends its initial (earliest) range
> to right for every next row continuous with current range (and jumps to
> new start on discontinuity), thus over(order by d ASC)
> second window (within upper() func) extends its initial (latest) range
> to left for every next row continuous with current range (and jumps to
> new end on discontinuity), thus over(order by d DESC)
> partition by client_id within each over() is to not mix client_ids'
> dateranges
>

What is it exactly that is triggering PostgreSQL to know there is a discontinuity and to start with a new range?  And is it based on the input or the output values?  Based on PARTITION BY client_id ORDER BY d, I might have thought it was d.  But that doesn't seem to be right.  So is it something about what agg_daterange is returning, and if so what?  Again, sorry for being dense about this one.

Thanks!

Ken


--
AGENCY Software  
A Free Software data system
By and for non-profits
(253) 245-3801

learn more about AGENCY or
follow the discussion.
hi,

On Tue, 2018-02-27 at 16:15 -0800, Ken Tanzer wrote:
> Hi.  Thanks so much for your assistance.  This is definitely getting
> the results I was looking for.  It is still syntacticallly more
> cumbersome than I might have hoped, but I can work with it.  So I've
> got two follow questions/issues:
> 
> 
> 1)  I can see there are many, more complex, options for aggregates,
> which I am trying to wrap my mind around.  I'm wondering if any of
> these (esp. partial aggregates/combine functions, final functions or
> moving aggregates) could be used to streamline this into a single
> function call, or if that is barking up a dead tree.
> 
i'm not an expert and i don't use fancy aggregates too often, i just got a
simple solution, so i may be wrong, but i don't see a simple solution as
one function. so unless someone comes up with a better idea, you have to
stay with this, which isn't complex as the idea and solution are really
very simple (the simplicity is somewhat visually hidden in this case by
converting dates to ranges and back to boundary dates again (and again to
ranges for visualizing the result)).
but it requires an additional sort for each window, as you can see in
EXPLAIN, which is the cost to pay
> 
> 2)  I'm sure at this point I must be being dense, but after re-reading
> docs multiple times, I am still confused about one piece of this:
> 
> 
>         > first window (within lower() func) extends its initial
>         (earliest) range
>         > to right for every next row continuous with current range
>         (and jumps to
>         > new start on discontinuity), thus over(order by d ASC)
>         > second window (within upper() func) extends its initial
>         (latest) range
>         > to left for every next row continuous with current range
>         (and jumps to
>         > new end on discontinuity), thus over(order by d DESC)
>         > partition by client_id within each over() is to not mix
>         client_ids'
>         > dateranges
>         >
>         
>         
> which is what is it exactly that is triggering Postgresql to know
> there is a discontinuity and to start with a new range?  And is it
> based on the input or the output values?  Based on PARTITION BY
> client_id ORDER BY d, I might have thought it was d.  But that doesn't
> seem to be right.  So is it something about what agg_daterange is
> returning, and if so what?  Again, sorry for being dense about this
> one.
> 
postgresql itself doesn't know nor care about those discontinuities; all
it cares about is the partition by client_id to not mix client_ids, and
applying our agg function to the rows in the order of our choice.

here again i remind you, you don't really want to sort by d::daterange
for the second, descending window, but by end_date, because the desc order of
the range will not be the same as the desc order of end_date in the general
case, and by start_date for the first window (but as i already said, for that
first ascending window it's cosmetics, as the order of d::daterange and
start_date::date will be the same).
that doesn't really matter in your case of exclusive ranges but matters
in the more general case.

and back to your question, we find discontinuity in our function. see
the query

SELECT CASE WHEN d1 && d2 OR d1 -|- d2 THEN d1 + d2 ELSE d2 END

where d1 is internal state value of aggregate and d2 is current row's
daterange value.
we check for continuity with d1 && d2 OR d1 -|- d2, and if it is
continuous then we just extend the result (the return value for the current
row and the new state value of the agg) to d1+d2;
otherwise, when we find a discontinuity, we forget d1 (the agg state value
until now) and simply return d2, the current row's range, thus starting a
new range.
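
for example (toy checks of the two continuity tests, with default '[)' ranges):

SELECT daterange('2000-01-01','2001-06-01') && daterange('2001-01-01','2002-01-01');   -- true: overlap
SELECT daterange('2000-01-01','2001-01-01') -|- daterange('2001-01-01','2002-01-01');  -- true: adjacent
SELECT daterange('2000-01-01','2001-01-01') -|- daterange('2001-01-02','2002-01-01');  -- false: one-day gap, so the CASE returns d2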

hope that explains enough. as i already said, i'm not an expert, i'm
just coincidentally working currently on my semi-toy project which
utilizes dateranges quite heavily.
anyway, feel free to ask if you have any further questions. for now i'm
glad i could help somehow.

regards,
mariusz jadczak

> 
> Thanks!
> 
> 
> Ken
> 
> 
> 
> -- 
> 
> AGENCY Software  
> A Free Software data system
> By and for non-profits
> http://agency-software.org/
> https://demo.agency-software.org/client
> 
> ken.tanzer@agency-software.org
> (253) 245-3801
> 
> 
> Subscribe to the mailing list to
> learn more about AGENCY or
> follow the discussion.