Обсуждение: Window function for get the last value to extend missing rows
Dear Members!
The test data:
I have a table with temperature measures.
The data is coming from the PLC, but sometimes the period is "slipping", so the values are not correctly minute based.
03:00 10
03:02 12
03:03 11
03:02 12
03:03 11
03:05 13
I have to make a virtual table which is minute based.
I have to make a virtual table which is minute based.
I thought I would make a generated temp table (generate_series) and then join these values based on minue.
03:00 10
03:01 NULL
03:02 12
03:03 11
03:02 12
03:03 11
03:04 NULL
03:05 13
I need a code to replace the value to the last value on NULL.
03:00 10
03:01 10 <
03:02 12
03:03 11
03:02 12
03:03 11
03:04 11 <
03:05 13
The LAG function seems to be ok, but how to handle if more than two periods are missing?
03:00 10
03:01 NULL
03:02 NULL
03:03 NULL
03:02 NULL
03:03 NULL
03:04 11
03:05 13
select *, coalesce(value, prev_value) as value from (
select mmin, value,
lag(value, 1) over (order by mmin) as prev_value
from test_table) t
or
select mmin, value,
lag(value, 1) over (order by mmin) as prev_value
from test_table) t
or
select *, coalesce(value, prev_value) as value from (
select mmin, value,
coalesce(lag(value, 1) over (order by mmin),
lag(value, 2) over (order by mmin)) as prev_value
from tmp_test_table) t
coalesce(lag(value, 1) over (order by mmin),
lag(value, 2) over (order by mmin)) as prev_value
from tmp_test_table) t
The test data:
create table tmp_test_table (mmin int, value int);
insert into tmp_test_table values
(1, 1),
(2, 1),
(3, NULL),
(4, NULL),
(5, 2),
(6, NULL),
(7, NULL),
(10, 10),
(11, NULL),
(12, NULL),
(13, NULL),
(14, NULL);
The result is:
So you can see, the last values are NULL because the LAG can't use the last calculated value.
1 1 NULL 1 2 1 1 1 3 NULL 1 1 4 NULL 1 1 5 2 NULL 2 6 NULL 2 2 7 NULL 2 2 10 10 NULL 10 11 NULL 10 10 12 NULL 10 10 13 NULL NULL NULL 14 NULL NULL NULL
Do you have any idea how to get the last value, doesn't matter how many NULL-s are in the set?
(15, NULLx20, 10) => 15x21, 10
Thanks
Best regards
dd
On Fri, 12 May 2023 at 13:04, Durumdara <durumdara@gmail.com> wrote:
Dear Members!I have a table with temperature measures.The data is coming from the PLC, but sometimes the period is "slipping", so the values are not correctly minute based.03:00 10
03:02 12
03:03 1103:05 13
I have to make a virtual table which is minute based.I thought I would make a generated temp table (generate_series) and then join these values based on minue.03:00 1003:01 NULL
03:02 12
03:03 1103:04 NULL03:05 13I need a code to replace the value to the last value on NULL.03:00 1003:01 10 <
03:02 12
03:03 1103:04 11 <03:05 13
Unfortunately, as per https://www.postgresql.org/docs/current/functions-window.html
"The SQL standard defines a
So, I'd keep the incoming data as is, i.e. with no nulls in values, and densify it with some generate_series magic:
RESPECT NULLS
or IGNORE NULLS
option for lead
, lag
, first_value
, last_value
, and nth_value
. This is not implemented in PostgreSQL: the behavior is always the same as the standard's default, namely RESPECT NULLS
".So, I'd keep the incoming data as is, i.e. with no nulls in values, and densify it with some generate_series magic:
select gmin as mmin, d.value
from
( select mmin, lead(mmin) over (order by mmin) nextmin, value from test_table ) d,
generate_series(d.mmin, nextmin - interval'1 minute') gmin
Best,
g
On Fri, 12 May 2023, GF wrote: >"The SQL standard defines a RESPECT NULLS or IGNORE NULLS option for lead, >lag, first_value, last_value, and nth_value. This is not implemented in >PostgreSQL: the behavior is always the same as the standard's default, >namely RESPECT NULLS". Yeah, THAT caused no small amount of cursing, earlier this year, I’d have also used IGNORE NULLS somewhere… bye, //mirabilos -- 15:41⎜<Lo-lan-do:#fusionforge> Somebody write a testsuite for helloworld :-)
>>>>> "Durumdara" == Durumdara <durumdara@gmail.com> writes: Durumdara> I have to make a virtual table which is minute based. Durumdara> I thought I would make a generated temp table Durumdara> (generate_series) and then join these values based on minue. Durumdara> 03:00 10 Durumdara> 03:01 NULL Durumdara> 03:02 12 Durumdara> 03:03 11 Durumdara> 03:04 NULL Durumdara> 03:05 13 Durumdara> I need a code to replace the value to the last value on Durumdara> NULL. It is possible to do this, even without the IGNORE NULLS option for window functions, but it's somewhat awkward. Someone else already mentioned using generate_series in lateral position to fill in values; in most cases this is probably the best approach, though it requires a little care. There is also a window function approach based on using the non-null values to delimit partitions: create table tmp_test_table(mmin,val) as select o, v from unnest(array[1,5,NULL,3,NULL,NULL,10,7,NULL,NULL,NULL,4]) with ordinality as u(v,o); select * from tmp_test_table order by mmin; mmin | val ------+----- 1 | 1 2 | 5 3 | 4 | 3 5 | 6 | 7 | 10 8 | 7 9 | 10 | 11 | 12 | 4 (12 rows) First we take advantage of the fact that mmin is increasing to generate a distinguishing value for each block of nulls: select *, max(case when val is not null then mmin end) over (order by mmin) as grp from tmp_test_table order by mmin; mmin | val | grp ------+-----+----- 1 | 1 | 1 2 | 5 | 2 3 | | 2 4 | 3 | 4 5 | | 4 6 | | 4 7 | 10 | 7 8 | 7 | 8 9 | | 8 10 | | 8 11 | | 8 12 | 4 | 12 (12 rows) Then we can fill in the missing vals by using the fact that there is at most one non-null val in each group: select *, max(val) over (partition by grp) as val2 from (select *, max(case when val is not null then mmin end) over (order by mmin) as grp from tmp_test_table) s order by mmin; mmin | val | grp | val2 ------+-----+-----+------ 1 | 1 | 1 | 1 2 | 5 | 2 | 5 3 | | 2 | 5 4 | 3 | 4 | 3 5 | | 4 | 3 6 | | 4 | 3 7 | 10 | 7 | 10 8 | 7 | 8 | 7 9 | | 8 | 7 10 | | 8 | 7 11 | | 8 | 7 12 | 4 | 12 | 4 (12 rows) This _really_ isn't efficient, though; you end up with typically three sorts of the data. For a one-off operation or for generating a materialized view it might be acceptable. -- Andrew (irc:RhodiumToad)
On Sat, May 13, 2023 at 2:18 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Durumdara" == Durumdara <durumdara@gmail.com> writes:
create table tmp_test_table(mmin,val)
as select o, v
from unnest(array[1,5,NULL,3,NULL,NULL,10,7,NULL,NULL,NULL,4])
with ordinality as u(v,o);
select * from tmp_test_table order by mmin;
That seems like a lot of work.
If you have ALL the values (no missing values) a simple CTE handles this:
SELECT mmin, value from tmp_test_table where mmin=1
UNION ALL
SELECT t.mmin, COALESCE(t.value,r.value)
FROM tmp_test_table t, rec_cte r WHERE r.mmin=(t.mmin-1)
)
SELECT * from rec_cte order by mmin;
Am 12.05.23 um 13:04 schrieb Durumdara: > [...] > > The LAG function seems to be ok, but how to handle if more than two periods > are missing? > > 03:00 10 > 03:01 NULL > 03:02 NULL > 03:03 NULL > 03:04 11 > 03:05 13 > > [...] and how do you think about NULL in first(and second/third) row?