Re: Interpolatioin problem - pg 12.4

Поиск
Список
Период
Сортировка
От Pól Ua Laoínecháin
Тема Re: Interpolatioin problem - pg 12.4
Дата
Msg-id CAF4RT5TBAUhCcxLEGHPO8Nzmk4v9oP363nOgwJEy+zOpBbQJPA@mail.gmail.com
обсуждение исходный текст
Ответ на Interpolatioin problem - pg 12.4  (Pól Ua Laoínecháin <linehanp@tcd.ie>)
Список pgsql-general
HI  all,

Sorry - posted the wrong URL for my problem - Doh...

Here is the correct one!

https://dbfiddle.uk/?rdbms=postgres_12&fiddle=428aa76d49b37961088d3dfef190757f

Again, apologies and  rgs,


Pól...

On Wed, 19 Aug 2020 at 19:16, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:
>
> Hi all,
>
> I have an interpolation problem as follows - fiddle available here:
>
> https://dbfiddle.uk/?rdbms=postgres_12&fiddle=8d23925146ea11a904c454709b0026fd
>
> A table:
>
> CREATE TABLE data
> (
>   s SERIAL PRIMARY KEY,
>   t TIMESTAMP,
>   lat NUMERIC
> );
>
> and data:
>
>
> INSERT INTO data (t, lat)
> VALUES
> ('2019-01-01 00:00:00', 5.07),
> ('2019-01-01 01:00:00', 4.60),
> ('2019-01-01 02:00:00', NULL),
> ('2019-01-01 03:00:00', NULL),
> ('2019-01-01 04:00:00', 4.7),
> ('2019-01-01 05:00:00', 4.20),
> ('2019-01-01 06:00:00', NULL),
> ('2019-01-01 07:00:00', 4.98),
> ('2019-01-01 08:00:00', 4.50),
> ('2019-01-01 09:00:00', 4.7),
> ('2019-01-01 10:00:00', NULL),
> ('2019-01-01 11:00:00', NULL),
> ('2019-01-01 12:00:00', NULL),
> ('2019-01-01 13:00:00', 6.45),
> ('2019-01-01 14:00:00', 3.50);
>
>
> There are gaps in the data as you can see - I'm trying to fill them
> using the algorithm:
>
>  - a sequence of 1 NULL - take the average of the reading above and
> the reading below
>
> - a sequence of 2 NULLs - the top assigned value is the average of the
> two records above it and the bottom assigned one is the average of the
> two records below.
>
> So far, so good - I'm able to do this (but see discussion below)
>
>  - a sequence of 3 NULLs - the middle one is assigned a value equal to
> average of the non-NULL record above and the non-null record below,
> and then the remaining NULLs above and below the average of the middle
> one and the non-NULL ones above and below.
>
> This is where it gets tricky - I'm getting answers, but I don't think
> they're correct. The result of the massive  SQL shown below are here
> (also on fiddle):
>
> s    lat     final_val
> 1    5.07         5.07
> 2    4.60         4.60
> 3    NULL         4.84
> 4    NULL         4.45
> 5     4.7          4.7
> 6    4.20         4.20
> 7    NULL         4.59
> 8    4.98         4.98
> 9    4.50         4.50
> 10    4.7          4.7
> 11   NULL         4.60
> 12   NULL         5.58
> 13   NULL         4.98
> 14   6.45         6.45
> 15   3.50         3.50
>
> The value for record 12 is correct, ,but not those above and below it.
>
> I think my *MAJOR* problem is that I've developed what is,
> essentially, a totally brute force approach - and this simply won't
> work at the scenario becomes more complex - take a look at the CASE
> statement - it's horrible and would only become exponentially worse as
> the number NULLs rises.
>
> So, my question is: Is there a recognised technique (using SQL only,
> not PL/pgSQL - soutions based on the latter are easy to find) whereby
> I can do a basic Linear Interpolation?
>
> Should you require any further information, please don't hesitate to contact me.
>
> TIA and rgs,
>
>
> Pól...
>
>
> =========================================================
>
> My mega SQL:
>
> WITH cte1 AS
> (
>   SELECT d1.s,
>     d1.t AS t1, d1.lat AS l1,
>     LAG(d1.lat, 2)  OVER (ORDER BY t ASC) AS lag_t1_2,
>     LAG(d1.lat, 1)  OVER (ORDER BY t ASC) AS lag_t1,
>     LEAD(d1.lat, 1) OVER (ORDER BY t ASC) AS lead_t1,
>     LEAD(d1.lat, 2) OVER (ORDER BY t ASC) AS lead_t1_2
>   FROM data d1
> ),
> cte2 AS
> (
>   SELECT
>     d2.t AS t2, d2.lat AS l2,
>     LAG(d2.lat, 1) OVER(ORDER BY t DESC) AS lag_t2,
>     LEAD(d2.lat, 1) OVER(ORDER BY t DESC) AS lead_t2
>   FROM data d2
> ),
> cte3 AS
> (
>   SELECT t1.s,
>     t1.t1,  t1.lag_t1_2, t1.lag_t1, t2.lag_t2, t1.l1, t2.l2,
>     t1.lead_t1, t2.lead_t2, t1.lead_t1_2
>   FROM cte1 t1
>   JOIN cte2 t2
>   ON t1.t1 = t2.t2
> ),
> cte4 AS
> (
>   SELECT t1.s,
>   t1.l1 AS lat,
>     CASE
>
>       -- The WHEN for the middle of 3 NULLs has to be at the beginning
>       -- of the CASE - if at the end, it remains NULL - why?
>
>       WHEN (t1.lag_t1 IS NULL) AND (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL)
>         AND (t1.lead_t1 IS NULL) AND (t1.lead_t2 IS NULL)
>       THEN ROUND((t1.lag_t1_2 + t1.lead_t1_2)/2, 2)
>
>       WHEN (t1.l1 IS NOT NULL) THEN t1.l1
>       WHEN (t1.l1 IS NULL) AND (t1.l2) IS NULL AND (t1.lag_t1 IS NOT NULL)
>         AND (t1.lag_t2 IS NOT NULL) THEN ROUND((t1.lag_t1 + t1.lag_t2)/2, 2)
>       WHEN (t1.lag_t2 IS NULL) AND (t1.l1 IS NULL) AND (t1.l2 IS NULL)
>         AND (t1.lead_t1 IS NULL) THEN ROUND((t1.lag_t1 + t1.lag_t1_2)/2, 2)
>       WHEN (t1.l1 IS NULL) AND (t1.l2 IS NULL)  AND (t1.lag_t1 IS NULL)
>         AND (t1.lead_t2 IS NULL) THEN ROUND((t1.lead_t1 + t1.lead_t1_2)/2, 2)
>       ELSE 0
>     END AS final_val
>   FROM cte3 t1
> )
> SELECT s, lat, final_val FROM cte4;



В списке pgsql-general по дате отправления:

Предыдущее
От: Pól Ua Laoínecháin
Дата:
Сообщение: Interpolatioin problem - pg 12.4
Следующее
От: Pól Ua Laoínecháin
Дата:
Сообщение: Interpolation problem - pg 12.4 - full correct version!