Обсуждение: reduce many loosely related rows down to one

Поиск
Список
Период
Сортировка

reduce many loosely related rows down to one

От
Bill MacArthur
Дата:
Here is a boiled down example of a scenario which I am having a bit of difficulty solving.
This is a catchall table where all the rows are related to the "id" but are entered by different unrelated processes
thatdo not necessarily have access to the other data bits.
 

CREATE TABLE test (id INTEGER,rspid INTEGER,nspid INTEGER,cid INTEGER,iac BOOLEAN,newp SMALLINT,oldp SMALLINT,ppv
NUMERIC(7,2),tppvNUMERIC(7,2)
 
);

INSERT INTO test (id, rspid, nspid, cid, iac) VALUES (1,2,3,4,TRUE);
INSERT INTO test (id, rspid, nspid, newp) VALUES (1,2,3,100);
INSERT INTO test (id, rspid, nspid, oldp) VALUES (1,2,3,200);
INSERT INTO test (id, rspid, nspid, tppv) VALUES (1,2,3,4100);
INSERT INTO test (id, rspid, nspid, tppv) VALUES (1,2,3,3100);
INSERT INTO test (id, rspid, nspid, ppv) VALUES (1,2,3,-100);
INSERT INTO test (id, rspid, nspid, ppv) VALUES (1,2,3,250);
INSERT INTO test (id, rspid, nspid, cid) VALUES (2,7,8,4);

-- raw data now looks like this:

select * from test;
 id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
----+-------+-------+-----+-----+------+------+---------+---------  1 |     2 |     3 |   4 | t   |      |      |
 |  1 |     2 |     3 |     |     |  100 |      |         |  1 |     2 |     3 |     |     |      |  200 |         |  1
|    2 |     3 |     |     |      |      |         | 4100.00  1 |     2 |     3 |     |     |      |      |         |
3100.00 1 |     2 |     3 |     |     |      |      | -100.00 |  1 |     2 |     3 |     |     |      |      |  250.00
| 2 |     7 |     8 |   4 |     |      |      |         |
 
(8 rows)

-- I want this result (where ppv and tppv are summed and the other distinct values are boiled down into one row)
-- I want to avoid writing explicit UNIONs that will break if, say the "cid" was entered as a discreet row from the row
containing"iac"
 
-- in this example "rspid" and "nspid" are always the same for a given ID, however they could possibly be absent for a
givenrow as well
 
 id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
----+-------+-------+-----+-----+------+------+---------+---------  1 |    2  |     3 |  4  | t   | 100  | 200  |
150.00 | 7200.00  2 |    7  |     8 |  4  |     |      |      |    0.00  |    0.00
 


I have experimented with doing the aggregates as a CTE and then joining that to various incarnations of DISTINCT and
DISTINCTON, but those do not do what I want. Trying to find the right combination of terms to get an answer from Google
hasbeen unfruitful.
 

Any ideas?

Thank you for your consideration.
Bill MacArthur



Re: reduce many loosely related rows down to one

От
Marc Mamin
Дата:
> ________________________________________
> Von: pgsql-sql-owner@postgresql.org [pgsql-sql-owner@postgresql.org]" im Auftrag von "Bill MacArthur
[webmaster@dhs-club.com]
> Gesendet: Samstag, 25. Mai 2013 09:19
> An: pgsql-sql@postgresql.org
> Betreff: [SQL] reduce many loosely related rows down to one
>
> Here is a boiled down example of a scenario which I am having a bit of difficulty solving.
> This is a catchall table where all the rows are related to the "id" but are entered by different unrelated processes
thatdo not necessarily have access to the other data bits. 
>
....

> -- raw data now looks like this:
>
> select * from test;
>
>   id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
> ----+-------+-------+-----+-----+------+------+---------+---------
>    1 |     2 |     3 |   4 | t   |      |      |         |
>    1 |     2 |     3 |     |     |  100 |      |         |
>    1 |     2 |     3 |     |     |      |  200 |         |
>    1 |     2 |     3 |     |     |      |      |         | 4100.00
>    1 |     2 |     3 |     |     |      |      |         | 3100.00
>    1 |     2 |     3 |     |     |      |      | -100.00 |
>    1 |     2 |     3 |     |     |      |      |  250.00 |
>    2 |     7 |     8 |   4 |     |      |      |         |
> (8 rows)
>
> -- I want this result (where ppv and tppv are summed and the other distinct values are boiled down into one row)
> -- I want to avoid writing explicit UNIONs that will break if, say the "cid" was entered as a discreet row from the
rowcontaining "iac" 
> -- in this example "rspid" and "nspid" are always the same for a given ID, however they could possibly be absent for
agiven row as well 
>
>   id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
> ----+-------+-------+-----+-----+------+------+---------+---------
>    1 |    2  |     3 |  4  | t   | 100  | 200  |  150.00  | 7200.00
>    2 |    7  |     8 |  4  |     |      |      |    0.00  |    0.00
>
>
> I have experimented with doing the aggregates as a CTE and then joining that to various incarnations of DISTINCT and
DISTINCTON, but those do not do what I want. Trying to find the right combination of terms to get an answer from Google
hasbeen unfruitful. 


Hello,
If I understand you well, you want to perform a group by whereas null values are coalesced to existing not null values.
this seems to be logically not feasible.
What should look the result like if your "raw" data are as following:
 id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
----+-------+-------+-----+-----+------+------+---------+---------  1 |     2 |     3 |   4 | t   |      |      |
 |  1 |     2 |     3 |   5 | t   |      |      |         |  1 |     2 |     3 |     |     |  100 |      |         | 

(to which cid should newp be summed to?)

regards,

Marc Mmain


Re: reduce many loosely related rows down to one

От
Bill MacArthur
Дата:
On 5/25/2013 7:57 AM, Marc Mamin wrote:
>
>> ________________________________________
>> Von: pgsql-sql-owner@postgresql.org [pgsql-sql-owner@postgresql.org]" im Auftrag von "Bill MacArthur
[webmaster@dhs-club.com]
>> Gesendet: Samstag, 25. Mai 2013 09:19
>> An: pgsql-sql@postgresql.org
>> Betreff: [SQL] reduce many loosely related rows down to one
>>
>> Here is a boiled down example of a scenario which I am having a bit of difficulty solving.
>> This is a catchall table where all the rows are related to the "id" but are entered by different unrelated processes
thatdo not necessarily have access to the other data bits.
 
>>
> ....
>
>> -- raw data now looks like this:
>>
>> select * from test;
>>
>>    id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
>> ----+-------+-------+-----+-----+------+------+---------+---------
>>     1 |     2 |     3 |   4 | t   |      |      |         |
>>     1 |     2 |     3 |     |     |  100 |      |         |
>>     1 |     2 |     3 |     |     |      |  200 |         |
>>     1 |     2 |     3 |     |     |      |      |         | 4100.00
>>     1 |     2 |     3 |     |     |      |      |         | 3100.00
>>     1 |     2 |     3 |     |     |      |      | -100.00 |
>>     1 |     2 |     3 |     |     |      |      |  250.00 |
>>     2 |     7 |     8 |   4 |     |      |      |         |
>> (8 rows)
>>
>> -- I want this result (where ppv and tppv are summed and the other distinct values are boiled down into one row)
>> -- I want to avoid writing explicit UNIONs that will break if, say the "cid" was entered as a discreet row from the
rowcontaining "iac"
 
>> -- in this example "rspid" and "nspid" are always the same for a given ID, however they could possibly be absent for
agiven row as well
 
>>
>>    id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
>> ----+-------+-------+-----+-----+------+------+---------+---------
>>     1 |    2  |     3 |  4  | t   | 100  | 200  |  150.00  | 7200.00
>>     2 |    7  |     8 |  4  |     |      |      |    0.00  |    0.00
>>
>>
>> I have experimented with doing the aggregates as a CTE and then joining that to various incarnations of DISTINCT and
DISTINCTON, but those do not do what I want. Trying to find the right combination of terms to get an answer from Google
hasbeen unfruitful.
 
>
>
> Hello,
> If I understand you well, you want to perform a group by whereas null values are coalesced to existing not null
values.
> this seems to be logically not feasible.
> What should look the result like if your "raw" data are as following:
>
>    id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
> ----+-------+-------+-----+-----+------+------+---------+---------
>     1 |     2 |     3 |   4 | t   |      |      |         |
>     1 |     2 |     3 |   5 | t   |      |      |         |
>     1 |     2 |     3 |     |     |  100 |      |         |
>
> (to which cid should newp be summed to?)
>
> regards,
>
> Marc Mmain
>
Ya, there is more to the picture than I described. Didn't want to bore with excessive detail. I was hoping that perhaps
somebodywould see the example and say "oh ya that can be solved with this obscure SQL implementation" :)
 
I have resigned myself to using a few more CTEs with DISTINCTs and joining it all up to get the results I want. Thanks
forthe look anyway Marc. Your description of what I wanted was more accurate and concise than I had words for at the
timeof the night I originally posted this.
 

Have a good one.

Bill MacArthur



Re: reduce many loosely related rows down to one

От
"Torsten Grust"
Дата:
On 25 May 2013, at 9:19, Bill MacArthur wrote (with possible deletions):
> [...]
> select * from test;
>
> id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
> ----+-------+-------+-----+-----+------+------+---------+---------
> 1 |     2 |     3 |   4 | t   |      |      |         |
> 1 |     2 |     3 |     |     |  100 |      |         |
> 1 |     2 |     3 |     |     |      |  200 |         |
> 1 |     2 |     3 |     |     |      |      |         | 4100.00
> 1 |     2 |     3 |     |     |      |      |         | 3100.00
> 1 |     2 |     3 |     |     |      |      | -100.00 |
> 1 |     2 |     3 |     |     |      |      |  250.00 |
> 2 |     7 |     8 |   4 |     |      |      |         |
> (8 rows)
>
> -- I want this result (where ppv and tppv are summed and the other 
> distinct values are boiled down into one row)
> -- I want to avoid writing explicit UNIONs that will break if, say the 
> "cid" was entered as a discreet row from the row containing "iac"
> -- in this example "rspid" and "nspid" are always the same for a given 
> ID, however they could possibly be absent for a given row as well
>
> id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
> ----+-------+-------+-----+-----+------+------+---------+---------
> 1 |    2  |     3 |  4  | t   | 100  | 200  |  150.00  | 7200.00
> 2 |    7  |     8 |  4  |     |      |      |    0.00  |    0.00

One possible option could be

SELECT id,       (array_agg(rspid))[1] AS rspid,        -- (1)       (array_agg(nspid))[1] AS nspid,
(array_agg(cid))[1]  AS cid,       bool_or(iac)          AS iac,          -- (2)       max(newp)             AS newp,
     -- (3)       min(oldp)             AS oldp,         -- (4)       coalesce(sum(ppv), 0) AS ppv,
coalesce(sum(tppv),0)AS tppv
 
FROM test
GROUP BY id;


This query computes the desired output for your example input.

There's a caveat here: your description of the problem has been
somewhat vague and it remains unclear how the query should
respond if the functional dependency id -> rspid
does not hold.  In this case, the array_agg(rspid)[1] in the line
marked (1) will pick one among many different(!) rspid values.
I don't know your scenario well enough to judge whether this would be
an acceptable behavior.  Other possible behaviors have been
implemented in the lines (2), (3), (4) where different aggregation
functions are used to reduce sets to a single value (e.g., pick the
largest/smallest of many values ...).

Cheers,  --Torsten


-- 
| Torsten "Teggy" Grust
| Torsten.Grust@gmail.com



Re: reduce many loosely related rows down to one

От
Bill MacArthur
Дата:
On 5/28/2013 11:04 AM, Torsten Grust wrote:
> On 25 May 2013, at 9:19, Bill MacArthur wrote (with possible deletions):
>> [...]
>> select * from test;
>>
>> id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
>> ----+-------+-------+-----+-----+------+------+---------+---------
>> 1 |     2 |     3 |   4 | t   |      |      |         |
>> 1 |     2 |     3 |     |     |  100 |      |         |
>> 1 |     2 |     3 |     |     |      |  200 |         |
>> 1 |     2 |     3 |     |     |      |      |         | 4100.00
>> 1 |     2 |     3 |     |     |      |      |         | 3100.00
>> 1 |     2 |     3 |     |     |      |      | -100.00 |
>> 1 |     2 |     3 |     |     |      |      |  250.00 |
>> 2 |     7 |     8 |   4 |     |      |      |         |
>> (8 rows)
>>
>> -- I want this result (where ppv and tppv are summed and the other distinct values are boiled down into one row)
>> -- I want to avoid writing explicit UNIONs that will break if, say the "cid" was entered as a discreet row from the
rowcontaining "iac"
 
>> -- in this example "rspid" and "nspid" are always the same for a given ID, however they could possibly be absent for
agiven row as well
 
>>
>> id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
>> ----+-------+-------+-----+-----+------+------+---------+---------
>> 1 |    2  |     3 |  4  | t   | 100  | 200  |  150.00  | 7200.00
>> 2 |    7  |     8 |  4  |     |      |      |    0.00  |    0.00
>
> One possible option could be
>
> SELECT id,
>         (array_agg(rspid))[1] AS rspid,        -- (1)
>         (array_agg(nspid))[1] AS nspid,
>         (array_agg(cid))[1]   AS cid,
>         bool_or(iac)          AS iac,          -- (2)
>         max(newp)             AS newp,         -- (3)
>         min(oldp)             AS oldp,         -- (4)
>         coalesce(sum(ppv), 0) AS ppv,
>         coalesce(sum(tppv),0) AS tppv
> FROM test
> GROUP BY id;
>
>
> This query computes the desired output for your example input.
>
> There's a caveat here: your description of the problem has been
> somewhat vague and it remains unclear how the query should
> respond if the functional dependency id -> rspid
> does not hold.  In this case, the array_agg(rspid)[1] in the line
> marked (1) will pick one among many different(!) rspid values.
> I don't know your scenario well enough to judge whether this would be
> an acceptable behavior.  Other possible behaviors have been
> implemented in the lines (2), (3), (4) where different aggregation
> functions are used to reduce sets to a single value (e.g., pick the
> largest/smallest of many values ...).
>
> Cheers,
>    --Torsten
>
Slick! Interesting usage scenarios for those aggregate functions array_agg and bool_or, one new to me and the other
rarelyused, and even for min and max which I never thought of using in this sense.
 

I tried not be be overbearing with descriptive details hoping that somebody would look at the simplistic case and offer
whatmight be considered an obscure way of implementing some of Postgres's handy features for an unusual problem. With a
littletweaking for the exact nature of the environment, I am good to go.
 

Thank you, Torsten!
Bill MacArthur



Re: reduce many loosely related rows down to one

От
Marc Mamin
Дата:
> SELECT id,
>         (array_agg(rspid))[1] AS rspid,        -- (1)


for such cases, I have created an new aggregate function:

SELECT firstnotnull(rspid) AS rspid,

this avoid to collect first all rspid values to then keep only the first one...

 CREATE OR REPLACE FUNCTION public.first_agg_nn ( anyelement, anyelement ) RETURNS anyelement AS $$       SELECT $1; $$
LANGUAGESQL IMMUTABLE STRICT COST 1; 
 CREATE AGGREGATE public.firstnotnull (           sfunc    = public.first_agg_nn,           basetype = anyelement,
    stype    = anyelement ); 


regards,

Marc Mamin

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-
> owner@postgresql.org] On Behalf Of Torsten Grust
> Sent: Dienstag, 28. Mai 2013 17:08
> To: pgsql-sql@postgresql.org
> Subject: Re: [SQL] reduce many loosely related rows down to one
>
> On 25 May 2013, at 9:19, Bill MacArthur wrote (with possible
> deletions):
> > [...]
> > select * from test;
> >
> > id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
> > ----+-------+-------+-----+-----+------+------+---------+---------
> > 1 |     2 |     3 |   4 | t   |      |      |         |
> > 1 |     2 |     3 |     |     |  100 |      |         |
> > 1 |     2 |     3 |     |     |      |  200 |         |
> > 1 |     2 |     3 |     |     |      |      |         | 4100.00
> > 1 |     2 |     3 |     |     |      |      |         | 3100.00
> > 1 |     2 |     3 |     |     |      |      | -100.00 |
> > 1 |     2 |     3 |     |     |      |      |  250.00 |
> > 2 |     7 |     8 |   4 |     |      |      |         |
> > (8 rows)
> >
> > -- I want this result (where ppv and tppv are summed and the other
> > distinct values are boiled down into one row)
> > -- I want to avoid writing explicit UNIONs that will break if, say
> the
> > "cid" was entered as a discreet row from the row containing "iac"
> > -- in this example "rspid" and "nspid" are always the same for a
> given
> > ID, however they could possibly be absent for a given row as well
> >
> > id | rspid | nspid | cid | iac | newp | oldp |   ppv   |  tppv
> > ----+-------+-------+-----+-----+------+------+---------+---------
> > 1 |    2  |     3 |  4  | t   | 100  | 200  |  150.00  | 7200.00
> > 2 |    7  |     8 |  4  |     |      |      |    0.00  |    0.00
>
> One possible option could be
>
> SELECT id,
>         (array_agg(rspid))[1] AS rspid,        -- (1)
>         (array_agg(nspid))[1] AS nspid,
>         (array_agg(cid))[1]   AS cid,
>         bool_or(iac)          AS iac,          -- (2)
>         max(newp)             AS newp,         -- (3)
>         min(oldp)             AS oldp,         -- (4)
>         coalesce(sum(ppv), 0) AS ppv,
>         coalesce(sum(tppv),0) AS tppv
> FROM test
> GROUP BY id;
>
>
> This query computes the desired output for your example input.
>
> There's a caveat here: your description of the problem has been
> somewhat vague and it remains unclear how the query should respond if
> the functional dependency id -> rspid does not hold.  In this case, the
> array_agg(rspid)[1] in the line marked (1) will pick one among many
> different(!) rspid values.
> I don't know your scenario well enough to judge whether this would be
> an acceptable behavior.  Other possible behaviors have been implemented
> in the lines (2), (3), (4) where different aggregation functions are
> used to reduce sets to a single value (e.g., pick the largest/smallest
> of many values ...).
>
> Cheers,
>    --Torsten
>
>
> --
> | Torsten "Teggy" Grust
> | Torsten.Grust@gmail.com
>
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) To make
> changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql