Обсуждение: Pairwise array sum aggregate function?

Поиск
Список
Период
Сортировка

Pairwise array sum aggregate function?

От
François Beausoleil
Дата:
Hi all,

NOTE: Also posted to StackOverflow: http://stackoverflow.com/questions/24997131/pairwise-array-sum-aggregate-function

I have a table with arrays as one column, and I want to sum the array elements together:

> create table regres(a int[] not null);
> insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
> select * from regres;
     a
-----------
 {1,2,3}
 {9,12,13}

I want the result to be:

{10, 14, 16}

that is: {1 + 9, 2 + 12, 3 + 13}.

Does such a function already exist somewhere? The intagg extension looked like a good candidate, but such a function
doesnot already exist. 

The arrays are expected to be between 24 and 31 elements in length, all elements are NOT NULL, and the arrays
themselveswill also always be NOT NULL. All elements are basic int. There will be more than two rows per aggregate. 

My implementation target is:

 PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

Thanks!
François



Re: Pairwise array sum aggregate function?

От
David G Johnston
Дата:
François Beausoleil wrote
> Hi all,
>
> NOTE: Also posted to StackOverflow:
> http://stackoverflow.com/questions/24997131/pairwise-array-sum-aggregate-function
>
> I have a table with arrays as one column, and I want to sum the array
> elements together:
>
>> create table regres(a int[] not null);
>> insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
>> select * from regres;
>      a
> -----------
>  {1,2,3}
>  {9,12,13}
>
> I want the result to be:
>
> {10, 14, 16}
>
> that is: {1 + 9, 2 + 12, 3 + 13}.
>
> Does such a function already exist somewhere? The intagg extension looked
> like a good candidate, but such a function does not already exist.
>
> The arrays are expected to be between 24 and 31 elements in length, all
> elements are NOT NULL, and the arrays themselves will also always be NOT
> NULL. All elements are basic int. There will be more than two rows per
> aggregate.
>
> My implementation target is:
>
>  PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc
> (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

This should be doable with pl/pgsql, CREATE FUNCTION, and CREATE AGGREGATE

You basically want to write a "sum" function that accepts an array input.
You will need to explode both the stored state and the incoming data, add
them together, then reconstruct a new array to put back into the state.  The
final function would just return the current contents of state.

With a different language you may be able to optimize by using a native
array capabilities of the language but with pl/pgsql I think you will have
to do the explode/rebuild.

Another implementation possibility is to keep track of the number of input
rows but simply concatenate the new data onto the end of the old data.  In
the final function you would break apart (unnest) the array into a single
long set and then use division/modulo to identify which items belong
together and add them.  You then convert the result back into an array and
return.  This should save considerable array exploding time.

David J.



--
View this message in context:
http://postgresql.1045698.n5.nabble.com/Pairwise-array-sum-aggregate-function-tp5813036p5813044.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: Pairwise array sum aggregate function?

От
Joe Conway
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/28/2014 07:09 AM, François Beausoleil wrote:
> Hi all,
>
> NOTE: Also posted to StackOverflow:
> http://stackoverflow.com/questions/24997131/pairwise-array-sum-aggregate-function
>
>  I have a table with arrays as one column, and I want to sum the
> array elements together:
>
>> create table regres(a int[] not null); insert into regres values
>> ('{1,2,3}'), ('{9, 12, 13}'); select * from regres;
> a ----------- {1,2,3} {9,12,13}
>
> I want the result to be:
>
> {10, 14, 16}
>
> that is: {1 + 9, 2 + 12, 3 + 13}.
>
> Does such a function already exist somewhere?

You might try PL/R:


create table regres(grp int, a int[] not null);
insert into regres values
  (1,'{1, 2, 3}'),
  (1,'{9, 12, 13}'),
  (1,'{4, 2, 3}'),
  (2,'{4, 5, 6}'),
  (2,'{7, 8, 9}')
;


create or replace function plr_pairwise_sum(a int[], b int[]) returns
int[] as $$
  if (is.null(a))
    return(b)
  else
    return(a + b)
$$ language plr;

CREATE AGGREGATE pairwise_sum (int[])
(
  sfunc = plr_pairwise_sum,
  stype = int[]
);

select grp, pairwise_sum(a) from regres group by grp;
 grp | pairwise_sum
- -----+--------------
   1 | {14,16,19}
   2 | {11,13,15}
(2 rows)


HTH,

Joe

- --
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJT1mxmAAoJEDfy90M199hlvUgQAKWM46GMNsTmOn1VSbEx7FKq
cbolbtGkxuxcuv9bAf/PEEJlQ8RFYKPm7CiBX0V1etKx9brUF2eSnOdwk2vf7irt
T15f9dMfgz7Gh9Ehwa23Qut2PBcqIP4OQ+GCYtxCMn+OnwxrETMkNDFXwxw4P4ZI
IfluCfSdZXfDfh/3VGzNJX8dvCc6iLOTYjCCS7TY4RErgzO+rbdNS+zBbYP+uEo2
e8LuWJMR0Lllh7tEJjLR7aOWPKuy5ytIKgiaWHPHi5kL05VxMNcJc69upcWuOmy6
ITnMA2k3s+QWTNaBUTSONSz7d6v1N5uW2JiOy7tE4KfPsc+rPTa/DbII2W3/rzEX
kc3+EgeemW36Z1kMXnBS1JbrdktcXxDAF9MgnA754chE408+hmwwCynaZ3DpJO+g
1R2ui9f11sZXPRbI39egjO2nAd1QCyk5dXDuZ9l2iDmuv1pBO48Bg+orYExSogXZ
D0/Qbe5DHztw7HUkMve57b6h32dedZN4U2/2kDRjzs47C4v9FnhG88IVl5vizEaX
dxBHNIMG6YlI764koDWXH4NsPVOeL15JoUCln5b9DixoTXSHjdjxFWp+tKRiP8ih
me1lL06BBNR7grP877zkB/ld8F4mbMSROFiOH8vUKKKkXYLCBxNyI1sdgg+Q+Wr3
sN1RckyaWHAVVaWRJmkf
=8783
-----END PGP SIGNATURE-----


Re: Pairwise array sum aggregate function?

От
François Beausoleil
Дата:
Le 2014-07-28 à 10:58, David G Johnston <david.g.johnston@gmail.com> a écrit :

> François Beausoleil wrote
>> Hi all,
>>
>> NOTE: Also posted to StackOverflow:
>> http://stackoverflow.com/questions/24997131/pairwise-array-sum-aggregate-function
>>
>> I have a table with arrays as one column, and I want to sum the array
>> elements together:
>>
>>> create table regres(a int[] not null);
>>> insert into regres values ('{1,2,3}'), ('{9, 12, 13}');
>>> select * from regres;
>>     a
>> -----------
>> {1,2,3}
>> {9,12,13}
>>
>> I want the result to be:
>>
>> {10, 14, 16}
>>
>> that is: {1 + 9, 2 + 12, 3 + 13}.
>>
>> Does such a function already exist somewhere? The intagg extension looked
>> like a good candidate, but such a function does not already exist.
>>
>> The arrays are expected to be between 24 and 31 elements in length, all
>> elements are NOT NULL, and the arrays themselves will also always be NOT
>> NULL. All elements are basic int. There will be more than two rows per
>> aggregate.
>>
>> My implementation target is:
>>
>> PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu, compiled by gcc
>> (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit
>
> This should be doable with pl/pgsql, CREATE FUNCTION, and CREATE AGGREGATE
>
> You basically want to write a "sum" function that accepts an array input.
> You will need to explode both the stored state and the incoming data, add
> them together, then reconstruct a new array to put back into the state.  The
> final function would just return the current contents of state.
>
> With a different language you may be able to optimize by using a native
> array capabilities of the language but with pl/pgsql I think you will have
> to do the explode/rebuild.
>
> Another implementation possibility is to keep track of the number of input
> rows but simply concatenate the new data onto the end of the old data.  In
> the final function you would break apart (unnest) the array into a single
> long set and then use division/modulo to identify which items belong
> together and add them.  You then convert the result back into an array and
> return.  This should save considerable array exploding time.

Erwin Brandstetter posted a great answer on StackOverflow http://stackoverflow.com/a/24997565/7355

SELECT ARRAY(
  SELECT sum(arr[rn])
  FROM (
    SELECT arr, generate_subscripts(arr, 1) rn
    FROM tbl t) sub
  GROUP BY rn
  ORDER BY rn
);

I like the simplicity of the solution, vs creating a new aggregate function.

Thanks everyone!
François Beausoleil