Обсуждение: Aggregate "rollup"
I had written a piece of code about two years ago that used the aggregate feature of PostgreSQL to create an array of integers from an aggregate, as: select int_array_aggregate( column ) from table group by column While it seems pointless to create an array on a select, it has a purpose in OLAP. For instance, suppose you do this: create table fast_lookup as select reference, int_array_aggregate(result) from table group by result The "fast_lookup" table now has all the result entries as an array in a single row. In the systems that I have used this, it has provided a dramatic improvement, especially when you have a high number of identical "reference" entries in a classic "one to many" table. The question is, would a more comprehensive solution be wanted? Possible? Something like: create table fast_lookup as select reference, aggregate_array( field ) from table group by field Where the function aggregate_array takes any number of data types. Any thoughts? I think I need to fix the code in the current /contrib/intagg anyway, so is it worth doing the extra work to included multiple data types?
mlw wrote: > I had written a piece of code about two years ago that used the > aggregate feature of PostgreSQL to create an array of integers from an > aggregate, as: > > select int_array_aggregate( column ) from table group by column > > While it seems pointless to create an array on a select, it has a > purpose in OLAP. For instance, suppose you do this: > > create table fast_lookup as select reference, > int_array_aggregate(result) from table group by result > > The "fast_lookup" table now has all the result entries as an array in a > single row. In the systems that I have used this, it has provided a > dramatic improvement, especially when you have a high number of > identical "reference" entries in a classic "one to many" table. > > The question is, would a more comprehensive solution be wanted? > Possible? Something like: > > create table fast_lookup as select reference, aggregate_array( field ) > from table group by field > > Where the function aggregate_array takes any number of data types. > > Any thoughts? I think I need to fix the code in the current > /contrib/intagg anyway, so is it worth doing the extra work to included > multiple data types? It's also useful in conjunction with statistically processing. There is a array_accum function in PL/R; I just made a post to the SQL list the other day on this. (http://archives.postgresql.org/pgsql-sql/2003-03/msg00124.php) Here's the meat of it: CREATE OR REPLACE FUNCTION array_accum (_name, name) RETURNS name[] AS '$libdir/plr','array_accum' LANGUAGE 'C'; CREATE AGGREGATE accumulate ( sfunc = array_accum, basetype = name, stype = _name ); regression=# SELECT accumulate(tablename) as cruft FROM pg_tables WHERE tablename LIKE 'c%'; cruft --------------------------------------- {connectby_int,connectby_text,ct,cth} (1 row) See: http://www.joeconway.com/plr/doc/plr-aggregate-funcs.html and download at: http://www.joeconway.com/plr/ I'd be happy to split the array functions out of PL/R and sumbit them to PATCHES if there is any interest. Joe
mlw <pgsql@mohawksoft.com> writes: > I had written a piece of code about two years ago that used the aggregate > feature of PostgreSQL to create an array of integers from an aggregate, as: > > select int_array_aggregate( column ) from table group by column I found this and am using it extensively. It's extremely helpful, thank you. It goes well with either the *= operators in contrib/array or the gist indexing in contrib/intarray. One problem I've found though is that the optimizer doesn't have any good statistics for estimating the number of matches of such operators. It seems like fixing that would require a lot of changes to the statistics gathered. > create table fast_lookup as select reference, aggregate_array( field ) from > table group by field > > Where the function aggregate_array takes any number of data types. Sure, that seems logical. Actually I already bumped into a situation where I wanted an array of char(1). I just kludged it to use ascii() of that first character, but it would be cleaner and perhaps better for unicode later to use the actual character. Someone else on the list already asked for an function that gave an array of varchar. I think they were pointed at a general purpose function from plr. -- greg
> -----Original Message----- > From: mlw [mailto:pgsql@mohawksoft.com] > Sent: Wednesday, March 05, 2003 3:47 PM > To: pgsql-hackers@postgresql.org > Subject: [HACKERS] Aggregate "rollup" > > I had written a piece of code about two years ago that used the > aggregate feature of PostgreSQL to create an array of integers from an > aggregate, as: > > select int_array_aggregate( column ) from table group by column > Do I understand correctly that this still follows the normal rules for grouping, so that only like values are put in the array? Example: column has values 1,1,1,2,2 spread over 5 rows. Your query returns two rows with row1={1,1,1} and row2 = {2,2}...is this correct? Also, what if your aggregate column is different from the group column: Table t with columns c1, c2 with 5 rows: C1 C2 1, 1 1, 2 1, 3 2, 1 2, 2 Does select C1, int_array_aggregate( C2 ) from table group by C1 return 1, {1, 2, 3} 2, {1, 2} ?? FWIW, I think that's a pretty cool function. This allows the backend to telescope 1 dimension (only) out of a dataset, the most detailed one. In certain situations with large datasets over slow connections, this could be a big payoff. Also, all this talk about XML has got me thinking about how to allow basic query features to provide simple nesting services. consider: select C1, C2 from t for xml; returns: <t> <C1>1</C1><C2>1</C2> <C1>1</C1><C2>2</C2> <C1>1</C1><C2>3</C2> <C1>2</C1><C2>1</C2> <C1>2</C1><C2>2</C2> </t> select C1, xml_aggregate(C2) from t for xml; returns: <t> <C1 value="1"><C2>1</C2><C2>2</C2><C2>3</C2></C1> <C1 value="2"><C2>1</C2><C2>2</C2><C2>3</C2></C1> </t> > create table fast_lookup as select reference, > int_array_aggregate(result) from table group by result > > The question is, would a more comprehensive solution be wanted? > Possible? Something like: > > > Any thoughts? I think I need to fix the code in the current > /contrib/intagg anyway, so is it worth doing the extra work to included > multiple data types? Yes. Just a thought. Merlin
Merlin Moncure wrote: >>-----Original Message----- >>From: mlw [mailto:pgsql@mohawksoft.com] >>Sent: Wednesday, March 05, 2003 3:47 PM >>To: pgsql-hackers@postgresql.org >>Subject: [HACKERS] Aggregate "rollup" >> >>I had written a piece of code about two years ago that used the >>aggregate feature of PostgreSQL to create an array of integers from an >>aggregate, as: >> >>select int_array_aggregate( column ) from table group by column >> >> >> > >Do I understand correctly that this still follows the normal rules for >grouping, so that only like values are put in the array? > >Example: column has values 1,1,1,2,2 spread over 5 rows. >Your query returns two rows with row1={1,1,1} and row2 = {2,2}...is this >correct? > Actually, it is more intended to put all the entries in a "one to many" table in a single column, as: create table classic_one_to_many ( leftside integer, rightside integer ); create table fast_lookup as select leftside, int_array_aggregate(rightside) from classic_one_to_many group by leftside;