Обсуждение: multi-column aggregates

Поиск
Список
Период
Сортировка

multi-column aggregates

От
Chris Kratz
Дата:
Hello All,

Is there any way in postgres to have an aggregate that uses input from two
columns without using composite types?  The example we are working on is a
first or last aggregate which requires a data value and a date column.  The
result would be to find the latest date within the group and return the
associated data value.  Since there may be multiple date columns each with
dependent columns, we can't use a sort by date and a simpler first or last
aggregate.

Thanks,

-Chris
--
Chris Kratz

Re: multi-column aggregates

От
Tom Lane
Дата:
Chris Kratz <chris.kratz@vistashare.com> writes:
> Is there any way in postgres to have an aggregate that uses input from two
> columns without using composite types?

No.

            regards, tom lane

Re: multi-column aggregates

От
Chris Kratz
Дата:
Thanks Tom,

Well for anyone else who may be interested in doing something similar, here is
what we did.  It does require typecasting going into the functions, composite
types and using the dot notation to get the value back out of the composite
object returned.  But it works.

This is what we wanted...

select last(cur_date, some_column) from some_table....

We got this close...

select (last((cur_date, some_column)::last_int_agg)).value as last_int from...

which I think will be useable for what we need.  If anyone has ideas to
simplify this, I would appreciate it.  Example and generation script attached
at end.

I do have to say that the flexibility in postgres for creating our own data
types and aggregate functions is wonderfull.  Kudos again to everyone who has
but so much time and energy into postgres.
---

-Chris

On Thursday 09 March 2006 01:08 pm, Tom Lane wrote:
> Chris Kratz <chris.kratz@vistashare.com> writes:
> > Is there any way in postgres to have an aggregate that uses input from
> > two columns without using composite types?
>
> No.
>
>             regards, tom lane


-------------------------------------------------------------------------------------
simple example test data::
-------------------------------------------------------------------------------------
test=# select id, grouping, cur_date::date, cur_date2::date, integer_column
from test_agg_last;
 id | grouping |  cur_date  | cur_date2  | integer_column
----+----------+------------+------------+----------------
  1 |        1 | 2006-01-05 | 2006-01-03 |              8
  2 |        1 | 2006-01-01 | 2006-01-05 |             78
  3 |        2 | 2006-01-03 | 2006-01-01 |             32
(3 rows)

test=# select
test-#    grouping,
test-#    (last((cur_date, integer_column)::last_int_agg)).value as last_int,
test-#    (first((cur_date, integer_column)::last_int_agg)).value as
first_int,
test-#    (last((cur_date2, integer_column)::last_int_agg)).value as
last_int2,
test-#    (first((cur_date2, integer_column)::last_int_agg)).value as
first_int2
test-# from test_agg_last
test-# group by grouping
test-# order by grouping
test-# ;
 grouping | last_int | first_int | last_int2 | first_int2
----------+----------+-----------+-----------+------------
        1 |        8 |        78 |        78 |          8
        2 |       32 |        32 |        32 |         32
(2 rows)

-------------------------------------------------------------------------------------
First and Last aggregates using an arbitrary date column
-------------------------------------------------------------------------------------
-- aggregate types
create type last_int_agg as (cur_date timestamp, value int);
create type last_txt_agg as (cur_date timestamp, value text);
create type last_rel_agg as (cur_date timestamp, value double precision);
create type last_num_agg as (cur_date timestamp, value numeric(12,2));
create type last_dte_agg as (cur_date timestamp, value date);
create type last_tme_agg as (cur_date timestamp, value time);
create type last_bln_agg as (cur_date timestamp, value boolean);
create type last_ntv_agg as (cur_date timestamp, value interval);

-- generic last accumulator function
CREATE OR REPLACE function last_accum(anyelement, anyelement) returns
anyelement
   AS $$
   BEGIN
      IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
         THEN RETURN $2;
      ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
         THEN RETURN $1;
      ELSEIF $2.cur_date>$1.cur_date
         THEN RETURN $2;
      ELSE RETURN $1;
      END IF;
   END;
$$ LANGUAGE plpgsql;

-- generic first accumulator function
CREATE OR REPLACE function first_accum(anyelement, anyelement) returns
anyelement
   AS $$
   BEGIN
      IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
         THEN RETURN $2;
      ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
         THEN RETURN $1;
      ELSEIF $2.cur_date<$1.cur_date
         THEN RETURN $2;
      ELSE RETURN $1;
      END IF;
   END;
$$ LANGUAGE plpgsql;

-- last aggregate
CREATE AGGREGATE last (
    sfunc = last_accum,
    basetype = anyelement,
    stype = anyelement
);

-- first aggregate
CREATE AGGREGATE first (
    sfunc = first_accum,
    basetype = anyelement,
    stype = anyelement
);

-- test data

create table test_agg_last(
   id serial primary key,
   grouping integer,
   cur_date timestamp,
   cur_date2 timestamp,
   integer_column integer,
   real_column double precision,
   currency_column numeric(12,2),
   text_column text,
   date_column date,
   time_column time without time zone,
   interval_column interval,
   boolean_column boolean);

insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(1, '1/5/06', '1/3/06', 8, 38.7, '12.00', 'Four score', '12/3/78',
'3:32pm', '1 day', true);

insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(1, '1/1/06', '1/5/06', 78, 1.998, '35000.00', 'and seven', '6/18/05',
'12:00am', '4 hours', false);

innsert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
real_column, currency_column, text_column, date_column, time_column,
interval_column, boolean_column)
values(2, '1/3/06', '1/1/06', 32, 0.000001, '100000.00', 'years ago',
'1/25/2010', '11:37am', '23 minutes', true);

-- test using multiple date columns with first and last
select
   grouping,
   (last((cur_date, integer_column)::last_int_agg)).value as last_int,
   (first((cur_date, integer_column)::last_int_agg)).value as first_int,
   (last((cur_date2, integer_column)::last_int_agg)).value as last_int2,
   (first((cur_date2, integer_column)::last_int_agg)).value as first_int2
from test_agg_last
group by grouping
order by grouping
;

-- test several different common types
select
   grouping,
   (last((cur_date, integer_column)::last_int_agg)).value as last_int,
   (first((cur_date, integer_column)::last_int_agg)).value as first_int,
   (last((cur_date, real_column)::last_rel_agg)).value as last_real,
   (first((cur_date, real_column)::last_rel_agg)).value as first_real,
   (last((cur_date, currency_column)::last_num_agg)).value as last_currency,
   (first((cur_date, currency_column)::last_num_agg)).value as first_currency,
   (last((cur_date, text_column)::last_txt_agg)).value as last_text,
   (first((cur_date, text_column)::last_txt_agg)).value as first_text,
   (last((cur_date, date_column)::last_dte_agg)).value as last_date,
   (first((cur_date, date_column)::last_dte_agg)).value as first_date,
   (last((cur_date, time_column)::last_tme_agg)).value as last_time,
   (first((cur_date, time_column)::last_tme_agg)).value as first_time,
   (last((cur_date, interval_column)::last_ntv_agg)).value as last_interval,
   (first((cur_date, interval_column)::last_ntv_agg)).value as first_interval,
   (last((cur_date, boolean_column)::last_bln_agg)).value as last_boolean,
   (first((cur_date, boolean_column)::last_bln_agg)).value as first_boolean

from test_agg_last
group by grouping
order by grouping
;

-- cleanup test data

drop table test_agg_last;

Re: multi-column aggregates

От
"Merlin Moncure"
Дата:
Chris Kratz wrote:
> Well for anyone else who may be interested in doing something similar, here is
> what we did.  It does require typecasting going into the functions, composite
> types and using the dot notation to get the value back out of the composite
> object returned.  But it works.
>
> This is what we wanted...
>
> select last(cur_date, some_column) from some_table....
>
> We got this close...
>
> select (last((cur_date, some_column)::last_int_agg)).value as last_int from...

have you looked at new row-wise comparison feature (i might be
misunderstanding your problem)?

select some_column from some_table where (cur_date, some_column) <
'01/01/06',  99999999) order by cur_date desc, some_column desc limit
1;

this will give you the highest value of some_column on the abitrarily
chosen date 01/01/06 (assuming all values of some_column are less than
99999999).

Merlin

Re: multi-column aggregates

От
Berend Tober
Дата:
I may not fully understand your situation but

SELECT distinct grouping,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date DESC LIMIT 1) AS last_int,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date ASC LIMIT 1) AS first_int,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date2 DESC LIMIT 1) AS last_int2,
(SELECT integer_column FROM test.test_agg_last WHERE
grouping=t1.grouping ORDER BY cur_date2 ASC LIMIT 1) AS first_int2
FROM test.test_agg_last t1

produced the same result as at least your first example:

1,8,78,78,8
2,32,32,32,32

I think it does what you are trying to do and can be expanded to your
other examples. Maybe not as cool as using composite aggregates, though.

-- BMT

Chris Kratz wrote:

>Thanks Tom,
>
>Well for anyone else who may be interested in doing something similar, here is
>what we did.  It does require typecasting going into the functions, composite
>types and using the dot notation to get the value back out of the composite
>object returned.  But it works.
>
>This is what we wanted...
>
>select last(cur_date, some_column) from some_table....
>
>We got this close...
>
>select (last((cur_date, some_column)::last_int_agg)).value as last_int from...
>
>which I think will be useable for what we need.  If anyone has ideas to
>simplify this, I would appreciate it.  Example and generation script attached
>at end.
>
>I do have to say that the flexibility in postgres for creating our own data
>types and aggregate functions is wonderfull.  Kudos again to everyone who has
>but so much time and energy into postgres.
>---
>
>-Chris
>
>On Thursday 09 March 2006 01:08 pm, Tom Lane wrote:
>
>
>>Chris Kratz <chris.kratz@vistashare.com> writes:
>>
>>
>>>Is there any way in postgres to have an aggregate that uses input from
>>>two columns without using composite types?
>>>
>>>
>>No.
>>
>>            regards, tom lane
>>
>>
>
>
>-------------------------------------------------------------------------------------
>simple example test data::
>-------------------------------------------------------------------------------------
>test=# select id, grouping, cur_date::date, cur_date2::date, integer_column
>from test_agg_last;
> id | grouping |  cur_date  | cur_date2  | integer_column
>----+----------+------------+------------+----------------
>  1 |        1 | 2006-01-05 | 2006-01-03 |              8
>  2 |        1 | 2006-01-01 | 2006-01-05 |             78
>  3 |        2 | 2006-01-03 | 2006-01-01 |             32
>(3 rows)
>
>test=# select
>test-#    grouping,
>test-#    (last((cur_date, integer_column)::last_int_agg)).value as last_int,
>test-#    (first((cur_date, integer_column)::last_int_agg)).value as
>first_int,
>test-#    (last((cur_date2, integer_column)::last_int_agg)).value as
>last_int2,
>test-#    (first((cur_date2, integer_column)::last_int_agg)).value as
>first_int2
>test-# from test_agg_last
>test-# group by grouping
>test-# order by grouping
>test-# ;
> grouping | last_int | first_int | last_int2 | first_int2
>----------+----------+-----------+-----------+------------
>        1 |        8 |        78 |        78 |          8
>        2 |       32 |        32 |        32 |         32
>(2 rows)
>
>-------------------------------------------------------------------------------------
>First and Last aggregates using an arbitrary date column
>-------------------------------------------------------------------------------------
>-- aggregate types
>create type last_int_agg as (cur_date timestamp, value int);
>create type last_txt_agg as (cur_date timestamp, value text);
>create type last_rel_agg as (cur_date timestamp, value double precision);
>create type last_num_agg as (cur_date timestamp, value numeric(12,2));
>create type last_dte_agg as (cur_date timestamp, value date);
>create type last_tme_agg as (cur_date timestamp, value time);
>create type last_bln_agg as (cur_date timestamp, value boolean);
>create type last_ntv_agg as (cur_date timestamp, value interval);
>
>-- generic last accumulator function
>CREATE OR REPLACE function last_accum(anyelement, anyelement) returns
>anyelement
>   AS $$
>   BEGIN
>      IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
>         THEN RETURN $2;
>      ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
>         THEN RETURN $1;
>      ELSEIF $2.cur_date>$1.cur_date
>         THEN RETURN $2;
>      ELSE RETURN $1;
>      END IF;
>   END;
>$$ LANGUAGE plpgsql;
>
>-- generic first accumulator function
>CREATE OR REPLACE function first_accum(anyelement, anyelement) returns
>anyelement
>   AS $$
>   BEGIN
>      IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
>         THEN RETURN $2;
>      ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
>         THEN RETURN $1;
>      ELSEIF $2.cur_date<$1.cur_date
>         THEN RETURN $2;
>      ELSE RETURN $1;
>      END IF;
>   END;
>$$ LANGUAGE plpgsql;
>
>-- last aggregate
>CREATE AGGREGATE last (
>    sfunc = last_accum,
>    basetype = anyelement,
>    stype = anyelement
>);
>
>-- first aggregate
>CREATE AGGREGATE first (
>    sfunc = first_accum,
>    basetype = anyelement,
>    stype = anyelement
>);
>
>-- test data
>
>create table test_agg_last(
>   id serial primary key,
>   grouping integer,
>   cur_date timestamp,
>   cur_date2 timestamp,
>   integer_column integer,
>   real_column double precision,
>   currency_column numeric(12,2),
>   text_column text,
>   date_column date,
>   time_column time without time zone,
>   interval_column interval,
>   boolean_column boolean);
>
>insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
>real_column, currency_column, text_column, date_column, time_column,
>interval_column, boolean_column)
>values(1, '1/5/06', '1/3/06', 8, 38.7, '12.00', 'Four score', '12/3/78',
>'3:32pm', '1 day', true);
>
>insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
>real_column, currency_column, text_column, date_column, time_column,
>interval_column, boolean_column)
>values(1, '1/1/06', '1/5/06', 78, 1.998, '35000.00', 'and seven', '6/18/05',
>'12:00am', '4 hours', false);
>
>innsert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
>real_column, currency_column, text_column, date_column, time_column,
>interval_column, boolean_column)
>values(2, '1/3/06', '1/1/06', 32, 0.000001, '100000.00', 'years ago',
>'1/25/2010', '11:37am', '23 minutes', true);
>
>-- test using multiple date columns with first and last
>select
>   grouping,
>   (last((cur_date, integer_column)::last_int_agg)).value as last_int,
>   (first((cur_date, integer_column)::last_int_agg)).value as first_int,
>   (last((cur_date2, integer_column)::last_int_agg)).value as last_int2,
>   (first((cur_date2, integer_column)::last_int_agg)).value as first_int2
>from test_agg_last
>group by grouping
>order by grouping
>;
>
>-- test several different common types
>select
>   grouping,
>   (last((cur_date, integer_column)::last_int_agg)).value as last_int,
>   (first((cur_date, integer_column)::last_int_agg)).value as first_int,
>   (last((cur_date, real_column)::last_rel_agg)).value as last_real,
>   (first((cur_date, real_column)::last_rel_agg)).value as first_real,
>   (last((cur_date, currency_column)::last_num_agg)).value as last_currency,
>   (first((cur_date, currency_column)::last_num_agg)).value as first_currency,
>   (last((cur_date, text_column)::last_txt_agg)).value as last_text,
>   (first((cur_date, text_column)::last_txt_agg)).value as first_text,
>   (last((cur_date, date_column)::last_dte_agg)).value as last_date,
>   (first((cur_date, date_column)::last_dte_agg)).value as first_date,
>   (last((cur_date, time_column)::last_tme_agg)).value as last_time,
>   (first((cur_date, time_column)::last_tme_agg)).value as first_time,
>   (last((cur_date, interval_column)::last_ntv_agg)).value as last_interval,
>   (first((cur_date, interval_column)::last_ntv_agg)).value as first_interval,
>   (last((cur_date, boolean_column)::last_bln_agg)).value as last_boolean,
>   (first((cur_date, boolean_column)::last_bln_agg)).value as first_boolean
>
>from test_agg_last
>group by grouping
>order by grouping
>;
>
>-- cleanup test data
>
>drop table test_agg_last;
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: don't forget to increase your free space map settings
>
>
>


--
Regards,
Berend Tober
Seaworthy Systems, Inc.
860-767-9061


Re: multi-column aggregates

От
Bruno Wolff III
Дата:
On Thu, Mar 09, 2006 at 12:56:21 -0500,
  Chris Kratz <chris.kratz@vistashare.com> wrote:
> Hello All,
>
> Is there any way in postgres to have an aggregate that uses input from two
> columns without using composite types?  The example we are working on is a
> first or last aggregate which requires a data value and a date column.  The
> result would be to find the latest date within the group and return the
> associated data value.  Since there may be multiple date columns each with
> dependent columns, we can't use a sort by date and a simpler first or last
> aggregate.

It sounds like you could use the (postgres specific) DISTINCT ON construct
to do what you are looking for. Something like:
SELECT DISTINCT ON (groupcol) datacol ORDER BY groupcol DESC, datecol DESC;
If there are multiple versions of this being done at the same time, you
can do them separately and then join then on the group key (groupcol in
the example).

Re: multi-column aggregates

От
Chris Kratz
Дата:
Hello Bruno,

Yes, we have used the distinct on operator in the past and that works quite
well when you have a single ordering column or multiples which don't
contradict each other.  The joins would work, but I was hoping for a simpler
solution as this is sql generated from a general purpose query tool.  Anyway,
thanks for the suggestion.

Thanks,

-Chris

On Thursday 09 March 2006 05:28 pm, Bruno Wolff III wrote:
> On Thu, Mar 09, 2006 at 12:56:21 -0500,
>
>   Chris Kratz <chris.kratz@vistashare.com> wrote:
> > Hello All,
> >
> > Is there any way in postgres to have an aggregate that uses input from
> > two columns without using composite types?  The example we are working on
> > is a first or last aggregate which requires a data value and a date
> > column.  The result would be to find the latest date within the group and
> > return the associated data value.  Since there may be multiple date
> > columns each with dependent columns, we can't use a sort by date and a
> > simpler first or last aggregate.
>
> It sounds like you could use the (postgres specific) DISTINCT ON construct
> to do what you are looking for. Something like:
> SELECT DISTINCT ON (groupcol) datacol ORDER BY groupcol DESC, datecol DESC;
> If there are multiple versions of this being done at the same time, you
> can do them separately and then join then on the group key (groupcol in
> the example).
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
Chris Kratz

Re: multi-column aggregates

От
Chris Kratz
Дата:
Hello Berend,

Thanks for the reply.  Yes, subselects would work very well and in some ways
are more elegant then the hand waving we had to do to get the multi-column
aggregates to work.  The reason we moved away from the subselects is that the
queries tend to be quite complex and all of the joins and where clause and
other odds and ends which are used to limit the parent select would need to
be in each subselect for it to accurately reflect the first or last within
the record set shown.  My example didn't really reflect the fact that the
parent query is often quite complex with multiple joins, subselects of it's
own, etc.

Probably the best thing would be to use intermediate temp tables to get the
rowset that needs grouping and then use subselects to get the appropriate
first or last item in each case.  I was just hoping we could come up with a
aggregate so we could easily slip it into place beside the normal aggregates
provided (sum, avg, count, etc) by the application.

-Chris

On Thursday 09 March 2006 03:20 pm, Berend Tober wrote:
> I may not fully understand your situation but
>
> SELECT distinct grouping,
> (SELECT integer_column FROM test.test_agg_last WHERE
> grouping=t1.grouping ORDER BY cur_date DESC LIMIT 1) AS last_int,
> (SELECT integer_column FROM test.test_agg_last WHERE
> grouping=t1.grouping ORDER BY cur_date ASC LIMIT 1) AS first_int,
> (SELECT integer_column FROM test.test_agg_last WHERE
> grouping=t1.grouping ORDER BY cur_date2 DESC LIMIT 1) AS last_int2,
> (SELECT integer_column FROM test.test_agg_last WHERE
> grouping=t1.grouping ORDER BY cur_date2 ASC LIMIT 1) AS first_int2
> FROM test.test_agg_last t1
>
> produced the same result as at least your first example:
>
> 1,8,78,78,8
> 2,32,32,32,32
>
> I think it does what you are trying to do and can be expanded to your
> other examples. Maybe not as cool as using composite aggregates, though.
>
> -- BMT
>
> Chris Kratz wrote:
> >Thanks Tom,
> >
> >Well for anyone else who may be interested in doing something similar,
> > here is what we did.  It does require typecasting going into the
> > functions, composite types and using the dot notation to get the value
> > back out of the composite object returned.  But it works.
> >
> >This is what we wanted...
> >
> >select last(cur_date, some_column) from some_table....
> >
> >We got this close...
> >
> >select (last((cur_date, some_column)::last_int_agg)).value as last_int
> > from...
> >
> >which I think will be useable for what we need.  If anyone has ideas to
> >simplify this, I would appreciate it.  Example and generation script
> > attached at end.
> >
> >I do have to say that the flexibility in postgres for creating our own
> > data types and aggregate functions is wonderfull.  Kudos again to
> > everyone who has but so much time and energy into postgres.
> >---
> >
> >-Chris
> >
> >On Thursday 09 March 2006 01:08 pm, Tom Lane wrote:
> >>Chris Kratz <chris.kratz@vistashare.com> writes:
> >>>Is there any way in postgres to have an aggregate that uses input from
> >>>two columns without using composite types?
> >>
> >>No.
> >>
> >>            regards, tom lane
> >
> >--------------------------------------------------------------------------
> >----------- simple example test data::
> >--------------------------------------------------------------------------
> >----------- test=# select id, grouping, cur_date::date, cur_date2::date,
> > integer_column from test_agg_last;
> > id | grouping |  cur_date  | cur_date2  | integer_column
> >----+----------+------------+------------+----------------
> >  1 |        1 | 2006-01-05 | 2006-01-03 |              8
> >  2 |        1 | 2006-01-01 | 2006-01-05 |             78
> >  3 |        2 | 2006-01-03 | 2006-01-01 |             32
> >(3 rows)
> >
> >test=# select
> >test-#    grouping,
> >test-#    (last((cur_date, integer_column)::last_int_agg)).value as
> > last_int, test-#    (first((cur_date,
> > integer_column)::last_int_agg)).value as first_int,
> >test-#    (last((cur_date2, integer_column)::last_int_agg)).value as
> >last_int2,
> >test-#    (first((cur_date2, integer_column)::last_int_agg)).value as
> >first_int2
> >test-# from test_agg_last
> >test-# group by grouping
> >test-# order by grouping
> >test-# ;
> > grouping | last_int | first_int | last_int2 | first_int2
> >----------+----------+-----------+-----------+------------
> >        1 |        8 |        78 |        78 |          8
> >        2 |       32 |        32 |        32 |         32
> >(2 rows)
> >
> >--------------------------------------------------------------------------
> >----------- First and Last aggregates using an arbitrary date column
> >--------------------------------------------------------------------------
> >----------- -- aggregate types
> >create type last_int_agg as (cur_date timestamp, value int);
> >create type last_txt_agg as (cur_date timestamp, value text);
> >create type last_rel_agg as (cur_date timestamp, value double precision);
> >create type last_num_agg as (cur_date timestamp, value numeric(12,2));
> >create type last_dte_agg as (cur_date timestamp, value date);
> >create type last_tme_agg as (cur_date timestamp, value time);
> >create type last_bln_agg as (cur_date timestamp, value boolean);
> >create type last_ntv_agg as (cur_date timestamp, value interval);
> >
> >-- generic last accumulator function
> >CREATE OR REPLACE function last_accum(anyelement, anyelement) returns
> >anyelement
> >   AS $$
> >   BEGIN
> >      IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
> >         THEN RETURN $2;
> >      ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
> >         THEN RETURN $1;
> >      ELSEIF $2.cur_date>$1.cur_date
> >         THEN RETURN $2;
> >      ELSE RETURN $1;
> >      END IF;
> >   END;
> >$$ LANGUAGE plpgsql;
> >
> >-- generic first accumulator function
> >CREATE OR REPLACE function first_accum(anyelement, anyelement) returns
> >anyelement
> >   AS $$
> >   BEGIN
> >      IF $1.cur_date IS NULL and $2.cur_date IS NOT NULL
> >         THEN RETURN $2;
> >      ELSEIF $1.cur_date IS NOT NULL and $2.cur_date IS NULL
> >         THEN RETURN $1;
> >      ELSEIF $2.cur_date<$1.cur_date
> >         THEN RETURN $2;
> >      ELSE RETURN $1;
> >      END IF;
> >   END;
> >$$ LANGUAGE plpgsql;
> >
> >-- last aggregate
> >CREATE AGGREGATE last (
> >    sfunc = last_accum,
> >    basetype = anyelement,
> >    stype = anyelement
> >);
> >
> >-- first aggregate
> >CREATE AGGREGATE first (
> >    sfunc = first_accum,
> >    basetype = anyelement,
> >    stype = anyelement
> >);
> >
> >-- test data
> >
> >create table test_agg_last(
> >   id serial primary key,
> >   grouping integer,
> >   cur_date timestamp,
> >   cur_date2 timestamp,
> >   integer_column integer,
> >   real_column double precision,
> >   currency_column numeric(12,2),
> >   text_column text,
> >   date_column date,
> >   time_column time without time zone,
> >   interval_column interval,
> >   boolean_column boolean);
> >
> >insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
> >real_column, currency_column, text_column, date_column, time_column,
> >interval_column, boolean_column)
> >values(1, '1/5/06', '1/3/06', 8, 38.7, '12.00', 'Four score', '12/3/78',
> >'3:32pm', '1 day', true);
> >
> >insert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
> >real_column, currency_column, text_column, date_column, time_column,
> >interval_column, boolean_column)
> >values(1, '1/1/06', '1/5/06', 78, 1.998, '35000.00', 'and seven',
> > '6/18/05', '12:00am', '4 hours', false);
> >
> >innsert into test_agg_last(grouping, cur_date, cur_date2, integer_column,
> >real_column, currency_column, text_column, date_column, time_column,
> >interval_column, boolean_column)
> >values(2, '1/3/06', '1/1/06', 32, 0.000001, '100000.00', 'years ago',
> >'1/25/2010', '11:37am', '23 minutes', true);
> >
> >-- test using multiple date columns with first and last
> >select
> >   grouping,
> >   (last((cur_date, integer_column)::last_int_agg)).value as last_int,
> >   (first((cur_date, integer_column)::last_int_agg)).value as first_int,
> >   (last((cur_date2, integer_column)::last_int_agg)).value as last_int2,
> >   (first((cur_date2, integer_column)::last_int_agg)).value as first_int2
> >from test_agg_last
> >group by grouping
> >order by grouping
> >;
> >
> >-- test several different common types
> >select
> >   grouping,
> >   (last((cur_date, integer_column)::last_int_agg)).value as last_int,
> >   (first((cur_date, integer_column)::last_int_agg)).value as first_int,
> >   (last((cur_date, real_column)::last_rel_agg)).value as last_real,
> >   (first((cur_date, real_column)::last_rel_agg)).value as first_real,
> >   (last((cur_date, currency_column)::last_num_agg)).value as
> > last_currency, (first((cur_date, currency_column)::last_num_agg)).value
> > as first_currency, (last((cur_date, text_column)::last_txt_agg)).value as
> > last_text, (first((cur_date, text_column)::last_txt_agg)).value as
> > first_text, (last((cur_date, date_column)::last_dte_agg)).value as
> > last_date, (first((cur_date, date_column)::last_dte_agg)).value as
> > first_date, (last((cur_date, time_column)::last_tme_agg)).value as
> > last_time, (first((cur_date, time_column)::last_tme_agg)).value as
> > first_time, (last((cur_date, interval_column)::last_ntv_agg)).value as
> > last_interval, (first((cur_date, interval_column)::last_ntv_agg)).value
> > as first_interval, (last((cur_date, boolean_column)::last_bln_agg)).value
> > as last_boolean, (first((cur_date, boolean_column)::last_bln_agg)).value
> > as first_boolean
> >
> >from test_agg_last
> >group by grouping
> >order by grouping
> >;
> >
> >-- cleanup test data
> >
> >drop table test_agg_last;
> >
> >---------------------------(end of broadcast)---------------------------
> >TIP 5: don't forget to increase your free space map settings

Re: multi-column aggregates

От
Chris Kratz
Дата:
On Thursday 09 March 2006 02:18 pm, Merlin Moncure wrote:
> Chris Kratz wrote:
> > Well for anyone else who may be interested in doing something similar,
> > here is what we did.  It does require typecasting going into the
> > functions, composite types and using the dot notation to get the value
> > back out of the composite object returned.  But it works.
> >
> > This is what we wanted...
> >
> > select last(cur_date, some_column) from some_table....
> >
> > We got this close...
> >
> > select (last((cur_date, some_column)::last_int_agg)).value as last_int
> > from...
>
> have you looked at new row-wise comparison feature (i might be
> misunderstanding your problem)?
>
> select some_column from some_table where (cur_date, some_column) <
> '01/01/06',  99999999) order by cur_date desc, some_column desc limit
> 1;
>
> this will give you the highest value of some_column on the abitrarily
> chosen date 01/01/06 (assuming all values of some_column are less than
> 99999999).
>
> Merlin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

Hello Merlin,

I have to confess I'm not real familiar with rowwise comparisons.  Would this
work when you have a large number of rows.  For example, give me all
individuals and their income their favorite TV Show the  first and last times
they were contacted.  ie | Person | First Favorite | Last Favorite | ...

Would you use a subselect for each rowwise comparison and use the result as
the value for the column?

Thanks,

-Chris

Re: multi-column aggregates

От
Tom Lane
Дата:
Chris Kratz <chris.kratz@vistashare.com> writes:
> Thanks for the reply.  Yes, subselects would work very well and in some ways
> are more elegant then the hand waving we had to do to get the multi-column
> aggregates to work.

BTW, there is not any fundamental reason why we can't support aggregate
functions with multiple inputs.  I looked at this not long ago and
determined that the major stumbling blocks would be
(1) trying to keep the code in nodeAgg.c from getting a lot more
complicated and hence slower for the single-input case;
(2) figuring out how to change the syntax for CREATE AGGREGATE.

It certainly seems doable if someone wants to spend time on it.

            regards, tom lane

Re: multi-column aggregates

От
"Merlin Moncure"
Дата:
> I have to confess I'm not real familiar with rowwise comparisons.  Would this
> work when you have a large number of rows.  For example, give me all
> individuals and their income their favorite TV Show the  first and last times
> they were contacted.  ie | Person | First Favorite | Last Favorite | ...

oh, just fyi row-wise comparison enhancment is part of postgresql 8.2.

well, hm. maybe not. all row-wise comparisons do is allow you an easy
way to compare groups of fields instead of one field at a time.  Also,
if there is an available index on those fields, it gets an opportunity
to be used to invoke the comparison.

> Would you use a subselect for each rowwise comparison and use the result as
> the value for the column?

this might be possible, and would be trivial to express in a function,
but could be problematic for performance as the subquery has to get
evaluated for every instance  of the parent record..in other words the
query will scale poorly with the size of the result set. row-wise
comparison is just syntax sure and a small performance enhancement...

create or replace function last_contact_date(in_person bigint) returns date as
$$
  select contact_date from contact_occurance where (person,
contact_date) < ($1, '01/01/9999'::date) order by person desc,
contact_date desc limit 1;
$$ language sql;

the above query will use an index on person, contact_date if it
exists.  Your existing solution might be better though: I need to read
through it some more and understand it!

merlin

Re: multi-column aggregates

От
Berend Tober
Дата:
Tom Lane wrote:

>Chris Kratz <chris.kratz@vistashare.com> writes:
>
>
>>Thanks for the reply.  Yes, subselects would work very well and in some ways
>>are more elegant then the hand waving we had to do to get the multi-column
>>aggregates to work.
>>
>>
>
>BTW, there is not any fundamental reason why we can't support aggregate
>functions with multiple inputs.  I looked at this not long ago and
>determined that the major stumbling blocks would be
>(1) trying to keep the code in nodeAgg.c from getting a lot more
>complicated and hence slower for the single-input case;
>(2) figuring out how to change the syntax for CREATE AGGREGATE.
>
>It certainly seems doable if someone wants to spend time on it.
>
>
I did some work a couple months ago (and posted here
"http://archives.postgresql.org/pgsql-general/2005-11/msg01214.php") on
developing composite aggregates for linear-regression, and while at first I was a bit dissatisfied with having to cast
mydata pairs as POINT data type, once I got confortable with that it made a lot of sense. 

Regards,
Berend Tober



Re: multi-column aggregates

От
Chris Kratz
Дата:
On Monday 13 March 2006 03:21 pm, Tom Lane wrote:
> Chris Kratz <chris.kratz@vistashare.com> writes:
> > Thanks for the reply.  Yes, subselects would work very well and in some
> > ways are more elegant then the hand waving we had to do to get the
> > multi-column aggregates to work.
>
> BTW, there is not any fundamental reason why we can't support aggregate
> functions with multiple inputs.  I looked at this not long ago and
> determined that the major stumbling blocks would be
> (1) trying to keep the code in nodeAgg.c from getting a lot more
> complicated and hence slower for the single-input case;
> (2) figuring out how to change the syntax for CREATE AGGREGATE.
>
> It certainly seems doable if someone wants to spend time on it.
>
>             regards, tom lane

That is interesting to know.  Unfortunately, I don't believe my programming
foo is up to hacking pg code at this point.  If I were to spend time on it
though, I think I'd rather spend time on true materialized views rather then
this since I have a working solution.  :-)

When I started this, I would have been in much greater favor of multi-column
aggregates.  As it stands, after learning about composite types, they are a
fairly elegant solution to the problem without making the code more complex
for the single column variant.  Having said that, there are certain problem
domains where multi-column aggregates are really useful and if it were to
show up in a future release I certainly wouldn't complain.

BTW, I found it interesting that MS SQL Server doesn't appear to support
multi-column aggregates either.  Of course I may not have known where to look
either.

One option for 2 above might be to use composite/row types for the create
aggregate call.  If the Parens are optional for the single column versions
then it would be backwards compatible with aggregate definitions now.

ie
CREATE AGGREGATE some_agg (
    BASETYPE = (input_data_type, input data_type2, ...),
    SFUNC = sfunc,
    STYPE = (state_data_type, state_data_type2, ...),
 )

The main difference between this and what we have today would be that the user
doesn't have to create composite data types, or use typecasting if sfunc is
overloaded.

Just a thought.

-Chris