Обсуждение: Re: how to "group" several records with same timestamp into one line?

Поиск
Список
Период
Сортировка

Re: how to "group" several records with same timestamp into one line?

От
"Brent Wood"
Дата:
You need to use a self relation, not a group by, as no data are being aggregated into a new single value, which is what
thegroup by achieves. 

This joins a table to itself, so that columns in it can be replicated. The key is that the where clause in each case
needs to just select one channel, so it acts like a similar table but contains different rows.

Because you used a char() instead of varchar() for channel, you may find your string 'channel1' has spaces in it to pad
itto the specified length, in which case the where clauses below can use "like '%channel1%'" instead of "= 'channel1'" 
or you can strip the spaces before the comparison, eg: where "trim(a.channel)='channel1'". I hope this makes sense.

eg: select a.data,
                a.unit,
                b.data,
                b.unit,
                c.data,
                c.unit,
                d.data,
                d.unit,
                a.create_on
   from record data a,
          record-data b,
          record_data c,
          record_data d
   where a.channel='channel1'
       and b.channel='channel2'
       and c.channel='channel3'
       and d.channel=channel4
       and b.create_on = a.create_on
       and c.create_on = a.create_on
       and d.create_on = a.create on;

Thus table a comprises all records from channel1, etc... and they are joined on a common timestamp.

NOTE: if any of the channels are null for any given timestamp, you will get no record for that timestamp using this
syntax,even if other channels had data, because the query uses an inner join. If this is a problem then you'll need to
rewordthe query as an outer join. 

HTH,

   Brent Wood


Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand
>>> "zxo102 ouyang" <zxo102@gmail.com> 11/13/08 3:15 AM >>>
 Hi everyone,
     My data with same timestamp "2008-11-12 12:12:12" in postgresql are as
follows

rowid  data   unit            channel          create_on
------------------------------------------------------------------------------
 1        1.5     MPa          channel1  2008-11-12 12:12:12
 2        2.5     M3           channel2   2008-11-12 12:12:12
  3        3.5     M3           channel3   2008-11-12 12:12:12
  4        4.5     t               channel4   2008-11-12 12:12:12
------------------------------------------------------------------------------

I would like to "group" them into one line with SQL like

   1.5     MPa   2.5    M3  3.5  M3   4.5   t   2008-11-12 12:12:12


The table is defined as

CREATE TABLE record_data
(
  rowid serial NOT NULL,
  data double precision,
  unit character(10),
   channel character(20),
  create_on timestamp
)

Is that possible?   Thanks for your help in advance.

Ouyang

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.

Re: how to "group" several records with same timestamp into one line?

От
Sam Mason
Дата:
On Thu, Nov 13, 2008 at 06:56:43AM +1300, Brent Wood wrote:
> You need to use a self relation, not a group by, as no data are
> being aggregated into a new single value, which is what the group by
> achieves.

It's perfectly possible to use a GROUP BY clause; all rows from one time
period want to be accumulated into a single row.  To get somewhat close
to Brent's query, the OP could do something like:

  SELECT create_on,
    array_accum(CASE channel when 'channel1' THEN data END) AS data1,
    array_accum(CASE channel when 'channel1' THEN unit END) AS unit1,
    array_accum(CASE channel when 'channel2' THEN data END) AS data2,
    array_accum(CASE channel when 'channel2' THEN unit END) AS unit2,
    array_accum(CASE channel when 'channel3' THEN data END) AS data3,
    array_accum(CASE channel when 'channel3' THEN unit END) AS unit3,
    array_accum(CASE channel when 'channel4' THEN data END) AS data4,
    array_accum(CASE channel when 'channel4' THEN unit END) AS unit4
  FROM record_data
  GROUP BY create_on;

If the number of channels were unknown, a possibility would be:

  SELECT create_on, array_accum(channel||' '||data||' '||unit)
  FROM record_data
  GROUP BY create_on;

If this is being used for things outside PG, turning the resulting
arrays into text can make things easier; array_to_string() is good for
this.  More docs are in:

  http://www.postgresql.org/docs/current/static/functions-aggregate.html
  http://www.postgresql.org/docs/current/static/functions-array.html

If you've got a unique constraint on (create_on,channel) then you
could replace the array_accum() aggregate with MIN.  I've also just
realized that PG doesn't come with array_accum by default, you can find
a definition of it here:

  http://www.postgresql.org/docs/current/static/xaggr.html


  Sam

Re: how to "group" several records with same timestamp into one line?

От
Eus
Дата:
Hi Ho!

--- On Thu, 11/13/08, Brent Wood <b.wood@niwa.co.nz> wrote:

> You need to use a self relation, not a group by, as no data
> are being aggregated into a new single value, which is what
> the group by achieves.
>
> This joins a table to itself, so that columns in it can be
> replicated. The key is that the where clause in each case
> needs to just select one channel, so it acts like a similar
> table but contains different rows.
>
> Because you used a char() instead of varchar() for channel,
> you may find your string 'channel1' has spaces in it
> to pad it to the specified length, in which case the where
> clauses below can use "like '%channel1%'"
> instead of "= 'channel1'"
> or you can strip the spaces before the comparison, eg:
> where "trim(a.channel)='channel1'". I hope
> this makes sense.
>
> eg: select a.data,
>                 a.unit,
>                 b.data,
>                 b.unit,
>                 c.data,
>                 c.unit,
>                 d.data,
>                 d.unit,
>                 a.create_on
>    from record data a,
>           record-data b,
>           record_data c,
>           record_data d
>    where a.channel='channel1'
>        and b.channel='channel2'
>        and c.channel='channel3'
>        and d.channel=channel4
>        and b.create_on = a.create_on
>        and c.create_on = a.create_on
>        and d.create_on = a.create on;
>
> Thus table a comprises all records from channel1, etc...
> and they are joined on a common timestamp.
>
> NOTE: if any of the channels are null for any given
> timestamp, you will get no record for that timestamp using
> this syntax, even if other channels had data, because the
> query uses an inner join. If this is a problem then
> you'll need to reword the query as an outer join.

Isn't that something like this is better handled at the application level instead of the DB level?

IOW, isn't that the cost of doing the query above far more expensive than doing a little coding at the application
level?

May I know your opinion?

Thanks.

> HTH,
>
>    Brent Wood
>
>
> Brent Wood
> DBA/GIS consultant
> NIWA, Wellington
> New Zealand

Best regards,
Eus (FSF member #4445)

In this digital era, where computing technology is pervasive,
your freedom depends on the software controlling those computing devices.

Join free software movement today!
It is free as in freedom, not as in free beer!

Join: http://www.fsf.org/jf?referrer=4445




Re: how to "group" several records with same timestamp into one line?

От
Joshua Tolley
Дата:
On Thu, Nov 13, 2008 at 06:39:47PM -0800, Eus wrote:
> Isn't that something like this is better handled at the application level instead of the DB level?
>
> IOW, isn't that the cost of doing the query above far more expensive than doing a little coding at the application
level?

That's something you'll have to test to really find out, but keep in
mind the database has several advantages over the application in the
general case. First, the database has indexes (presumably), and the
application would just have a raw dataset. The dataset might be
delivered in sorted order if you query it right, which might obviate
that advantage. Or it might not. Second, the database has all the data
locally. If your application lives on some other server, the data
returned by your initial query needs to be streamed across a network,
and converted to data types your application language understands. This
makes a particularly noticeable difference when the result data set is
much smaller, due to filtering in the query, than the original tables.

Other considerations depend on your definition of "expensive". Getting
the programming right to do in application code what you could do in one
query might be a fair bit of work. There has been quite a lot of effort
spent on the database to make sure it performs correctly and
efficiently; probably more so than has been spent on the code you're
writing to replace the query. That said, the database is designed to
support generic operations, and your code might exploit some
characteristic of the data the database can't know about to achieve
better results. In general, it's probably best to let the database
handle data, and let your application handle "business logic" that
transcends the data. Interpretations of the above vary from person to
person :)

All that said, my first guess is that you're better off writing the
query than doing the processing in the application.

- Josh / eggyknap

Вложения