Re: partial time stamp query
From | Kirk Wythers
---|---
Subject | Re: partial time stamp query
Date | 
Msg-id | 14FB805F-E32C-4E58-8156-06B902FBB999@umn.edu
In reply to | Re: partial time stamp query (Misa Simic <misa.simic@gmail.com>)
List | pgsql-general
On Feb 4, 2013, at 7:03 PM, Misa Simic <misa.simic@gmail.com> wrote:

> Select time2::date, extract('hour' from time2), AVG(avg) from tablename group by time2::date, extract('hour' from time2)

Thanks Misa,

But this gives the same result as the way I was using date_trunc (not GROUPING BY the hour portion of the timestamp, or in this case the re-cast date). I have simplified the query as much as I can; it is below:

--COPY (
SELECT
    derived_tsoil_fifteen_min_stacked.time2::date,
    extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
    data_key.plot,
    data_key.variable_name,
    AVG(derived_tsoil_fifteen_min_stacked.value)
FROM data_key, derived_tsoil_fifteen_min_stacked
WHERE data_key.variable_channel = derived_tsoil_fifteen_min_stacked.variable
    AND data_key.block_name = derived_tsoil_fifteen_min_stacked.block_name
    AND data_key.plot = 'a2'
    AND derived_tsoil_fifteen_min_stacked.time2::date = '2010-07-07'::date
    AND derived_tsoil_fifteen_min_stacked.variable = 'tsoil_avg1_sc'
GROUP BY
    derived_tsoil_fifteen_min_stacked.time2::date,
    extract('hour' FROM derived_tsoil_fifteen_min_stacked.time2),
    derived_tsoil_fifteen_min_stacked.time2,
    data_key.variable_name,
    data_key.plot
ORDER BY derived_tsoil_fifteen_min_stacked.time2
--) TO '/tmp/derived_tsoil_hourly.csv' WITH CSV HEADER
;

This query returns 96 records (again, one for each 15 minute interval in the 24 hour day).

2010-07-07  0  a2  tsoil_sc  21.0599994659424
2010-07-07  0  a2  tsoil_sc  20.9599990844727
2010-07-07  0  a2  tsoil_sc  20.8799991607666
2010-07-07  0  a2  tsoil_sc  20.7999992370605
2010-07-07  1  a2  tsoil_sc  20.7199993133545
2010-07-07  1  a2  tsoil_sc  20.6399993896484
2010-07-07  1  a2  tsoil_sc  20.5499992370605
2010-07-07  1  a2  tsoil_sc  20.4699993133545
2010-07-07  2  a2  tsoil_sc  20.3899993896484
2010-07-07  2  a2  tsoil_sc  20.3099994659424
2010-07-07  2  a2  tsoil_sc  20.25
2010-07-07  2  a2  tsoil_sc  20.1599998474121
2010-07-07  3  a2  tsoil_sc  20.1000003814697
2010-07-07  3  a2  tsoil_sc  20.0100002288818
2010-07-07  3  a2  tsoil_sc  19.9400005340576
2010-07-07  3  a2  tsoil_sc  19.8600006103516
2010-07-07  4  a2  tsoil_sc  19.7700004577637
2010-07-07  4  a2  tsoil_sc  19.7199993133545
2010-07-07  4  a2  tsoil_sc  19.6499996185303
2010-07-07  4  a2  tsoil_sc  19.5599994659424
etc….

Could there be anything in the JOIN part of this query that is causing problems? I'm really grasping at straws now!

Thanks again,

Kirk

> On Monday, February 4, 2013, Kirk Wythers wrote:
> Hi Brent,
>
> Nice to hear from you. I hope your world is good.
>
> On Feb 4, 2013, at 2:14 PM, Brent Wood <Brent.Wood@niwa.co.nz> wrote:
>
> > Hi Kirk,
> >
> > We have a (near) real time data database for instrument observations from our research vessels. All observations (summarised to one minute intervals - the actual raw data is in netCDF; this database makes for easier access and meets most users' needs) go into a single table, with other tables for metadata about platforms, instruments, etc. Now approaching 350m records, so reasonably substantial.
> >
> > Underlying optimisations include:
> >
> > - a partitioned readings table, with a separate partition for each year (now 23 years)
> > - a clustered index on timestamp for the previous years' partitions
> > - a largeish filesystem block size (tested to work well with the clustered index and the small record size)
> >
> > These generally make a big difference to performance. To address one issue, much like yours, where some users want hourly data for a year, some want daily data for 10 years, and some want 1 minute data for the last month (and some, no doubt, want one minute data for 20+ years!), I introduced an integer column called timer. Its value is set according to the time (not date) of each record.
>
> Very similar to what I need to do. Our main table consists of records that have been standardized to 15 minute timestamps.
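[Editor's note] The 96-row result in the hourly query above follows directly from its GROUP BY list: it names the raw derived_tsoil_fifteen_min_stacked.time2 alongside time2::date and the extracted hour, so every 15-minute timestamp forms its own group and AVG() averages a single value. A minimal, self-contained sketch of the symptom and the fix (SQLite via Python's standard library, with an invented readings table rather than the poster's schema):

```python
# Reproduce the symptom: grouping by the raw 15-minute timestamp *as well as*
# its date and hour makes every row its own group, so AVG() sees one value.
# Dropping the raw timestamp from GROUP BY yields one row per hour.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (time2 TEXT, value REAL)")
con.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(f"2010-07-07 {h:02d}:{m:02d}:00", 20.0 + h + m / 100)
     for h in range(24) for m in (0, 15, 30, 45)],
)

# GROUP BY also lists time2 itself -> 96 groups (one per 15-minute record)
too_fine = con.execute(
    """SELECT date(time2), strftime('%H', time2), AVG(value)
       FROM readings
       GROUP BY date(time2), strftime('%H', time2), time2"""
).fetchall()

# GROUP BY only date + hour -> 24 groups, each a true hourly average
hourly = con.execute(
    """SELECT date(time2), strftime('%H', time2), AVG(value)
       FROM readings
       GROUP BY date(time2), strftime('%H', time2)"""
).fetchall()

print(len(too_fine), len(hourly))  # 96 24
```

Applied to the original query, the fix is the same: remove derived_tsoil_fifteen_min_stacked.time2 from the GROUP BY list, and ORDER BY the date and hour expressions instead of the raw timestamp.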
> Here is a simplified example:
>
> record  timestamp  variable  value
> 1  12:00:00  temp  12.6
> 2  12:15:00  temp  12.3
> 3  12:30:00  temp  11.7
> 4  12:45:00  temp  12.3
> 5  13:00:00  temp  13.9
> 6  13:15:00  temp  12.5
> 7  13:30:00  temp  13.7
> 8  13:45:00  temp  12.0
>
> You are exactly right: some people will want the original 15 minute version, some will want these summarized to hourly data, and others will want these summarized to daily data. Still others may be satisfied with monthly summaries.
>
> > Along the lines of (from memory): an even number of minutes after the hour is 2, 5 minutes is 4, 10 minutes is 8, 15 minutes is 16, 30 minutes is 32, 60 minutes is 64, 6 hourly is 128, 12:00 AM is 256 and 12:00 PM is 512. When any timestamp is in more than one category (eg: 12:00 is all of the even, 5, 15, 30 and 60 minute categories), the timer value is set to the largest appropriate one.
>
> I'm not quite following. In my case, if I want hourly data, I'd be looking for…
>
> record  timestamp  variable  value
> 1  12:00:00  temp  12.225
> 2  13:00:00  temp  13.025
>
> Are you saying that I could use an approach with a WHILE statement? Something like:
>
> WHILE date_trunc('hour', timestamp) = 12:00:00, then calculate AVG(value)?
>
> > So a request for:
> > 1 minute data is: select from table;
> > 2 minute data is: select from table where timer >= 2 and timer != 16 and timer != 4;
> > hourly data is: select from table where timer >= 64 and timer != 16 and timer != 4;
> > etc
> >
> > 5 & 15 minute add a bit of complexity, but we gave the users what they wanted. This has worked well for years now, and we have an internal web (mapserver/openlayers based) application allowing users to visualise and download their selected data - they choose from an interval pick list and the SQL is hidden.
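[Editor's note] Brent's timer tagging can be sketched as a small function. The category values below are taken from his from-memory list and are assumptions about the real system; the point is only that each record is stamped with the largest category its clock time satisfies, so coarser series fall out of a simple integer filter rather than any WHILE loop:

```python
# Sketch of the "timer" tagging idea (category values from Brent's
# from-memory list; the production system may differ). Each record is
# tagged with the largest category its time of day belongs to.
def timer_value(hh, mm):
    """Return the category tag for a clock time; the largest match wins."""
    tag = 0
    if mm % 2 == 0:
        tag = 2            # even minute
    if mm % 5 == 0:
        tag = 4            # 5-minute mark
    if mm % 10 == 0:
        tag = 8            # 10-minute mark
    if mm % 15 == 0:
        tag = 16           # quarter hour
    if mm % 30 == 0:
        tag = 32           # half hour
    if mm == 0:
        tag = 64           # top of the hour
        if hh % 6 == 0:
            tag = 128      # 6-hourly
        if hh == 0:
            tag = 256      # midnight
        if hh == 12:
            tag = 512      # noon

    return tag

# 12:00 is an even, 5-, 10-, 15-, 30- and 60-minute mark all at once,
# so it gets the largest applicable value:
print(timer_value(12, 0))   # 512
print(timer_value(13, 15))  # 16 (quarter hour only)
print(timer_value(13, 7))   # 0  (not on any boundary)
```

With the column populated this way, "hourly data" is just `WHERE timer >= 64`: every top-of-hour record carries 64 or more, and nothing finer does.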
> > Some extra enhancements are the automatic collation of lat & lon gps readings into a Postgis point for each reading record, and the automatic aggregation of daily points into daily track lines, so the track for any selected set of dates can easily be displayed on a map (the platforms are mobile vessels - not fixed sites).
> >
> > You might adapt some of these ideas for your use case?
> >
> > Cheers
> >
> > Brent Wood
> >
> > Programme leader: Environmental Information Delivery
> > NIWA
> > DDI: +64 (4) 3860529
> > ________________________________________
> > From: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org] on behalf of Kirk Wythers [wythe001@umn.edu]
> > Sent: Tuesday, February 05, 2013 5:58 AM
> > To: pgsql-general@postgresql.org
> > Subject: Fwd: [GENERAL] partial time stamp query
> >
> > Thanks. That worked great! Now I am trying to aggregate these same fifteen minute records to hourly. I have tried using date_trunc:
> >
> > date_trunc('hour', derived_tsoil_fifteen_min_stacked.time2),
> >
> > but date_trunc only seems to aggregate the timestamp. I thought I could use
> >
> > AVG(derived_tsoil_fifteen_min_stacked.value)
> >
> > in combination with date_trunc, but I still get 15 minute values, not the hourly average from the four 15 minute records.
> >
> > rowid  date_trunc  time2  site  canopy  plot  variable  name  value  avg
> > 2010-07-07_00:00:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:00:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  21.06  21.0599994659424
> > 2010-07-07_00:15:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:15:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.96  20.9599990844727
> > 2010-07-07_00:30:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:30:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.88  20.8799991607666
> > 2010-07-07_00:45:00_b4warm_a  2010-07-07 00:00:00  1  2010-07-07 00:45:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.8  20.7999992370605
> > 2010-07-07_01:00:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:00:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.72  20.7199993133545
> > 2010-07-07_01:15:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:15:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.64  20.6399993896484
> > 2010-07-07_01:30:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:30:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.55  20.5499992370605
> > 2010-07-07_01:45:00_b4warm_a  2010-07-07 01:00:00  1  2010-07-07 01:45:00  cfc  closed  a2  tsoil_sc  tsoil_avg1_sc  20.47  20.4699993133545
> >
> > I was trying to get two records out of this set, with the 'avg' column representing the mean of the first four and the last four of these 15 minute records.
> >
> > Perhaps date_trunc only works for the timestamp?
> >
> > On Feb 4, 2013, at 8:50 AM, Misa Simic <misa.simic@gmail.com> wrote:
> >
> > WHERE derived_tsoil_fifteen_min_stacked.time2::date = '2010-07-07'::date
> >
> > On Monday, February 4, 2013, Kirk Wythers wrote:
> > I am trying to write a query that grabs one particular day from a timestamp column. The data are ordered in 15 minute chunks like this:
> >
> > 2010-07-07 12:45:00
> > 2010-07-07 13:00:00
> > 2010-07-07 13:15:00
> > 2010-07-07 13:30:00
> > etc…
> >
> > If I wanted all records from July 7th 2010, I would expect 4 x 24 = 96 records per day.
> >
> > I have tried the '=' operator, like this
> >
> > WHERE derived_tsoil_fifteen_min_stacked.time2 = '2010-07-07*'
> >
> > but that grabs nothing, and using the '~' operator grabs everything with a 2010 or 07 in it… in other words
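[Editor's note] The '=' comparison against '2010-07-07*' fails because the string is compared as a literal value, not a wildcard pattern. Misa's cast (time2::date = '2010-07-07'::date) is the direct fix, and a half-open range on the raw timestamp is an equivalent form that, in PostgreSQL, can also use a plain index on time2. A small runnable sketch (SQLite via Python's standard library, with an invented obs table):

```python
# Three ways to ask for one calendar day from a timestamp column.
# '=' against a '*' string matches nothing; casting to a date, or a
# half-open range on the raw timestamp, selects the whole day.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE obs (time2 TEXT)")
con.executemany(
    "INSERT INTO obs VALUES (?)",
    [(f"2010-07-{d:02d} {h:02d}:{m:02d}:00",)
     for d in (6, 7, 8) for h in range(24) for m in (0, 15, 30, 45)],
)

# literal string comparison: the '*' is not a wildcard, so no matches
none = con.execute(
    "SELECT count(*) FROM obs WHERE time2 = '2010-07-07*'"
).fetchone()[0]

# equivalent of Misa's  time2::date = '2010-07-07'::date
by_cast = con.execute(
    "SELECT count(*) FROM obs WHERE date(time2) = '2010-07-07'"
).fetchone()[0]

# half-open range on the raw timestamp; in PostgreSQL this form can
# use an ordinary index on time2, which the per-row cast cannot
by_range = con.execute(
    """SELECT count(*) FROM obs
       WHERE time2 >= '2010-07-07' AND time2 < '2010-07-08'"""
).fetchone()[0]

print(none, by_cast, by_range)  # 0 96 96
```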