Re: grouping subsets

Поиск
Список
Период
Сортировка
От Rainer Stengele
Тема Re: grouping subsets
Дата
Msg-id 4C547E2A.2040402@diplan.de
обсуждение исходный текст
Ответ на Re: grouping subsets  ("Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>)
Список pgsql-sql
 The analysis will have to be done over a calendar range which the user 
selects via web interface. So - yes - normally not the whole table, but
maybe he chooses one month and thats already a lot of rows in the table ...

Best, too,
Rainer

Am 30.07.2010 17:07, schrieb Oliveiros d'Azevedo Cristina:
> I See.
>
> And the analysis you need to do, the sum of the rows with the same
> keys (until they change) will have to be done over all
> table?
> Or just over some predefined interval ?
>
> Best,
> Oliveiros
> ----- Original Message ----- From: "Rainer Stengele"
> <rainer.stengele@diplan.de>
> Newsgroups: gmane.comp.db.postgresql.sql
> To: "Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>
> Cc: <pgsql-sql@postgresql.org>
> Sent: Friday, July 30, 2010 10:35 AM
> Subject: Re: grouping subsets
>
>
>> the table may include up to maybe 30 entries per day, average maybe
>> 10-15
>> After a year this makes about 10.000 entries - maximum, average about
>> 5000 entries.
>>
>> For the problem described I have to use a Microsoft SQL database and
>> would like to use pure SQL.
>> As I use postgres on my Linux servers I found this newsgroup and
>> thought I ask here.
>>
>> Thanks!
>> Rainer
>>
>> Am 29.07.2010 15:31, schrieb Oliveiros d'Azevedo Cristina:
>>> Fine.
>>>
>>> Please advice me,
>>>
>>> How long can
>>> your table be? Thousands? Millions of records?
>>>
>>> Do you really need it in pure SQL
>>> ?
>>>
>>> It seems to me that it might be possible, I'm just affraid that the
>>> query would become too complex and thus slow...
>>>
>>> Best,
>>> Oliveiros
>>>
>>> ----- Original Message ----- From: "Rainer Stengele"
>>> <rainer.stengele@diplan.de>
>>> To: "Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>
>>> Cc: <pgsql-sql@postgresql.org>
>>> Sent: Thursday, July 29, 2010 1:10 PM
>>> Subject: Re: grouping subsets
>>>
>>>
>>>> No. This is by accident.
>>>> We have to assume that the combinations do change anytime, and many
>>>> times per day.
>>>>
>>>> So
>>>>
>>>> "Or is it possible to have the same combination on one day with
>>>> several sets?"
>>>>
>>>> YES!
>>>>
>>>> Rainer
>>>>
>>>>
>>>>
>>>> Am 29.07.2010 13:47, schrieb Oliveiros d'Azevedo Cristina:
>>>>> Yes. This is somewhat more complicated because it has more
>>>>> constraints.
>>>>> I've noticed that a given combination doesn't appear with holes on
>>>>> a certain day.
>>>>>
>>>>> For ex, on a daily basis, we have every three key combinations
>>>>> together.
>>>>>
>>>>> We dont have things like
>>>>> 2010-7-01  1726 3212 1428
>>>>> 2010-7-01  1726 3212 1428
>>>>> ...                318   1846 1012
>>>>> 2010-7-01  1726 3212 1428
>>>>>
>>>>> Can I assume that, for a certain day , the records for the same
>>>>> three combination are all together? There is just one set per day
>>>>> for a given combination?
>>>>>
>>>>> Or is it possible to have the same combination on one day with
>>>>> several sets?
>>>>>
>>>>> Best,
>>>>> Oliveiros
>>>>>
>>>>>
>>>>> ----- Original Message ----- From: "Rainer Stengele"
>>>>> <rainer.stengele@diplan.de>
>>>>> Newsgroups: gmane.comp.db.postgresql.sql
>>>>> To: "Oliveiros d'Azevedo Cristina" <oliveiros.cristina@marktest.pt>
>>>>> Cc: <>
>>>>> Sent: Thursday, July 29, 2010 10:41 AM
>>>>> Subject: Re: grouping subsets
>>>>>
>>>>>
>>>>>> Howdy Cristina,
>>>>>>
>>>>>> unfortunately things are more complicated. I have inserted an
>>>>>> excerpt of the real data here:
>>>>>>
>>>>>> ================================================================================
>>>>>>
>>>>>> TableID         MasterID dtBegin                           dtEnd
>>>>>> idR idL      idB     consumption
>>>>>> 4057312           295530 2010-07-01 00:59:21.077          
>>>>>> 2010-07-01 01:32:59.670           1726     3212     1428     279
>>>>>> 4061043           295574 2010-07-01 01:59:31.137          
>>>>>> 2010-07-01 02:32:09.373           1726     3212     1428     183
>>>>>> 4083397           295838 2010-07-01 07:57:51.327          
>>>>>> 2010-07-01 08:28:28.117           318      1846     1012     30
>>>>>> 4090858           295920 2010-07-01 09:52:33.777          
>>>>>> 2010-07-01 10:31:34.393           318      1846     1012     487
>>>>>> 4094589           295961 2010-07-01 10:47:59.370          
>>>>>> 2010-07-01 11:32:20.903           318      1846     1012     472
>>>>>> 4098330           296013 2010-07-01 11:58:53.890          
>>>>>> 2010-07-01 12:31:35.730           318      1846     1012     195
>>>>>> 4102069           296058 2010-07-01 12:36:19.170          
>>>>>> 2010-07-01 13:32:13.950           318      1846     1012     338
>>>>>> 4105809           296102 2010-07-01 13:58:53.170          
>>>>>> 2010-07-01 14:02:57.710           318      1846     1012     105
>>>>>> 4109555           296150 2010-07-01 14:59:11.663          
>>>>>> 2010-07-01 15:32:33.810           318      1846     1012     187
>>>>>> 4113305           296194 2010-07-01 15:59:01.797          
>>>>>> 2010-07-01 16:02:27.260           318      1846     1012     108
>>>>>> 4117048           296238 2010-07-01 16:20:47.997          
>>>>>> 2010-07-01 17:32:49.367           318      1846     1012     179
>>>>>> 4120791           296282 2010-07-01 17:58:27.657          
>>>>>> 2010-07-01 18:29:01.733           318      1846     1012     256
>>>>>> 4128291           296370 2010-07-01 19:54:17.687          
>>>>>> 2010-07-01 20:32:53.850           318      1846     1012     239
>>>>>> 4132044           296413 2010-07-01 20:31:37.653          
>>>>>> 2010-07-01 21:29:13.497           318      1846     1012     39
>>>>>> 4135797           296458 2010-07-01 21:59:13.983          
>>>>>> 2010-07-01 22:32:46.503           318      1846     1012     157
>>>>>> 4139572           296506 2010-07-01 22:58:49.530          
>>>>>> 2010-07-01 23:32:22.543           318      1846     1012     218
>>>>>> 4142941           296554 2010-07-01 23:59:13.857          
>>>>>> 2010-07-02 00:32:30.390           318      1846     1012     248
>>>>>> 4146289           296598 2010-07-02 00:58:55.763          
>>>>>> 2010-07-02 01:32:41.983           318      1846     1012     204
>>>>>> 4149616           296642 2010-07-02 01:46:57.357          
>>>>>> 2010-07-02 02:32:56.983           318      1846     1012     42
>>>>>> 4152952           296686 2010-07-02 02:55:19.653          
>>>>>> 2010-07-02 03:32:28.013           318      1846     1012     135
>>>>>> 4156289           296730 2010-07-02 03:43:52.777          
>>>>>> 2010-07-02 04:32:55.250           318      1846     1012     743
>>>>>> 4159624           296774 2010-07-02 04:43:15.310          
>>>>>> 2010-07-02 05:32:44.547           318      1846     1012     277
>>>>>> 4162961           296817 2010-07-02 05:58:59.483          
>>>>>> 2010-07-02 06:32:37.340           318      1846     1012     121
>>>>>> 4166303           296862 2010-07-02 06:58:50.733          
>>>>>> 2010-07-02 07:32:39.113           318      1846     1012     239
>>>>>> 4172981           296950 2010-07-02 07:28:55.293          
>>>>>> 2010-07-02 09:33:01.200           318      1846     1012     512
>>>>>> 4176322           296993 2010-07-02 09:59:04.607          
>>>>>> 2010-07-02 10:33:01.903           318      1846     1012     139
>>>>>> 4179667           297038 2010-07-02 10:55:27.760          
>>>>>> 2010-07-02 11:32:56.560           318      1846     1012     722
>>>>>> 4183012           297082 2010-07-02 11:59:33.650          
>>>>>> 2010-07-02 12:32:14.700            318      1846     1012     163
>>>>>> 4186351           297126 2010-07-02 12:23:45.997          
>>>>>> 2010-07-02 13:32:59.500            318      1846     1012     284
>>>>>> 4189689           297169 2010-07-02 13:44:21.253          
>>>>>> 2010-07-02 14:18:05.080            318      1846     1012     254
>>>>>> 4196371           297258 2010-07-02 16:16:19.123          
>>>>>> 2010-07-02 16:32:53.437           1706     3541     1511     161
>>>>>> 4199720           297301 2010-07-02 16:59:35.127          
>>>>>> 2010-07-02 17:32:57.950           1706     3541     1511     250
>>>>>> 4203068           297346 2010-07-02 17:59:34.027          
>>>>>> 2010-07-02 18:32:54.337           1706     3541     1511     302
>>>>>> 4206413           297389 2010-07-02 18:59:28.730          
>>>>>> 2010-07-02 19:32:37.950           1706     3541     1511     276
>>>>>> 4209758           297434 2010-07-02 19:54:00.243          
>>>>>> 2010-07-02 20:32:57.433           1706     3541     1511     209
>>>>>> 4213102           297473 2010-07-02 20:49:10.963          
>>>>>> 2010-07-02 21:30:44.540           1706     3541     1511     76
>>>>>> 4216447           297511 2010-07-02 21:59:34.810          
>>>>>> 2010-07-02 22:33:00.603           1706     3541     1511     287
>>>>>> 4219818           297569 2010-07-02 22:56:52.750          
>>>>>> 2010-07-02 23:59:31.607           1706     3541     1511     1877
>>>>>> 4219819           297570 2010-07-02 23:59:21.577          
>>>>>> 2010-07-03 00:54:40.153           1706     3541     1511     1798
>>>>>> 4219821           297572 2010-07-03 00:48:03.310          
>>>>>> 2010-07-03 01:59:37.920           1706     3541     1511     1125
>>>>>> 4219823           297574 2010-07-03 01:51:01.057          
>>>>>> 2010-07-03 02:59:45.433           1706     3541     1511     1629
>>>>>> 4219820           297571 2010-07-03 02:59:29.393          
>>>>>> 2010-07-03 03:59:54.920           1706     3541     1511     2462
>>>>>> 4219822           297573 2010-07-03 03:59:18.663          
>>>>>> 2010-07-03 04:01:48.810           1706     3541     1511     70
>>>>>> 4225738           297656 2010-07-03 06:13:34.980          
>>>>>> 2010-07-03 06:28:09.697           1726     3212     1428     46
>>>>>> 4228694           297695 2010-07-03 06:59:15.560          
>>>>>> 2010-07-03 07:32:45.653           1726     3212     1428     251
>>>>>> 4231649           297733 2010-07-03 07:59:11.937          
>>>>>> 2010-07-03 08:32:57.217           1726     3212     1428     284
>>>>>> 4234604           297771 2010-07-03 08:57:00.357          
>>>>>> 2010-07-03 09:32:47.903           1726     3212     1428     227
>>>>>> 4237559           297809 2010-07-03 09:59:19.813          
>>>>>> 2010-07-03 10:33:02.063           1726     3212     1428     285
>>>>>> 4261156           298596 2010-07-04 22:59:09.863          
>>>>>> 2010-07-04 23:33:45.530           1726     3212     1428     1286
>>>>>> 4264114           298646 2010-07-04 23:59:16.967          
>>>>>> 2010-07-05 00:33:08.107           1726     3212     1428     297
>>>>>> 4267067           298690 2010-07-05 00:59:15.187          
>>>>>> 2010-07-05 01:32:48.300           1726     3212     1428     333
>>>>>> 4270023           298734 2010-07-05 01:59:02.497          
>>>>>> 2010-07-05 02:32:48.780           1726     3212     1428     270
>>>>>> 4272977           298778 2010-07-05 02:41:43.737          
>>>>>> 2010-07-05 03:32:56.043           1726     3212     1428     317
>>>>>> 4275927           298822 2010-07-05 03:59:17.027          
>>>>>> 2010-07-05 04:33:14.947           1726     3212     1428     1623
>>>>>> ================================================================================
>>>>>>
>>>>>>
>>>>>> Description:
>>>>>> 1. Column: some ID
>>>>>> 2. Column: reference to another table
>>>>>> 3. and 4. column: timestamp from/to of the item
>>>>>> 5. Column: ID R
>>>>>> 6. Column: ID L
>>>>>> 7. Column: ID B
>>>>>> 8. Column: Sum of components
>>>>>>
>>>>>> Requirement:
>>>>>> Sum over all components (from column 8) for each combination of
>>>>>> ID R, ID L, ID B, but (!)
>>>>>> rows with same keys (R,L,B) should be summed up only until the
>>>>>> keys change.
>>>>>> Do not sum up the components for identical keys, if there are
>>>>>> other keys between them.
>>>>>>
>>>>>> Example result:
>>>>>>
>>>>>> idR       idL      idB                 SUM
>>>>>> 1726     3212     1428                 462
>>>>>> 318     1846     1012                 ...
>>>>>> 1706     3541     1511                 ...
>>>>>> 1726     3212     1428                 ...
>>>>>>
>>>>>>
>>>>>> Note that the first and last entry here has the same keys
>>>>>>
>>>>>> Maybe you find a similar monster SQL solving such a requirement.
>>>>>> Thanks for considering!
>>>>>>
>>>>>> Rainer
>>>>>>
>>>>>>
>>>>>>
>>>>>> Am 27.07.2010 12:37, schrieb Oliveiros d'Azevedo Cristina:
>>>>>>> Howdy, Rainer.
>>>>>>>
>>>>>>> It's been a while, so I don't know if you are still interested
>>>>>>> in this problem or if you, in the meantime, found yourself a
>>>>>>> solution,
>>>>>>> but I've tried this on a local copy of the example you
>>>>>>> provided   and it seems to work.
>>>>>>>
>>>>>>> The problem is that I suspect that if you have several thousands
>>>>>>> of records on your table it will become slow...
>>>>>>>
>>>>>>> Best,
>>>>>>> Oliveiros
>>>>>>>
>>>>>>> SELECT SUM(tudo.parcela),tudo.a
>>>>>>> FROM
>>>>>>> (
>>>>>>> SELECT fo.parcela,fo.a,fo.b,fo.c,MIN(th.c) as d
>>>>>>> FROM
>>>>>>> (
>>>>>>> SELECT se.a as parcela,se.b as a,se.c as b, MAX(pr.c) as c
>>>>>>> FROM
>>>>>>> yourTable se
>>>>>>> LEFT JOIN
>>>>>>> (
>>>>>>> SELECT a.*
>>>>>>> FROM yourTable a
>>>>>>> JOIN yourTable b
>>>>>>> ON (b.b <> a.b)
>>>>>>> AND ((age(a.c,b.c) = '1 day'::interval)
>>>>>>>
>>>>>>> )
>>>>>>> ) pr
>>>>>>> ON pr.b = se.b
>>>>>>> AND    se.c >= pr.c
>>>>>>> GROUP BY se.a,se.b,se.c
>>>>>>> ) fo
>>>>>>> LEFT JOIN
>>>>>>> (
>>>>>>> SELECT a.*
>>>>>>> FROM yourTable a
>>>>>>> JOIN yourTable b
>>>>>>> ON (b.b <> a.b)
>>>>>>> AND ((age(a.c,b.c) = '-1 day'::interval)
>>>>>>> )
>>>>>>> ) th
>>>>>>> ON fo.a = th.b
>>>>>>> AND fo.b <= th.c
>>>>>>> GROUP BY fo.parcela,fo.a,fo.b,fo.c
>>>>>>> ) tudo
>>>>>>> GROUP BY tudo.a,tudo.c,tudo.d
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> To: <pgsql-sql@postgresql.org>
>>>>>>> Sent: Thursday, July 22, 2010 9:09 AM
>>>>>>> Subject: [SQL] grouping subsets
>>>>>>>
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> having a table similar to
>>>>>>>>
>>>>>>>> | 1 | B | [2010-07-15 Do] |
>>>>>>>> | 1 | B | [2010-07-16 Fr] |
>>>>>>>> |---+---+-----------------|
>>>>>>>> | 2 | C | [2010-07-17 Sa] |
>>>>>>>> | 2 | C | [2010-07-18 So] |
>>>>>>>> |---+---+-----------------|
>>>>>>>> | 1 | B | [2010-07-19 Mo] |
>>>>>>>> | 1 | B | [2010-07-20 Di] |
>>>>>>>> | 1 | B | [2010-07-21 Mi] |
>>>>>>>> | 1 | B | [2010-07-22 Do] |
>>>>>>>> |---+---+-----------------|
>>>>>>>> | 3 | D | [2010-07-23 Fr] |
>>>>>>>>
>>>>>>>> a simple group by gives me:
>>>>>>>>
>>>>>>>> | 6 | B |
>>>>>>>> | 4 | C |
>>>>>>>> | 3 | D |
>>>>>>>>
>>>>>>>>
>>>>>>>> What I want to get is the values grouped by "subset", where a
>>>>>>>> subset is a set of rows with identical column until the colum
>>>>>>>> changes.
>>>>>>>> Is there a way to get
>>>>>>>>
>>>>>>>> | 2 | B |
>>>>>>>> | 4 | C |
>>>>>>>> | 4 | B |
>>>>>>>> | 3 | D |
>>>>>>>>
>>>>>>>> by SQL only?
>>>>>>>>
>>>>>>>> - Rainer
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> -- 
>>>>>>>> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
>>>>>>>> To make changes to your subscription:
>>>>>>>> http://www.postgresql.org/mailpref/pgsql-sql
>>>>>>>
>>>>>>>
>>>>>
>>>
>>>
>



В списке pgsql-sql по дате отправления:

Предыдущее
От: Joshua Tolley
Дата:
Сообщение: Re: grouping subsets
Следующее
От: Viktor Bojović
Дата:
Сообщение: aggregate function