Обсуждение: SQL group by help

Поиск
Список
Период
Сортировка

SQL group by help

От
Chris Stephens
Дата:
I'm trying to create a visual representation of a 6x8 grid of samples on a rack using the following SQL format:

with rack_display as (
    select sr.ts rack_ts
        , sr.rack_id
        , r.rack_barcode
        , 1 as row_pos
        , max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1
        , max(case when rack_well = 1 then 'A2: '||sample_barcode end) as col2
        , max(case when rack_well = 2 then 'A3: '||sample_barcode end) as col3
        , max(case when rack_well = 3 then 'A4: '||sample_barcode end) as col4
        , max(case when rack_well = 4 then 'A5: '||sample_barcode end) as col5
        , max(case when rack_well = 5 then 'A6: '||sample_barcode end) as col6
        from rack r
                        , sample_rack sr
            , sample s
        where r.rack_id = sr.rack_id
           and sr.sample_id = s.sample_id
           and sr.rack_well < 6
        group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
    union all
    select sr.ts rack_ts
        , sr.rack_id
        , r.rack_barcode
        , 2 as row_pos
        , max(case when rack_well = 6 then 'B1: '||sample_barcode end) as col1
        , max(case when rack_well = 7 then 'B2: '||sample_barcode end) as col2
        , max(case when rack_well = 8 then 'B3: '||sample_barcode end) as col3
        , max(case when rack_well = 9 then 'B4: '||sample_barcode end) as col4
        , max(case when rack_well = 10 then 'B5: '||sample_barcode end) as col5
        , max(case when rack_well = 11 then 'B6: '||sample_barcode end) as col6
        from rack r
                        , sample_rack sr
            , sample s
        where r.rack_id = sr.rack_id
           and sr.sample_id = s.sample_id
           and sr.rack_well >= 6
           and sr.rack_well < 12
        group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
    union all
...
)
select * from rack_display order by rack_ts, rack_id, row_pos;

the "union all"s continue for another 6 blocks. reports would filter on rack_id and timestamp. 

if timestamps for each load of a rack were guaranteed to be the same, this would work. however, the "sr.ts" values may vary by a few seconds so there is potential for the "group by" to break. ts differences will be a minimum of 5 minutes for each distinct load of a rack. 

what i think i need is to manufacture a group by column based off rows in "sample_rack" that have "ts" values that are < 1 minute from each other and rack_id is the same.  i'm coming up blank on how to accomplish that though. my first thought was to create an interval of +/- 1 min then find all rows that overlap and assign a group number but i'm not sure how to accomplish that. 

there's also no guarantee an entire rack is full of samples so some "cells" of display might be null. i think that makes the use of tablefunc crosstab a little harder. if i remember correctly, it does not handle missing values well. i'm open to any pivoting strategy.

anyways, i thought i'd reach out for ideas while i do my own digging.

thanks for any input! let me know if i'm not being clear on the problem and desired outcome.


Re: SQL group by help

От
Rob Sargent
Дата:
What

> On Dec 11, 2020, at 10:24 AM, Chris Stephens <cstephens16@gmail.com> wrote:
>
> 
> I'm trying to create a visual representation of a 6x8 grid of samples on a rack using the following SQL format:
>
> with rack_display as (
>     select sr.ts rack_ts
>         , sr.rack_id
>         , r.rack_barcode
>         , 1 as row_pos
>         , max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1
>         , max(case when rack_well = 1 then 'A2: '||sample_barcode end) as col2
>         , max(case when rack_well = 2 then 'A3: '||sample_barcode end) as col3
>         , max(case when rack_well = 3 then 'A4: '||sample_barcode end) as col4
>         , max(case when rack_well = 4 then 'A5: '||sample_barcode end) as col5
>         , max(case when rack_well = 5 then 'A6: '||sample_barcode end) as col6
>         from rack r
>                         , sample_rack sr
>             , sample s
>         where r.rack_id = sr.rack_id
>            and sr.sample_id = s.sample_id
>            and sr.rack_well < 6
>         group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
>     union all
>     select sr.ts rack_ts
>         , sr.rack_id
>         , r.rack_barcode
>         , 2 as row_pos
>         , max(case when rack_well = 6 then 'B1: '||sample_barcode end) as col1
>         , max(case when rack_well = 7 then 'B2: '||sample_barcode end) as col2
>         , max(case when rack_well = 8 then 'B3: '||sample_barcode end) as col3
>         , max(case when rack_well = 9 then 'B4: '||sample_barcode end) as col4
>         , max(case when rack_well = 10 then 'B5: '||sample_barcode end) as col5
>         , max(case when rack_well = 11 then 'B6: '||sample_barcode end) as col6
>         from rack r
>                         , sample_rack sr
>             , sample s
>         where r.rack_id = sr.rack_id
>            and sr.sample_id = s.sample_id
>            and sr.rack_well >= 6
>            and sr.rack_well < 12
>         group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
>     union all
> ...
> )
> select * from rack_display order by rack_ts, rack_id, row_pos;
>
> the "union all"s continue for another 6 blocks. reports would filter on rack_id and timestamp.
>
> if timestamps for each load of a rack were guaranteed to be the same, this would work. however, the "sr.ts" values
mayvary by a few seconds so there is potential for the "group by" to break. ts differences will be a minimum of 5
minutesfor each distinct load of a rack.  
>
> what i think i need is to manufacture a group by column based off rows in "sample_rack" that have "ts" values that
are< 1 minute from each other and rack_id is the same.  i'm coming up blank on how to accomplish that though. my first
thoughtwas to create an interval of +/- 1 min then find all rows that overlap and assign a group number but i'm not
surehow to accomplish that.  
>
> there's also no guarantee an entire rack is full of samples so some "cells" of display might be null. i think that
makesthe use of tablefunc crosstab a little harder. if i remember correctly, it does not handle missing values well.
i'mopen to any pivoting strategy. 
>
> anyways, i thought i'd reach out for ideas while i do my own digging.
>
> thanks for any input! let me know if i'm not being clear on the problem and desired outcome.
>
>

What stack are you usinglr. I would hope you could hand data off to front end which should have tools necessary


Re: SQL group by help

От
Alban Hertroys
Дата:
> On 11 Dec 2020, at 18:24, Chris Stephens <cstephens16@gmail.com> wrote:
>
> I'm trying to create a visual representation of a 6x8 grid of samples on a rack using the following SQL format:
>
> with rack_display as (
>     select sr.ts rack_ts
>         , sr.rack_id
>         , r.rack_barcode
>         , 1 as row_pos
>         , max(case when rack_well = 0 then 'A1: '||sample_barcode end) as col1
>         , max(case when rack_well = 1 then 'A2: '||sample_barcode end) as col2
>         , max(case when rack_well = 2 then 'A3: '||sample_barcode end) as col3
>         , max(case when rack_well = 3 then 'A4: '||sample_barcode end) as col4
>         , max(case when rack_well = 4 then 'A5: '||sample_barcode end) as col5
>         , max(case when rack_well = 5 then 'A6: '||sample_barcode end) as col6
>         from rack r
>                         , sample_rack sr
>             , sample s
>         where r.rack_id = sr.rack_id
>            and sr.sample_id = s.sample_id
>            and sr.rack_well < 6
>         group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
>     union all
>     select sr.ts rack_ts
>         , sr.rack_id
>         , r.rack_barcode
>         , 2 as row_pos
>         , max(case when rack_well = 6 then 'B1: '||sample_barcode end) as col1
>         , max(case when rack_well = 7 then 'B2: '||sample_barcode end) as col2
>         , max(case when rack_well = 8 then 'B3: '||sample_barcode end) as col3
>         , max(case when rack_well = 9 then 'B4: '||sample_barcode end) as col4
>         , max(case when rack_well = 10 then 'B5: '||sample_barcode end) as col5
>         , max(case when rack_well = 11 then 'B6: '||sample_barcode end) as col6
>         from rack r
>                         , sample_rack sr
>             , sample s
>         where r.rack_id = sr.rack_id
>            and sr.sample_id = s.sample_id
>            and sr.rack_well >= 6
>            and sr.rack_well < 12
>         group by sr.ts, sr.rack_id, r.rack_barcode, row_pos
>     union all
> ...
> )
> select * from rack_display order by rack_ts, rack_id, row_pos;
>
> the "union all"s continue for another 6 blocks. reports would filter on rack_id and timestamp.

Is time really what groups these batches? I would double-check whether you may be omitting to store some data relevant
tothis process. 

> if timestamps for each load of a rack were guaranteed to be the same, this would work. however, the "sr.ts" values
mayvary by a few seconds so there is potential for the "group by" to break. ts differences will be a minimum of 5
minutesfor each distinct load of a rack.  
>
> what i think i need is to manufacture a group by column based off rows in "sample_rack" that have "ts" values that
are< 1 minute from each other and rack_id is the same.  i'm coming up blank on how to accomplish that though. my first
thoughtwas to create an interval of +/- 1 min then find all rows that overlap and assign a group number but i'm not
surehow to accomplish that.  

You could date_trunc those timestamps to the minute and group on that.

> there's also no guarantee an entire rack is full of samples so some "cells" of display might be null. i think that
makesthe use of tablefunc crosstab a little harder. if i remember correctly, it does not handle missing values well.
i'mopen to any pivoting strategy. 

Many reporting tools have features to support just that. We use WebFOCUS, which calls those ACROSS columns. It’s a
commonrequirement in reporting. 


Alban Hertroys
--
There is always an exception to always.