Обсуждение: Can null values be sorted low?

Поиск
Список
Период
Сортировка

Can null values be sorted low?

От
ann hedley
Дата:
The problem...
 From the table below I want to select distinct clus_id and the data in
3 db columns, even if that data is null.

When there are multiple contigs for a clus_id I want to select the row
where 1st, the most db columns have a value, 2nd the total value of the
db columns is highest.

So for
LRC00006 I want contig 3,
LRC00010 I want contig 4, because it has 3 db values (even though contig
2 would total more)
LRC00001 I want contig 2, because it would total more if the null value
was treated as zero in the calculation
and for the rest I want contig 1.

Any hints on writing such a query would be much appreciated.

lumbribase=# select clus_id,contig,db0,db1,db4,(db0+db1+db4) as total
from venn order by clus_id,total DESC;
 clus_id  | contig | db0  | db1  | db4  | total
----------+--------+------+------+------+-------
 LRC00001 |      1 | 77.4 |      | 63.9 |
 LRC00001 |      2 | 77.4 |      | 82.7 |
 LRC00002 |      1 |  325 |  343 |  313 |   981
 LRC00003 |      1 |      |      |      |
 LRC00004 |      1 |      |      |      |
 LRC00005 |      1 |  294 |  294 |  116 |   704
 LRC00006 |      1 |  100 | 72.8 |      |
 LRC00006 |      3 |  120 |  122 | 63.9 | 305.9
 LRC00006 |      2 |  117 |  112 | 58.5 | 287.5
 LRC00007 |      1 |  178 |      |      |
 LRC00008 |      1 |      |      |      |
 LRC00009 |      1 |  416 |      |      |
 LRC00010 |      2 |  324 |  167 |     |
 LRC00010 |      4 |  146 |  168 |  172 |   486
 LRC00010 |      1 |  146 |  166 |  171 |   483
 LRC00010 |      3 |  145 |  160 |  159 |   464
 LRC00011 |      1 |  179 |  100 | 95.1 | 374.1
 LRC00012 |      1 |  639 |  639 |  633 |  1911
 LRC00012 |      3 |  505 |  509 |  508 |  1522
 LRC00012 |      2 |  390 |  391 |  392 |  1173



--
Ann

"In a world without walls and fences - who needs Windows and Gates ?"
                                                           (unknown)



Re: Can null values be sorted low?

От
Jeff Eckermann
Дата:
--- ann hedley <ann.hedley@ed.ac.uk> wrote:
> The problem...
>  From the table below I want to select distinct
> clus_id and the data in
> 3 db columns, even if that data is null.
>
> When there are multiple contigs for a clus_id I want
> to select the row
> where 1st, the most db columns have a value, 2nd the
> total value of the
> db columns is highest.
>
> So for
> LRC00006 I want contig 3,
> LRC00010 I want contig 4, because it has 3 db values
> (even though contig
> 2 would total more)
> LRC00001 I want contig 2, because it would total
> more if the null value
> was treated as zero in the calculation
> and for the rest I want contig 1.
>
> Any hints on writing such a query would be much

If you don't mind using a PostgreSQL-only extension,
"SELECT DISTINCT ON" may do what you want: check the
"SELECT" page in the "SQL Commands" section of the
manual.

You will also want to use "coalesce" to replace null
values, e.g. "select coalesce(db0,0)"

> appreciated.
>
> lumbribase=# select
> clus_id,contig,db0,db1,db4,(db0+db1+db4) as total
> from venn order by clus_id,total DESC;
>  clus_id  | contig | db0  | db1  | db4  | total
> ----------+--------+------+------+------+-------
>  LRC00001 |      1 | 77.4 |      | 63.9 |
>  LRC00001 |      2 | 77.4 |      | 82.7 |
>  LRC00002 |      1 |  325 |  343 |  313 |   981
>  LRC00003 |      1 |      |      |      |
>  LRC00004 |      1 |      |      |      |
>  LRC00005 |      1 |  294 |  294 |  116 |   704
>  LRC00006 |      1 |  100 | 72.8 |      |
>  LRC00006 |      3 |  120 |  122 | 63.9 | 305.9
>  LRC00006 |      2 |  117 |  112 | 58.5 | 287.5
>  LRC00007 |      1 |  178 |      |      |
>  LRC00008 |      1 |      |      |      |
>  LRC00009 |      1 |  416 |      |      |
>  LRC00010 |      2 |  324 |  167 |     |
>  LRC00010 |      4 |  146 |  168 |  172 |   486
>  LRC00010 |      1 |  146 |  166 |  171 |   483
>  LRC00010 |      3 |  145 |  160 |  159 |   464
>  LRC00011 |      1 |  179 |  100 | 95.1 | 374.1
>  LRC00012 |      1 |  639 |  639 |  633 |  1911
>  LRC00012 |      3 |  505 |  509 |  508 |  1522
>  LRC00012 |      2 |  390 |  391 |  392 |  1173
>
>
>
> --
> Ann
>
> "In a world without walls and fences - who needs
> Windows and Gates ?"
>
>       (unknown)
>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
> majordomo@postgresql.org
>



__________________________________
Yahoo! Mail Mobile
Take Yahoo! Mail with you! Check email on your mobile phone.
http://mobile.yahoo.com/learn/mail

Re: Can null values be sorted low?

От
ann hedley
Дата:
OK

If your interested this is my solution to the problem. I'm sure there
are more efficient ways but this way works and that's what's important
at the moment.

While looking up Jeff's tips I discovered the CASE clause and used it to
generate two columns, a count of the dbs and a total across the dbs
avoiding the null values. I could then order these to get the required
row first.

SELECT distinct on (clus_id) clus_id,contig,db0,db1,db4,
 CASE
  WHEN db0 is not null and
       db1 is not null and
       db4 is not null
  THEN 3
  WHEN db0 is not null and
       db1 is not null
  THEN 2
  WHEN db0 is not null and
       db4 is not null
  THEN 2
  WHEN db1 is not null and
       db4 is not null
  THEN 2
 END as db_count,
 CASE
  WHEN db0 is not null and
       db1 is not null and
       db4 is not null
  THEN (db0+db1+db4)
  WHEN db0 is not null and
       db1 is not null
  THEN (db0+db1)
  WHEN db0 is not null and
       db4 is not null
  THEN (db0+db4)
  WHEN db1 is not null and
       db4 is not null
  THEN (db0+db1)
  WHEN db0 is not null
  THEN db0
  WHEN db1 is not null
  THEN db1
  WHEN db4 is not null
  THEN db4
 END as total
from venn order by clus_id,db_count DESC,total DESC;

 clus_id  | contig | db0    | db1  | db4  | db_count | total
----------+--------+---------+------+------+----------+-------
 LRC00001 |      2 | 77.4 |         | 82.7 |          2 | 160.1
 LRC00002 |      1 |  325 |  343 |  313 |          3 |   981
 LRC00003 |      1 |         |         |         |             |
 LRC00004 |      1 |         |         |         |             |
 LRC00005 |      1 |  294 |  294 |  116 |          3 |   704
 LRC00006 |      3 |  120 |  122 | 63.9 |          3 | 305.9
 LRC00007 |      1 |  178 |         |         |             |   178
 LRC00008 |      1 |         |         |         |             |
 LRC00009 |      1 |  416 |         |         |             |   416
 LRC00010 |      4 |  146 |  168 |  172 |          3 |   486
 LRC00011 |      1 |  179 |  100 | 95.1 |          3 | 374.1
 LRC00012 |      1 |  639 |  639 |  633 |          3 |  1911


Jeff Eckermann wrote:

>--- ann hedley <ann.hedley@ed.ac.uk> wrote:
>
>
>>The problem...
>> From the table below I want to select distinct
>>clus_id and the data in
>>3 db columns, even if that data is null.
>>
>>When there are multiple contigs for a clus_id I want
>>to select the row
>>where 1st, the most db columns have a value, 2nd the
>>total value of the
>>db columns is highest.
>>
>>So for
>>LRC00006 I want contig 3,
>>LRC00010 I want contig 4, because it has 3 db values
>>(even though contig
>>2 would total more)
>>LRC00001 I want contig 2, because it would total
>>more if the null value
>>was treated as zero in the calculation
>>and for the rest I want contig 1.
>>
>>Any hints on writing such a query would be much
>>
>>
>
>If you don't mind using a PostgreSQL-only extension,
>"SELECT DISTINCT ON" may do what you want: check the
>"SELECT" page in the "SQL Commands" section of the
>manual.
>
>You will also want to use "coalesce" to replace null
>values, e.g. "select coalesce(db0,0)"
>
>
>
>>appreciated.
>>
>>lumbribase=# select
>>clus_id,contig,db0,db1,db4,(db0+db1+db4) as total
>>from venn order by clus_id,total DESC;
>> clus_id  | contig | db0  | db1  | db4  | total
>>----------+--------+------+------+------+-------
>> LRC00001 |      1 | 77.4 |      | 63.9 |
>> LRC00001 |      2 | 77.4 |      | 82.7 |
>> LRC00002 |      1 |  325 |  343 |  313 |   981
>> LRC00003 |      1 |      |      |      |
>> LRC00004 |      1 |      |      |      |
>> LRC00005 |      1 |  294 |  294 |  116 |   704
>> LRC00006 |      1 |  100 | 72.8 |      |
>> LRC00006 |      3 |  120 |  122 | 63.9 | 305.9
>> LRC00006 |      2 |  117 |  112 | 58.5 | 287.5
>> LRC00007 |      1 |  178 |      |      |
>> LRC00008 |      1 |      |      |      |
>> LRC00009 |      1 |  416 |      |      |
>> LRC00010 |      2 |  324 |  167 |     |
>> LRC00010 |      4 |  146 |  168 |  172 |   486
>> LRC00010 |      1 |  146 |  166 |  171 |   483
>> LRC00010 |      3 |  145 |  160 |  159 |   464
>> LRC00011 |      1 |  179 |  100 | 95.1 | 374.1
>> LRC00012 |      1 |  639 |  639 |  633 |  1911
>> LRC00012 |      3 |  505 |  509 |  508 |  1522
>> LRC00012 |      2 |  390 |  391 |  392 |  1173
>>
>>
>>
>>--
>>Ann
>>
>>"In a world without walls and fences - who needs
>>Windows and Gates ?"
>>
>>      (unknown)
>>
>>
>>
>>---------------------------(end of
>>broadcast)---------------------------
>>TIP 1: subscribe and unsubscribe commands go to
>>majordomo@postgresql.org
>>
>>
>>
>
>
>
>__________________________________
>Yahoo! Mail Mobile
>Take Yahoo! Mail with you! Check email on your mobile phone.
>http://mobile.yahoo.com/learn/mail
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match
>
>

--
Ann

"In a world without walls and fences - who needs Windows and Gates ?"
                                                           (unknown)