Обсуждение: Help with queries.

Поиск
Список
Период
Сортировка

Help with queries.

От
Adrian Johnson
Дата:
Dear group,

I just started learning postgres and I have to analyze my data.  So
please bear with me, for all the simple questions that I am asking.
Apologies.


I have a table:

snps table

id   |   sample_id | chromosome | from  | to |
----------------------------------------------------------------
1         1                chr1             10       11
2          1              chr1               14       15
3          2              chr1               14       15
4         2               chr1                9         10
5         3              chr1                14        15
6          3             chr14              35        36
7          3             chr14              39        40
8          2             chr14              39        40
9          2             chr14              43        44
10        1             chr14              43        44


gene table:

id |   chromosome | from | to  | genename
1         chr1            4       20    SRC
2         chr1           25      45    SRC
3         chr1           80      100   CSK
4         chr1          120     140   CSK

My aim is to for a gene in gene table (SRC that has two ranges 4-20 and 25-45),
I want to map all coordinates in snps table.
Then I want to get those entries where sample 1 and sample 2 have in
common and sample 1 and sampl3 in common.


I do not know how to get these entries. Could any one help me please.

thank you.

adrian


Re: Help with queries.

От
"Oliveiros d'Azevedo Cristina"
Дата:
Howdy, Adrian,
Please see my questions below

> snps table
>
> id   |   sample_id | chromosome | from  | to |
> ----------------------------------------------------------------
> 1         1                chr1             10       11
> 2          1              chr1               14       15
> 3          2              chr1               14       15
> 4         2               chr1                9         10
> 5         3              chr1                14        15
> 6          3             chr14              35        36
> 7          3             chr14              39        40
> 8          2             chr14              39        40
> 9          2             chr14              43        44
> 10        1             chr14              43        44
>
>
> gene table:
>
> id |   chromosome | from | to  | genename
> 1         chr1            4       20    SRC
> 2         chr1           25      45    SRC
> 3         chr1           80      100   CSK
> 4         chr1          120     140   CSK
>
> My aim is to for a gene in gene table (SRC that has two ranges 4-20 and 
> 25-45),
> I want to map all coordinates in snps table.

Please advice me,
What do you call coordinates to, exactly? Sorry, my genetics knowledge is 
very poor.
You call the from and to fields coordinates?
Do they have the same meaning as the from and to fields from table snps ?
What does it mean to map all coordinates in table snps?
Do you wanna know which  entries from snps map in the SRC gene?
Coordinates are the segment where the gene/sample fit in the chromosome? Is 
my understanding correct ?

> Then I want to get those entries where sample 1 and sample 2 have in
> common and sample 1 and sampl3 in common.
>

Could you give an example of your desired output, for the data you have ?

>
> I do not know how to get these entries. Could any one help me please.
>
> thank you.
>
> adrian
>

Best,
Oliver

> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql 



Re: Help with queries.

От
Adrian Johnson
Дата:
Hello Oliver.
thanks for your reply. Here are my answers. Sorry I shot e-mail in
morning and I was not clear about it.
I am afraid, I am not jamming you with all information.

Thank you very much for your help.


====================================== your questions
====================================


First, answers to questions you asked:



> Please advice me,
> What do you call coordinates to, exactly? Sorry, my genetics knowledge is
> very poor.
== I call coordinates cto - crom

> You call the from and to fields coordinates?
==yes

> Do they have the same meaning as the from and to fields from table snps ?
==  cfrom - cto fields have yes.


> What does it mean to map all coordinates in table snps?
== Means I want to take  snps.cfrom and snps.cto and gene.gene by
checking if they are in range of gene.cfrom and gene.cto from gene
table.

> Do you wanna know which  entries from snps map in the SRC gene?
== yes and more than that.

> Coordinates are the segment where the gene/sample fit in the chromosome? Is
> my understanding correct ?
==  yes I am calling   on  "chr19"  (cfrom - cto)  (38162736 -
38163007) are coordinates.



=================================== illustration of example
==========================
I know basic queries, but I am finding it difficult to get what I am
describing below.
I am illustrating an example here .



Here I query for a gene RHPN2 from gene table:
 gid  | chromosome |  cfrom   |   cto    | frame | gene
-------+------------+----------+----------+-------+-------93538 | chr19      | 38162736 | 38163007 | -     | RHPN293540
|chr19      | 38173254 | 38173420 | -     | RHPN293541 | chr19      | 38174563 | 38174720 | -     | RHPN293542 | chr19
   | 38176717 | 38176804 | -     | RHPN293543 | chr19      | 38178766 | 38178971 | -     | RHPN293544 | chr19      |
38182326| 38182456 | -     | RHPN293545 | chr19      | 38184987 | 38185154 | -     | RHPN293546 | chr19      | 38185553
|38185751 | -     | RHPN293547 | chr19      | 38190754 | 38190931 | -     | RHPN293548 | chr19      | 38194419 |
38194554| -     | RHPN293549 | chr19      | 38195387 | 38195475 | -     | RHPN293551 | chr19      | 38204311 | 38204397
|-     | RHPN293553 | chr19      | 38209244 | 38209383 | -     | RHPN293555 | chr19      | 38226989 | 38227115 | -
|RHPN293556 | chr19      | 38247524 | 38247603 | -     | RHPN2
 







Now taking:cto = 38162736 first row in above tablecfrom = 38247603 last row in above table
I want to get all those entries from snp table


# select sample_id, chromosome,cfrom, cto, refbase, consbase from snps
where cto > 38162736 and cfrom < 38247603 and chromosome = 'chr19' ;
sample_id | chromosome |  cfrom   |   cto    | refbase | consbase
-----------+------------+----------+----------+---------+----------        2 | chr19      | 38178828 | 38178829 | C
 | Y        5 | chr19      | 38182405 | 38182406 | T        | Y        5 | chr19      | 38182424 | 38182425 | G       |
R       5 | chr19      | 38185101 | 38185102 | C       | Y        3 | chr19      | 38182424 | 38182425 | G       | R
   1 | chr19      | 38178828 | 38178829 | C       | Y        1 | chr19      | 38182424 | 38182425 | G       | R
7| chr19      | 38185101 | 38185102 | C       | Y        6 | chr19      | 38185101 | 38185102 | C       | Y        4 |
chr19     | 38178828 | 38178829 | C       | Y        4 | chr19      | 38182424 | 38182425 | G       | R
 

From this result, we get that all samples range from 1 to 7 (sample_id).




Now, from above result (snps table) I want to filter:
1. Those that have any letter in consbase other than ATGC (although in
this case there are no A or T or G or C they are there)
2. Those entries that have sample_id 1, 2 and 3.
3. Unique entries that have cfrom and cto common to 1 and 2  and 1 and
3 but not 2 and 3.

For example see below:  (say this is result X)gene  | sample_id | chromosome |  cfrom   |   cto
-------+-----------+------------+----------+----------+---------+----------RHPN2 |         2 | chr19      | 38178828 |
38178829RHPN2|         1 | chr19      | 38178828 | 38178829RHPN2 |         3 | chr19      | 38182424 | 38182425RHPN2 |
      1 | chr19      | 38182424 | 38182425
 

Here for gene RHPN2:
cfrom - cto (38178828 | 38178829) is common to samples 1 and 2
cfrom - cto ( 38182424| 38182425) is common to samples 1 and 3.

But both these samples 1 and 2   and  1 and 3 belong same gene (RHPN2 ).


Now I  have another table coverage, where  for sample positions above,
I have the following data:
(say this is result y)sample_id | chromosome |  cfrom   |   cto    | abase | tbase | gbase | cbase
-----------+------------+----------+----------+-------+-------+-------+-------        2 | chr19      | 38178828 |
38178829|     0 |    29 |     2 |    44        1 | chr19      | 38178828 | 38178829 |     0 |    52 |     0 |    32
  3 | chr19      | 38178828 | 38178829 |     0 |     0 |     0
 
|    28        1 | chr19      | 38182424 | 38182425 |    14 |     0 |    54 |     0        2 | chr19      | 38182424 |
38182425|    17 |     0 |    55 |     0        3 | chr19      | 38182424 | 38182425 |    13 |     0 |    26 |     0
 


Now from table snps, I want to get results that satisfy conditions
that gave result X  and combine this with results Y.

How is it possible to do this?

Thank you for your help.

Adrian.