Обсуждение: Organization of tables
Hi,
I have a question regarding the organization of a table I want to
create in my database: I have the following setup:
Table Fragments (simplified example):
Primary key = {mass}
Approach (1)
fragment | mass (of fragment) | peptide
---------------+--------------------------------+--------------
A | x | Peptide1, Peptide2
Q | y | Peptide1
K | z | Peptide 2, Peptide3
The idea here is that a peptide can be composed of many fragments e.g.
Peptide 2 is made up of fragments A and K; Peptide1 is made up of A
and Q and so on.
My idea is to create an index on the mass column and be able to
retrieve all Peptides that contain a certain fragment mass e.g
SELECT peptide
FROM Fragments
WHERE mass = x;
Should give me: Peptide1, Peptide2
The alternative way I have thought of to organize this table is to
have something as follows:
Approach (2)
Primary Key = {fragment, mass, peptide}
fragment | mass (of fragment) | peptide
---------------+--------------------------------+--------------
A | x | Peptide1
A | x | Peptide2
Q | y | Peptide1
K | z | Peptide 2
K | z | Peptide 3
If I consider 2500 unique fragments then, using approach (1), table
Fragments will hold 2,500 tuples. If I consider the same number of
fragments then table Fragments using approach 2 holds 15,000 tuples.
I have considered using approach (1) whereby I would have less tuples
to search but if I wanted to access the peptides they belong to I
would retrieve the list of corresponding peptides e.g the string
"Peptide1, Peptide2" and process it in my program. However this seems
like a hack around the way a database table should be organised. The
problem increases further when I have to scale up and consider more
unique fragments (>2500).
Any help on how best to structure such data would be mostly appreciated.
- Salman Tahir
On 6/14/07, Salman Tahir <salmantahir1@gmail.com> wrote: > Any help on how best to structure such data would be mostly appreciated. See: http://en.wikipedia.org/wiki/Database_normalization *** Grossly oversimplified example follows *** CREATE TABLE PEPTIDE( NAME TEXT PRIMARY KEY ); CREATE TABLE FRAGMENT( NAME TEXT PRIMARY KEY , MASS TEXT ); CREATE TABLE PEPTIDE_FRAGMENT( FRAGMENT TEXT NOT NULL REFERENCES FRAGMENT(NAME) , PEPTIDE TEXT NOT NULL REFERENCES PEPTIDE(NAME) ); INSERT INTO PEPTIDE VALUES ('Peptide 1'),('Peptide 2'),('Peptide 3'); INSERT INTO FRAGMENT VALUES ('A','x'),('Q','y'),('K','z'); INSERT INTO PEPTIDE_FRAGMENT VALUES ('A','Peptide 1'),('A','Peptide 2'),('Q','Peptide 1') ,('K','Peptide 2'),('K','Peptide 3'); SELECT F.NAME AS FRAGMENT, F.MASS , (SELECT ARRAY_TO_STRING(ARRAY( SELECT PEPTIDE FROM PEPTIDE_FRAGMENT WHERE FRAGMENT = F.NAME ORDER BY NAME ), ','))AS PEPTIDE FROM FRAGMENT F; fragment | mass | peptide ----------+------+---------------------A | x | Peptide 1,Peptide 2Q | y | Peptide 1K | z |Peptide 2,Peptide 3
Hi Salman,
will this achieve your needs?
peptide_table
primary_key
name (ie, Peptide1,
Peptide2)
mass
fragment_table
primary_key
name (ie, A, Q,
K)
link_table
primary_key
peptide_id
fragment_id
it gives you easy access to peptide mass and it allows you to link multiple fragments to individual peptides.
if the peptide mass ultimately comes form the fragments, you could enter the mass value there and then have the db do a
calculationto sum up the masses for a given peptide.
sorry if i've misunderstood you problem.
best of luck,
oe1
----- Original Message ----
From: Salman Tahir <salmantahir1@gmail.com>
To: pgsql-sql@postgresql.org
Sent: Thursday, June 14, 2007 4:21:06 AM
Subject: [SQL] Organization of tables
Hi,
I have a question regarding the organization of a table I want to
create in my database: I have the following setup:
Table Fragments (simplified example):
Primary key = {mass}
Approach (1)
fragment | mass (of fragment) | peptide
---------------+--------------------------------+--------------
A | x | Peptide1, Peptide2
Q | y | Peptide1
K | z | Peptide 2, Peptide3
The idea here is that a peptide can be composed of many fragments e.g.
Peptide 2 is made up of fragments A and K; Peptide1 is made up of A
and Q and so on.
My idea is to create an index on the mass column and be able to
retrieve all Peptides that contain a certain fragment mass e.g
SELECT peptide
FROM Fragments
WHERE mass = x;
Should give me: Peptide1, Peptide2
The alternative way I have thought of to organize this table is to
have something as follows:
Approach (2)
Primary Key = {fragment, mass, peptide}
fragment | mass (of fragment) | peptide
---------------+--------------------------------+--------------
A | x | Peptide1
A | x | Peptide2
Q | y | Peptide1
K | z | Peptide 2
K | z | Peptide 3
If I consider 2500 unique fragments then, using approach (1), table
Fragments will hold 2,500 tuples. If I consider the same number of
fragments then table Fragments using approach 2 holds 15,000 tuples.
I have considered using approach (1) whereby I would have less tuples
to search but if I wanted to access the peptides they belong to I
would retrieve the list of corresponding peptides e.g the string
"Peptide1, Peptide2" and process it in my program. However this seems
like a hack around the way a database table should be organised. The
problem increases further when I have to scale up and consider more
unique fragments (>2500).
Any help on how best to structure such data would be mostly appreciated.
- Salman Tahir
---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
____________________________________________________________________________________
TV dinner still cooling?
Check out "Tonight's Picks" on Yahoo! TV.
http://tv.yahoo.com/