Обсуждение: Organization of tables
Hi, I have a question regarding the organization of a table I want to create in my database: I have the following setup: Table Fragments (simplified example): Primary key = {mass} Approach (1) fragment | mass (of fragment) | peptide ---------------+--------------------------------+-------------- A | x | Peptide1, Peptide2 Q | y | Peptide1 K | z | Peptide 2, Peptide3 The idea here is that a peptide can be composed of many fragments e.g. Peptide 2 is made up of fragments A and K; Peptide1 is made up of A and Q and so on. My idea is to create an index on the mass column and be able to retrieve all Peptides that contain a certain fragment mass e.g SELECT peptide FROM Fragments WHERE mass = x; Should give me: Peptide1, Peptide2 The alternative way I have thought of to organize this table is to have something as follows: Approach (2) Primary Key = {fragment, mass, peptide} fragment | mass (of fragment) | peptide ---------------+--------------------------------+-------------- A | x | Peptide1 A | x | Peptide2 Q | y | Peptide1 K | z | Peptide 2 K | z | Peptide 3 If I consider 2500 unique fragments then, using approach (1), table Fragments will hold 2,500 tuples. If I consider the same number of fragments then table Fragments using approach 2 holds 15,000 tuples. I have considered using approach (1) whereby I would have less tuples to search but if I wanted to access the peptides they belong to I would retrieve the list of corresponding peptides e.g the string "Peptide1, Peptide2" and process it in my program. However this seems like a hack around the way a database table should be organised. The problem increases further when I have to scale up and consider more unique fragments (>2500). Any help on how best to structure such data would be mostly appreciated. - Salman Tahir
On 6/14/07, Salman Tahir <salmantahir1@gmail.com> wrote: > Any help on how best to structure such data would be mostly appreciated. See: http://en.wikipedia.org/wiki/Database_normalization *** Grossly oversimplified example follows *** CREATE TABLE PEPTIDE( NAME TEXT PRIMARY KEY ); CREATE TABLE FRAGMENT( NAME TEXT PRIMARY KEY , MASS TEXT ); CREATE TABLE PEPTIDE_FRAGMENT( FRAGMENT TEXT NOT NULL REFERENCES FRAGMENT(NAME) , PEPTIDE TEXT NOT NULL REFERENCES PEPTIDE(NAME) ); INSERT INTO PEPTIDE VALUES ('Peptide 1'),('Peptide 2'),('Peptide 3'); INSERT INTO FRAGMENT VALUES ('A','x'),('Q','y'),('K','z'); INSERT INTO PEPTIDE_FRAGMENT VALUES ('A','Peptide 1'),('A','Peptide 2'),('Q','Peptide 1') ,('K','Peptide 2'),('K','Peptide 3'); SELECT F.NAME AS FRAGMENT, F.MASS , (SELECT ARRAY_TO_STRING(ARRAY( SELECT PEPTIDE FROM PEPTIDE_FRAGMENT WHERE FRAGMENT = F.NAME ORDER BY NAME ), ','))AS PEPTIDE FROM FRAGMENT F; fragment | mass | peptide ----------+------+---------------------A | x | Peptide 1,Peptide 2Q | y | Peptide 1K | z |Peptide 2,Peptide 3
Hi Salman, will this achieve your needs? peptide_table primary_key name (ie, Peptide1, Peptide2) mass fragment_table primary_key name (ie, A, Q, K) link_table primary_key peptide_id fragment_id it gives you easy access to peptide mass and it allows you to link multiple fragments to individual peptides. if the peptide mass ultimately comes form the fragments, you could enter the mass value there and then have the db do a calculationto sum up the masses for a given peptide. sorry if i've misunderstood you problem. best of luck, oe1 ----- Original Message ---- From: Salman Tahir <salmantahir1@gmail.com> To: pgsql-sql@postgresql.org Sent: Thursday, June 14, 2007 4:21:06 AM Subject: [SQL] Organization of tables Hi, I have a question regarding the organization of a table I want to create in my database: I have the following setup: Table Fragments (simplified example): Primary key = {mass} Approach (1) fragment | mass (of fragment) | peptide ---------------+--------------------------------+-------------- A | x | Peptide1, Peptide2 Q | y | Peptide1 K | z | Peptide 2, Peptide3 The idea here is that a peptide can be composed of many fragments e.g. Peptide 2 is made up of fragments A and K; Peptide1 is made up of A and Q and so on. My idea is to create an index on the mass column and be able to retrieve all Peptides that contain a certain fragment mass e.g SELECT peptide FROM Fragments WHERE mass = x; Should give me: Peptide1, Peptide2 The alternative way I have thought of to organize this table is to have something as follows: Approach (2) Primary Key = {fragment, mass, peptide} fragment | mass (of fragment) | peptide ---------------+--------------------------------+-------------- A | x | Peptide1 A | x | Peptide2 Q | y | Peptide1 K | z | Peptide 2 K | z | Peptide 3 If I consider 2500 unique fragments then, using approach (1), table Fragments will hold 2,500 tuples. If I consider the same number of fragments then table Fragments using approach 2 holds 15,000 tuples. I have considered using approach (1) whereby I would have less tuples to search but if I wanted to access the peptides they belong to I would retrieve the list of corresponding peptides e.g the string "Peptide1, Peptide2" and process it in my program. However this seems like a hack around the way a database table should be organised. The problem increases further when I have to scale up and consider more unique fragments (>2500). Any help on how best to structure such data would be mostly appreciated. - Salman Tahir ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq ____________________________________________________________________________________ TV dinner still cooling? Check out "Tonight's Picks" on Yahoo! TV. http://tv.yahoo.com/