Обсуждение: blanking out repeated columns in rows
Here's one for all you SQL jockeys out there (hope this is a good group to post this query) I'm currently (happily) using PostgreSQL for doing DNA expression analysis, and I often end up with joins where columns on the left side get repeated to matching differing For example, a select output ends up looking like this: (from psql. i've removed the - and | lines) tag gb_id pos descrip ACTATTTTTAGAGACCC NM_032685.1 307 hypothetical protein MGC13005 (MGC13005), AGAAAAAAAAAAAAAAA NM_019110.2 27 hypothetical protein P1 p373c6 (P1P373C6) AGAAAAAAAAAAAAAAA NM_019110.2 27 hypothetical protein P1 p373c6 (P1P373C6) AGCCACCACGCCTGGTC NM_003693.1 260 acetyl LDL receptor; SREC=scavenger AGCCACCGCGCCCGGCC NM_007081.1 486 RAB, member of RAS oncogene family-like 2B AGCCACCGCGCCCGGCC NM_013412.1 486 RAB, member of RAS oncogene family-like 2A AGCCACCGCGCCTGGCC NM_000651.2 229 complement component (3b/4b) receptor 1, AGCCACCGCGCCTGGCC NM_000573.2 229 complement component (3b/4b) receptor 1, ATCAAAAAAAAAAAAAA NM_079421.1 25 cyclin-dependent kinase inhibitor 2D but I (and my colleagues) would much rather see this, which draws attention to the duplicates (or multiples) in the left column. tag gb_id pos descrip ACTATTTTTAGAGACCC NM_032685.1 307 hypothetical protein MGC13005 (MGC13005), AGAAAAAAAAAAAAAAA NM_019110.2 27 hypothetical protein P1 p373c6 (P1P373C6) NM_019110.2 27 hypotheticalprotein P1 p373c6 (P1P373C6) AGCCACCACGCCTGGTC NM_003693.1 260 acetyl LDL receptor; SREC=scavenger AGCCACCGCGCCCGGCC NM_007081.1 486 RAB, member of RAS oncogene family-like 2B NM_013412.1 486 RAB, memberof RAS oncogene family-like 2A AGCCACCGCGCCTGGCC NM_000651.2 229 complement component (3b/4b) receptor 1, NM_000573.2 229 complement component(3b/4b) receptor 1, ATCAAAAAAAAAAAAAA NM_079421.1 25 cyclin-dependent kinase inhibitor 2D Is there any slick way to accomplish this in SQL? Or is there a switch or something I've missed in postgresql? Currently I'm postprocessing the output with a perl script, but I'd prefer the SQL to generate it directly, if possible. Thanks, folks! =sean ------------------------------------------------------------------------------- Sean R. McCorkle mccorkle@bnl.gov | Haven't you heard, that aside Genome Group (631) 344-4270 ph | from the details, there's Biology Department (631) 344-3407 fax | nothing further to be found? Brookhaven National Laboratory | --Master Dogen's 300 Koan Upton, New York 11973 | Shobogenzo (commentary)
On Tue, 7 May 2002 11:03:50 -0400 Sean McCorkle <mccorkle@avenger.bio.bnl.gov> wrote: > but I (and my colleagues) would much rather see this, which draws > attention to the duplicates (or multiples) in the left column. > > tag gb_id pos descrip > > ACTATTTTTAGAGACCC NM_032685.1 307 hypothetical protein MGC13005 (MGC13005), > AGAAAAAAAAAAAAAAA NM_019110.2 27 hypothetical protein P1 p373c6 (P1P373C6) > NM_019110.2 27 hypothetical protein P1 p373c6 (P1P373C6) > AGCCACCACGCCTGGTC NM_003693.1 260 acetyl LDL receptor; SREC=scavenger > AGCCACCGCGCCCGGCC NM_007081.1 486 RAB, member of RAS oncogene family-like 2B > NM_013412.1 486 RAB, member of RAS oncogene family-like 2A > AGCCACCGCGCCTGGCC NM_000651.2 229 complement component (3b/4b) receptor 1, > NM_000573.2 229 complement component (3b/4b) receptor 1, > ATCAAAAAAAAAAAAAA NM_079421.1 25 cyclin-dependent kinase inhibitor 2D How about this method of appending sequences as unique indices? First: create temp sequence dna_rownum1; create temp sequence dna_rownum2; Secand: SELECT setval('dna_rownum1', 1, false); -- (1) SELECT setval('dna_rownum2', 1, false); -- (2) SELECT (CASE WHEN t1.idx = t3.idx THEN t1.tag ELSE NULL END) AS tag, t1.gb_id, t1.pos, t1.descrip FROM (SELECT *, nextval('dna_rownum1') AS idx FROM dna ORDER BY idx ) AS t1, (SELECT t2.tag, MIN(t2.idx) AS idx FROM (SELECT tag, nextval('dna_rownum2')AS idx FROM dna ORDER BY idx) AS t2 GROUP BY t2.tag ) AS t3WHERE t1.tag = t3.tagORDER BY t1.tag, t1.idx; -- (3) Note: (1) and (2) need to be executed at the same time, but (3) doesn't. And CREATE TEMP SEQUENCE is practicable in7.2 or later. Regards, Masaru Sugawara
On Thu, 09 May 2002 01:51:59 +0900 Masaru Sugawara <rk73@sea.plala.or.jp> wrote: > > How about this method of appending sequences as unique indices? > > > First: > create temp sequence dna_rownum1; > create temp sequence dna_rownum2; > > Secand: > SELECT setval('dna_rownum1', 1, false); -- (1) > SELECT setval('dna_rownum2', 1, false); -- (2) > SELECT (CASE WHEN t1.idx = t3.idx THEN t1.tag ELSE NULL END) AS tag, I'm sorry. This one is simpler. SELECT (CASE WHEN t1.oid = t2.idx THEN t1.tag ELSE NULL END) AS tag, t1.gb_id, t1.pos, t1.descrip FROM dna AS t1, (SELECT tag, MIN(oid) AS idx FROM dna GROUP BYtag ) AS t2WHERE t1.tag = t2.tagORDER BY t1.tag, t1.oid; Regards, Masaru Sugawara