Обсуждение: How to count from a second table in an aggregate query?
Hi, I'm trying to figure out how to do something which I'd guess is easy for a sql whiz but has me stumped. I would greatly appreciate any help on this - it's a form of SQL query that I've never figured out, but have wanted to use many times over the years.. I want to generate an analysis report that counts the values in two separate tables. I've been able to accomplish what I want with two separate queries that I then merge together in Excel. Essentially what I need is a "horizontal UNION" statement (or something like that). I've included some DDL and sample SQL queries that explain what I want better than I can in English, but the general idea is: get a FK id and count of a certain column in one table, based on some criteria-> for each FK id, get the count of a different column in a different table Display the counts from both queries side-by-side along with the FK id's in a single result set Thanks for any assistance on this! Steve /*SQL STARTS*/ drop table if exists contact_log; drop table if exists contact_property; create table contact_log(id serial NOT null, src_contact_id integer, log_type character varying(63), CONSTRAINT contact_log_pkey PRIMARY KEY (id)); create table contact_property(id serial NOT null, contact_id integer, property_id integer, CONSTRAINT contact_property_pkey PRIMARY KEY (id), CONSTRAINT contact_property_cid_pid UNIQUE (contact_id, property_id)); insert into contact_log (src_contact_id, log_type) values(1, 'web'); insert into contact_log (src_contact_id, log_type) values(1, 'web'); insert into contact_log (src_contact_id, log_type) values(1, 'web'); insert into contact_log (src_contact_id, log_type) values(1, 'detail'); insert into contact_log (src_contact_id, log_type) values(1, 'detail'); insert into contact_log (src_contact_id, log_type) values(2, 'detail'); insert into contact_log (src_contact_id, log_type) values(2, 'detail'); insert into contact_log (src_contact_id, log_type) values(2, 'web'); insert into contact_log (src_contact_id, log_type) values(2, 'foobar'); insert into contact_log (src_contact_id, log_type) values(3, 'foobar'); insert into contact_log (src_contact_id, log_type) values(4, 'web'); insert into contact_property (contact_id, property_id) values(1, 20); insert into contact_property (contact_id, property_id) values(1, 21); insert into contact_property (contact_id, property_id) values(1, 22); insert into contact_property (contact_id, property_id) values(2, 23); insert into contact_property (contact_id, property_id) values(2, 24); insert into contact_property (contact_id, property_id) values(1, 50); insert into contact_property (contact_id, property_id) values(3, 51); insert into contact_property (contact_id, property_id) values(5, 52); -- This gets what I want from contact_log select src_contact_id, count(log_type) from contact_log where contact_log.src_contact_id in (select contact_id from contact_property) and log_type in ('web', 'detail') and src_contact_id in (select contact_id from contact_property) group by src_contact_id order by src_contact_id; -- correct output is : 1|5, 2|3 -- This gets what I want from contact_property select contact_id, count(property_id) from contact_property where contact_id in (select src_contact_id from contact_log where log_type in ('web', 'detail')) group by contact_id order by contact_id; -- correct output is: 1|4, 2|2 -- THIS DOESN'T WORK (of course - but what would?) select src_contact_id, count(log_type), count(property_id) from contact_log join contact_property cp on cp.contact_id = contact_log.src_contact_id where contact_log.src_contact_id in (select contact_id from contact_property) and log_type in ('web', 'detail') group by src_contact_id order by src_contact_id -- correct output *should be* : 1|5|4, 2|3|2 /*SQL ENDS*/
On Apr 15, 2009, at 1:15 PM, Steve Midgley wrote: > Hi, > > I'm trying to figure out how to do something which I'd guess is easy > for a sql whiz but has me stumped. I would greatly appreciate any > help on this - it's a form of SQL query that I've never figured out, > but have wanted to use many times over the years.. > > I want to generate an analysis report that counts the values in two > separate tables. I've been able to accomplish what I want with two > separate queries that I then merge together in Excel. Essentially > what I need is a "horizontal UNION" statement (or something like > that). > > I've included some DDL and sample SQL queries that explain what I > want better than I can in English, but the general idea is: > > get a FK id and count of a certain column in one table, based on > some criteria > -> for each FK id, get the count of a different column in a > different table > Display the counts from both queries side-by-side along with the FK > id's in a single result set > > Thanks for any assistance on this! > > Steve > > /*SQL STARTS*/ > drop table if exists contact_log; > drop table if exists contact_property; > create table contact_log(id serial NOT null, src_contact_id integer, > log_type character varying(63), CONSTRAINT contact_log_pkey PRIMARY > KEY (id)); > create table contact_property(id serial NOT null, contact_id > integer, property_id integer, > CONSTRAINT contact_property_pkey PRIMARY KEY (id), CONSTRAINT > contact_property_cid_pid UNIQUE (contact_id, property_id)); > insert into contact_log (src_contact_id, log_type) values(1, 'web'); > insert into contact_log (src_contact_id, log_type) values(1, 'web'); > insert into contact_log (src_contact_id, log_type) values(1, 'web'); > insert into contact_log (src_contact_id, log_type) values(1, > 'detail'); > insert into contact_log (src_contact_id, log_type) values(1, > 'detail'); > insert into contact_log (src_contact_id, log_type) values(2, > 'detail'); > insert into contact_log (src_contact_id, log_type) values(2, > 'detail'); > insert into contact_log (src_contact_id, log_type) values(2, 'web'); > insert into contact_log (src_contact_id, log_type) values(2, > 'foobar'); > insert into contact_log (src_contact_id, log_type) values(3, > 'foobar'); > insert into contact_log (src_contact_id, log_type) values(4, 'web'); > insert into contact_property (contact_id, property_id) values(1, 20); > insert into contact_property (contact_id, property_id) values(1, 21); > insert into contact_property (contact_id, property_id) values(1, 22); > insert into contact_property (contact_id, property_id) values(2, 23); > insert into contact_property (contact_id, property_id) values(2, 24); > insert into contact_property (contact_id, property_id) values(1, 50); > insert into contact_property (contact_id, property_id) values(3, 51); > insert into contact_property (contact_id, property_id) values(5, 52); > > > -- This gets what I want from contact_log > select src_contact_id, count(log_type) > from contact_log > where > contact_log.src_contact_id in (select contact_id from > contact_property) > and log_type in ('web', 'detail') > and src_contact_id in (select contact_id from contact_property) > group by src_contact_id > order by src_contact_id; > -- correct output is : 1|5, 2|3 > > -- This gets what I want from contact_property > select contact_id, count(property_id) > from contact_property > where > contact_id in (select src_contact_id from contact_log where log_type > in ('web', 'detail')) > group by contact_id > order by contact_id; > -- correct output is: 1|4, 2|2 > > -- THIS DOESN'T WORK (of course - but what would?) > select src_contact_id, count(log_type), count(property_id) > from contact_log > join contact_property cp on cp.contact_id = contact_log.src_contact_id > where > contact_log.src_contact_id in (select contact_id from > contact_property) > and log_type in ('web', 'detail') > group by src_contact_id > order by src_contact_id > -- correct output *should be* : 1|5|4, 2|3|2 > /*SQL ENDS*/ First, in that last query, working or not, you don't need the "contact_log.src_contact_id in (select contact_id from contact_property)" clause as you've already covered that with the join condtion "cp.contact_id = contact_log.src_contact_id". Anyways, on to your actual question, you can't do that in one level from what I can see as the query first does the join and the executes the aggregates on the results of the join. Let's check out the results of that join without the aggregates (I'm ignoring the id values here since they don't come into play and it will help demonstrate what's happening later): select cl.src_contact_id, cl.log_type, cp.contact_id, cp.property_id from contact_log cl, contact_property cp where cl.src_contact_id = cp.contact_id and cl.log_type in ('web', 'detail'); src_contact_id | log_type | contact_id | property_id ----------------+----------+------------+------------- 1 | detail | 1 | 20 1 | detail | 1 | 20 1 | web | 1 | 20 1 | web | 1 | 20 1 | web | 1 | 20 1 | detail | 1 | 21 1 | detail | 1 | 21 1 | web | 1 | 21 1 | web | 1 | 21 1 | web | 1 | 21 1 | detail | 1 | 22 1 | detail | 1 | 22 1 | web | 1 | 22 1 | web | 1 | 22 1 | web | 1 | 22 2 | web | 2 | 23 2 | detail | 2 | 23 2 | detail | 2 | 23 2 | web | 2 | 24 2 | detail | 2 | 24 2 | detail | 2 | 24 1 | detail | 1 | 50 1 | detail | 1 | 50 1 | web | 1 | 50 1 | web | 1 | 50 1 | web | 1 | 50 That is exactly what gets processed by the aggregates. Aggregates process *a* relation, here the relation produced by the join. Now, that query with the counts makes more sense: select cl.src_contact_id, count(log_type), count(property_id) from contact_log cl, contact_property cp where cl.src_contact_id = cp.contact_id and cl.log_type in ('web', 'detail') group by cl.src_contact_id src_contact_id | count | count ----------------+-------+------- 1 | 20 | 20 2 | 6 | 6 Joining against a subquery for the second count does the trick: select src_contact_id, count(log_type), cp.count from contact_log , (select contact_id, count(property_id) from contact_property group by contact_id) as cp where src_contact_id = cp.contact_id and log_type in ('web', 'detail') group by src_contact_id, cp.count order by src_contact_id src_contact_id | count | count ----------------+-------+------- 1 | 5 | 4 2 | 3 | 2 Note that you have to add the count pulled from subquery to the group by since at that point it's a constant and not an aggregate function anymore, else you get an SQL error in the outer group by clause. Here's the query without the aggregate in the outer query to help make that clear: select src_contact_id, log_type, cp.count from contact_log , (select contact_id, count(property_id) from contact_property group by contact_id) as cp where src_contact_id = cp.contact_id and log_type in ('web', 'detail') order by src_contact_id; src_contact_id | log_type | count ----------------+----------+------- 1 | web | 4 1 | detail | 4 1 |web | 4 1 | web | 4 1 | detail | 4 2 | detail | 2 2 | detail | 2 2 | web | 2 That says, "Give me each src_contact_id and log_type pair from contact_log along with the count from contact_property where contact_id = src_contact_id". Erik Jones, Database Administrator Engine Yard Support, Scalability, Reliability 866.518.9273 x 260 Location: US/Pacific IRC: mage2k
Erik Jones wrote: > > On Apr 15, 2009, at 1:15 PM, Steve Midgley wrote: > >> I want to generate an analysis report that counts the values in two >> separate tables. I've been able to accomplish what I want with two >> separate queries that I then merge together in Excel. Essentially >> what I need is a "horizontal UNION" statement (or something like that). >> >> get a FK id and count of a certain column in one table, based on some >> criteria >> -> for each FK id, get the count of a different column in a different >> table >> Display the counts from both queries side-by-side along with the FK >> id's in a single result set > > Joining against a subquery for the second count does the trick: > > select src_contact_id, count(log_type), cp.count > from contact_log , > (select contact_id, count(property_id) > from contact_property > group by contact_id) as cp > where src_contact_id = cp.contact_id > and log_type in ('web', 'detail') > group by src_contact_id, cp.count > order by src_contact_id > > src_contact_id | count | count > ----------------+-------+------- > 1 | 5 | 4 > 2 | 3 | 2 > Thanks Erik! This is perfect. Oliveiros showed me another neat solution a while back that involved a select statement in the from clause, but I kind filed that solution mentally as a cool parlor trick. Now I see that I'm going to have to learn and study this form of SQL more closely, as it's extremely flexible and powerful. Thanks for the very complete and patiently instructive response - it makes perfect sense. I'll work to share this along as I go. Steve
<pre wrap="">Date: Wed, 15 Apr 2009 21:23:04 -0700 From: Steve Midgley <a class="moz-txt-link-rfc2396E" href="mailto:science@misuse.org"><science@misuse.org></a> To: Erik Jones <a class="moz-txt-link-rfc2396E" href="mailto:ejones@engineyard.com"><ejones@engineyard.com></a> Subject: Re: How to count from a second table in an aggregate query? Message-ID: <a class="moz-txt-link-rfc2396E" href="mailto:49E6B2A8.5040308@misuse.org"><49E6B2A8.5040308@misuse.org></a> Erik Jones wrote: </pre><blockquote type="cite"><pre wrap=""><span class="moz-txt-citetags">></span> <span class="moz-txt-citetags">> </span>On Apr 15, 2009, at 1:15 PM, Steve Midgley wrote: <span class="moz-txt-citetags">></span> </pre><blockquote type="cite"><pre wrap=""><span class="moz-txt-citetags">>></span>I want to generate an analysis report that counts the values in two <span class="moz-txt-citetags">>> </span>separate tables. I've been able to accomplish what I want with two <span class="moz-txt-citetags">>> </span>separate queries that I then merge together in Excel. Essentially <span class="moz-txt-citetags">>> </span>what I need is a "horizontal UNION" statement (or something like that). <span class="moz-txt-citetags">>></span> <span class="moz-txt-citetags">>> </span>get a FK id and count of a certain column in one table, based on some <span class="moz-txt-citetags">>> </span>criteria <span class="moz-txt-citetags">>> </span>-> for each FK id, get the count of a different column in a different <span class="moz-txt-citetags">>> </span>table <span class="moz-txt-citetags">>> </span>Display the counts from both queries side-by-side along with the FK <span class="moz-txt-citetags">>> </span>id's in a single result set </pre></blockquote><pre wrap=""><span class="moz-txt-citetags">></span> <span class="moz-txt-citetags">> </span>Joining against a subquery for the second count does the trick: <span class="moz-txt-citetags">></span> <span class="moz-txt-citetags">> </span>select src_contact_id, count(log_type), cp.count <span class="moz-txt-citetags">> </span>from contact_log , <span class="moz-txt-citetags">> </span> (select contact_id, count(property_id) <span class="moz-txt-citetags">> </span> from contact_property <span class="moz-txt-citetags">> </span> group by contact_id) as cp <span class="moz-txt-citetags">> </span>where src_contact_id = cp.contact_id <span class="moz-txt-citetags">> </span> and log_type in ('web', 'detail') <span class="moz-txt-citetags">> </span>group by src_contact_id, cp.count <span class="moz-txt-citetags">> </span>order by src_contact_id <span class="moz-txt-citetags">></span> <span class="moz-txt-citetags">> </span> src_contact_id | count | count <span class="moz-txt-citetags">> </span>----------------+-------+------- <span class="moz-txt-citetags">> </span> 1 | 5 | 4 <span class="moz-txt-citetags">> </span> 2 | 3 | 2 <span class="moz-txt-citetags">></span> </pre></blockquote><pre wrap="">A friend of mine off-list provided an alternativeSQL version which I thought the list might have interest in: select src_contact_id, count(distinct contact_log.id), count(distinct contact_property.id) from contact_log, contact_property where contact_log.src_contact_id = contact_property.contact_id and contact_log.log_type in ('web', 'detail') group by src_contact_id; Credit to Matt Gainsborough for that one. Makes perfect sense as I look at it. It's nice to see two alternate paths to thesame solution. ANSI-92 joins work just as well as his ANSI-89 join syntax for this (i.e. using the "JOIN" keyword to setthe relation). Steve </pre>