Обсуждение: subquery/alias question
Hi all,
I've read 7.2.1.3 (as short as it is) in the PgSQL docs, but don't
see what I am doing wrong... Maybe you can help?
I've got a query;
SELECT
d.dom_id,
d.dom_name,
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
AS
usr_count
FROM
domains d
ORDER BY d.dom_name ASC;
Where 'usr_count' returns the number of entries in 'users' that point
to a given entry in 'domains'. Pretty straight forward so far. The
trouble is:
SELECT
d.dom_id,
d.dom_name,
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
AS
usr_count
FROM
domains d
WHERE
usr_count > 0
ORDER BY d.dom_name ASC;
Causes the error:
ERROR: column "usr_count" does not exist
It works if I use:
SELECT
d.dom_id,
d.dom_name,
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
AS
usr_count
FROM
domains d
WHERE
(SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0
ORDER BY d.dom_name ASC;
This seems terribly inefficient (and ugly), and I can't see why the
results from 'usr_count' can't be counted... I can use 'usr_count' to
sort the results...
Thanks all!
Madi
On Sep 25, 2007, at 16:59 , Madison Kelly wrote:
> SELECT
> d.dom_id,
> d.dom_name,
> (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id)
> AS
> usr_count
> FROM
> domains d
> WHERE
> (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0
> ORDER BY d.dom_name ASC;
Why not just use a join? Something like this would work, I should think:
select dom_id,
dom_name,
usr_count
from domains
natural join (select usr_dom_id as dom_id,
count(usr_dom_id) as usr_count
from users) u
where usr_count > 0
order by dom_name;
Michael Glaesemann
grzm seespotcode net
Michael Glaesemann wrote: > > On Sep 25, 2007, at 16:59 , Madison Kelly wrote: > >> SELECT >> d.dom_id, >> d.dom_name, >> (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) >> AS >> usr_count >> FROM >> domains d >> WHERE >> (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 >> ORDER BY d.dom_name ASC; > > Why not just use a join? Something like this would work, I should think: > > select dom_id, > dom_name, > usr_count > from domains > natural join (select usr_dom_id as dom_id, > count(usr_dom_id) as usr_count > from users) u > where usr_count > 0 > order by dom_name; Maybe the usr_count should be tested in a HAVING clause instead of WHERE? And put the count(*) in the result list instead of a subselect. That feels more natural to me anyway. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Sep 25, 2007, at 17:30 , Alvaro Herrera wrote: > Michael Glaesemann wrote: >> >> select dom_id, >> dom_name, >> usr_count >> from domains >> natural join (select usr_dom_id as dom_id, >> count(usr_dom_id) as usr_count >> from users) u >> where usr_count > 0 >> order by dom_name; > > Maybe the usr_count should be tested in a HAVING clause instead of > WHERE? And put the count(*) in the result list instead of a > subselect. > That feels more natural to me anyway. I believe you'd have to write it like select dom_id, dom_name, count(usr_dom_id) as usr_count from domains join users on (usr_dom_id = dom_id) having count(usr_dom_id) > 0 order by dom_name; I don't know how the performance would compare. I think the backend is smart enough to know it doesn't need to perform two seq scans to calculate count(usr_dom_id), but I wasn't sure. Madison, how do the two queries compare with explain analyze? Michael Glaesemann grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> writes:
> I believe you'd have to write it like
> select dom_id, dom_name, count(usr_dom_id) as usr_count
> from domains
> join users on (usr_dom_id = dom_id)
> having count(usr_dom_id) > 0
> order by dom_name;
> I don't know how the performance would compare. I think the backend
> is smart enough to know it doesn't need to perform two seq scans to
> calculate count(usr_dom_id), but I wasn't sure.
It has been smart enough for a few years now --- don't recall when
exactly, but nodeAgg.c quoth
* Perform lookups of aggregate function info, and initialize the
* unchanging fields of the per-agg data. We also detect duplicate
* aggregates (for example, "SELECT sum(x) ... HAVING sum(x) > 0"). When
* duplicates are detected, we only make an AggStatePerAgg struct for the
* first one. The clones are simply pointed at the same result entry by
* giving them duplicate aggno values.
... which in English means we just do the calculation once ...
regards, tom lane
On Sep 25, 2007, at 21:44 , Tom Lane wrote: > ... which in English means we just do the calculation once ... As always, thanks, Tom, for the explanation (and Alvaro, who probably already knew this :)) Michael Glaesemann grzm seespotcode net
Michael Glaesemann wrote: > > On Sep 25, 2007, at 17:30 , Alvaro Herrera wrote: > >> Michael Glaesemann wrote: >>> >>> select dom_id, >>> dom_name, >>> usr_count >>> from domains >>> natural join (select usr_dom_id as dom_id, >>> count(usr_dom_id) as usr_count >>> from users) u >>> where usr_count > 0 >>> order by dom_name; >> >> Maybe the usr_count should be tested in a HAVING clause instead of >> WHERE? And put the count(*) in the result list instead of a subselect. >> That feels more natural to me anyway. > > I believe you'd have to write it like > > select dom_id, dom_name, count(usr_dom_id) as usr_count > from domains > join users on (usr_dom_id = dom_id) > having count(usr_dom_id) > 0 > order by dom_name; > > I don't know how the performance would compare. I think the backend is > smart enough to know it doesn't need to perform two seq scans to > calculate count(usr_dom_id), but I wasn't sure. > > Madison, how do the two queries compare with explain analyze? Thanks for your reply! Unfortunately, in both cases I get the error: nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY dom_name; ERROR: syntax error at or near "COUNT" at character 25 LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ... I've been struggling with some deadlines, so for now I'm using just: SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC; Which gives me just the domains with at least one user under them, but not the count. This is not ideal, and I will have to come back to it next week. In the meantime, any idea what the GROUP BY error is? If not, I'll read through the docs on 'GROUP'ing once I get this deadline out of the way. Thank you all for your help! I am sure I will have more question(s) next week as soon as I can get back to this. Madi
Madison Kelly wrote: > Thanks for your reply! > > Unfortunately, in both cases I get the error: > > nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains > JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY > dom_name; > ERROR: syntax error at or near "COUNT" at character 25 > LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ... Try to avoid missing the comma before the COUNT (and do not cheat when cut'n pasting ...) Also it seems you will need a GROUP BY clause: GROUP BY dom_id, dom_name (placed just before the HAVING clause). -- Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4 "Some men are heterosexual, and some are bisexual, and some men don't think about sex at all... they become lawyers" (Woody Allen)
"Madison Kelly" <linux@alteeve.com> writes:
> SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u
> WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;
>
> Which gives me just the domains with at least one user under them, but not
> the count. This is not ideal, and I will have to come back to it next week. In
> the meantime, any idea what the GROUP BY error is? If not, I'll read through
> the docs on 'GROUP'ing once I get this deadline out of the way.
I think you just want simply:
SELECT dom_id, dom_name, count(*)
FROM users
JOIN domains ON (usr_dom_id=dom_id)
GROUP BY dom_id, dom_nmae
ORDER BY dom_name
You don't actually need the HAVING (though it wouldn't do any harm either)
since only domains which match a user will come out of the join anyways.
You can also write it using a subquery instead of a join
SELECT *
FROM (
SELECT dom_id, dom_name,
(SELECT count(*) FROM users WHERE user_dom_id = dom_id) as nusers
FROM domains
) as subq
WHERE nusers > 0
ORDER BY dom_name
But that will perform worse in many cases.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Alvaro Herrera wrote:
> Madison Kelly wrote:
>
>> Thanks for your reply!
>>
>> Unfortunately, in both cases I get the error:
>>
>> nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM domains
>> JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id) > 0 ORDER BY
>> dom_name;
>> ERROR: syntax error at or near "COUNT" at character 25
>> LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count FROM ...
>
> Try to avoid missing the comma before the COUNT (and do not cheat when
> cut'n pasting ...)
>
> Also it seems you will need a GROUP BY clause:
> GROUP BY dom_id, dom_name
> (placed just before the HAVING clause).
Bingo!
Now to answer the performance questions (using my actual queries,
unedited so they are a little longer):
-=-=-=-=-=-
nmc=> EXPLAIN ANALYZE SELECT dom_id, dom_name, dom_note,
COUNT(usr_dom_id) AS usr_count FROM domains JOIN users ON
(usr_dom_id=dom_id) GROUP BY dom_id, dom_name, dom_note HAVING COUNT
(usr_dom_id) > 0 ORDER BY dom_name;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Sort (cost=10.70..10.78 rows=31 width=72) (actual time=2.107..2.133
rows=17 loops=1)
Sort Key: domains.dom_name
-> HashAggregate (cost=9.39..9.93 rows=31 width=72) (actual
time=1.899..1.956 rows=17 loops=1)
Filter: (count(usr_dom_id) > 0)
-> Hash Join (cost=7.20..9.00 rows=31 width=72) (actual
time=0.942..1.411 rows=96 loops=1)
Hash Cond: ("outer".dom_id = "inner".usr_dom_id)
-> Seq Scan on domains (cost=0.00..1.31 rows=31
width=68) (actual time=0.227..0.321 rows=31 loops=1)
-> Hash (cost=6.96..6.96 rows=96 width=4) (actual
time=0.673..0.673 rows=96 loops=1)
-> Seq Scan on users (cost=0.00..6.96 rows=96
width=4) (actual time=0.010..0.371 rows=96 loops=1)
Total runtime: 2.454 ms
(10 rows)
-=-=-=-=-=-
Versus:
-=-=-=-=-=-
nmc=> EXPLAIN ANALYZE SELECT d.dom_id, d.dom_name, d.dom_note, (SELECT
COUNT(*) FROM users u WHERE u.usr_dom_id=d.dom_id) AS usr_count FROM
domains d WHERE (SELECT COUNT(*) FROM users u WHERE
u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Sort (cost=297.37..297.39 rows=10 width=68) (actual
time=10.171..10.196 rows=17 loops=1)
Sort Key: dom_name
-> Seq Scan on domains d (cost=0.00..297.20 rows=10 width=68)
(actual time=0.508..10.013 rows=17 loops=1)
Filter: ((subplan) > 0)
SubPlan
-> Aggregate (cost=7.21..7.21 rows=1 width=0) (actual
time=0.203..0.204 rows=1 loops=31)
-> Seq Scan on users u (cost=0.00..7.20 rows=1
width=0) (actual time=0.127..0.189 rows=3 loops=31)
Filter: (usr_dom_id = $0)
-> Aggregate (cost=7.21..7.21 rows=1 width=0) (actual
time=0.184..0.186 rows=1 loops=17)
-> Seq Scan on users u (cost=0.00..7.20 rows=1
width=0) (actual time=0.058..0.164 rows=6 loops=17)
Filter: (usr_dom_id = $0)
Total runtime: 10.593 ms
(12 rows)
-=-=-=-=-=-
So using the JOIN you all helped me with, the query returns in 2.454
ms compared to my early query of 10.593 ms!
I have not yet looked into any indexing either. I am waiting until
the program is done and then will go back and review queries to look for
bottlenecks.
Thanks to all of you!!
Madi
On Sep 26, 2007, at 7:41 , Madison Kelly wrote:
> Unfortunately, in both cases I get the error:
Um, the two cases could not be giving the same error as they don't
both contain the syntax that the error is complaining about: the
first case uses count in a subquery so it couldn't throw this exact
error.
> nmc=> SELECT dom_id, dom_name, COUNT(usr_dom_id) AS usr_count FROM
> domains JOIN users ON (usr_dom_id=dom_id) HAVING COUNT (usr_dom_id)
> > 0 ORDER BY dom_name;
> ERROR: syntax error at or near "COUNT" at character 25
> LINE 1: SELECT dom_id, dom_name COUNT(usr_dom_id) AS usr_count
> FROM ...
The error message doesn't match the query you've provided. Note that
in the line marked LINE 1, there's no comma after dom_name, which I
assume is what the server is complaining about. However, the query
you show *does* have this comma. Something isn't right. Is this an
exact copy and paste from psql?
> I've been struggling with some deadlines, so for now I'm using just:
>
> SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*)
> FROM users u WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC;
>
> Which gives me just the domains with at least one user under
> them, but not the count. This is not ideal, and I will have to come
> back to it next week. In the meantime, any idea what the GROUP BY
> error is?
Ah. You haven't actually shown us a GROUP BY error. A GROUP BY clause
is needed when you've got columns that aren't included in the
aggregate (COUNT in this case), e.g.,
select dom_id,
dom_name,
usr_count
from domains
natural join (select usr_dom_id as dom_id,
count(usr_dom_id) as usr_count
from users
group by dom_id) u
where usr_count > 0
order by dom_name;
select dom_id, dom_name, count(usr_dom_id) as usr_count
from domains
join users on (usr_dom_id = dom_id)
group by dom_id, dom_name
having count(usr_dom_id) > 0
order by dom_name;
Michael Glaesemann
grzm seespotcode net
Gregory Stark wrote: > "Madison Kelly" <linux@alteeve.com> writes: > >> SELECT d.dom_id, d.dom_name FROM domains d WHERE (SELECT COUNT(*) FROM users u >> WHERE u.usr_dom_id=d.dom_id) > 0 ORDER BY d.dom_name ASC; >> >> Which gives me just the domains with at least one user under them, but not >> the count. This is not ideal, and I will have to come back to it next week. In >> the meantime, any idea what the GROUP BY error is? If not, I'll read through >> the docs on 'GROUP'ing once I get this deadline out of the way. > > I think you just want simply: > > SELECT dom_id, dom_name, count(*) > FROM users > JOIN domains ON (usr_dom_id=dom_id) > GROUP BY dom_id, dom_nmae > ORDER BY dom_name > > You don't actually need the HAVING (though it wouldn't do any harm either) > since only domains which match a user will come out of the join anyways. > > You can also write it using a subquery instead of a join > > SELECT * > FROM ( > SELECT dom_id, dom_name, > (SELECT count(*) FROM users WHERE user_dom_id = dom_id) as nusers > FROM domains > ) as subq > WHERE nusers > 0 > ORDER BY dom_name > > But that will perform worse in many cases. > You are right, the 'HAVING' clause does seem to be redundant. I removed it and ran several 'EXPLAIN ANALYZE's on it with and without the 'HAVING' clause and found no perceivable difference. I removed the 'HAVING' clause anyway, since I like to keep queries as minimal as possible. Thank you! Madi