Обсуждение: bug in JOIN or COUNT or ... ?
Sorry, worst Subject I've ever come up with, but this is one of those "I
haven't got a clue how to describe" emails ...
Simple query:
SELECT distinct s.gid, s.created, i.title FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active),
personal_datapd, relationship_wanted rw WHERE s.active AND s.status != 0 AND s.gid = 17111 AND (s.gid = pd.gid
ANDpd.gender = 0) AND (s.gid = rw.gid AND rw.gender = 0 );
Produces:
gid | created | title
-------+------------------------+--------17111 | 2000-10-19 15:20:46-04 | image117111 | 2000-10-19 15:20:46-04 |
image217111| 2000-10-19 15:20:46-04 | image3
(3 rows)
Great, what I expect ...
But:
SELECT distinct s.gid, s.created, count(i.title) AS images FROM status s LEFT JOIN images i ON (s.gid = i.gid AND
i.active),personal_data pd, relationship_wanted rw WHERE s.active AND s.status != 0 AND s.gid = 17111 AND (s.gid
=pd.gid AND pd.gender = 0) AND (s.gid = rw.gid AND rw.gender = 0 )
GROUP BY s.gid, s.created;
Produces:
/tmp/psql.edit.70.62491: 7 lines, 353 characters. gid | created | images
-------+------------------------+--------17111 | 2000-10-19 15:20:46-04 | 15
(1 row)
So why is it counting 12 more images then are actually found/exist:
testdb=# select title from images where gid = 17111;title
--------image1image3image2
(3 rows)
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes:
> So why is it counting 12 more images then are actually found/exist:
Hm. Could we see the EXPLAIN output for both of those?
regards, tom lane
On Sat, 12 May 2001, Tom Lane wrote:
> The Hermit Hacker <scrappy@hub.org> writes:
> > So why is it counting 12 more images then are actually found/exist:
>
> Hm. Could we see the EXPLAIN output for both of those?
without count:
NOTICE: QUERY PLAN:
Unique (cost=8.66..8.67 rows=1 width=37) -> Sort (cost=8.66..8.66 rows=1 width=37) -> Nested Loop
(cost=0.00..8.65rows=1 width=37) -> Nested Loop (cost=0.00..6.52 rows=1 width=33) ->
NestedLoop (cost=0.00..4.26 rows=1 width=29) -> Index Scan using status_gid on status s
(cost=0.00..2.23rows=1 width=12) -> Index Scan using images_gid on images i (cost=0.00..2.02
rows=1width=17) -> Index Scan using personal_data_gid on personal_data pd (cost=0.00..2.25 rows=1
width=4) -> Index Scan using relationship_wanted_gid on relationship_wanted rw (cost=0.00..2.11 rows=1
width=4)
EXPLAIN
with count:
NOTICE: QUERY PLAN:
Unique (cost=8.68..8.69 rows=1 width=37) -> Sort (cost=8.68..8.68 rows=1 width=37) -> Aggregate
(cost=8.66..8.67rows=1 width=37) -> Group (cost=8.66..8.67 rows=1 width=37) -> Sort
(cost=8.66..8.66rows=1 width=37) -> Nested Loop (cost=0.00..8.65 rows=1 width=37)
-> Nested Loop (cost=0.00..6.52 rows=1 width=33) -> Nested Loop
(cost=0.00..4.26 rows=1 width=29) -> Index Scan using status_gid on status s
(cost=0.00..2.23 rows=1 width=12) -> Index Scan using images_gid on images i
(cost=0.00..2.02 rows=1 width=17) -> Index Scan using personal_data_gid on
personal_datapd (cost=0.00..2.25 rows=1 width=4) -> Index Scan using
relationship_wanted_gidon relationship_wanted rw (cost=0.00..2.11 rows=1 width=4)
EXPLAIN
>
> regards, tom lane
>
Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Ah, I see it: your join against relationship_wanted isn't unique.
globalmatch=# select count(*) from personal_data pd
globalmatch-# where pd.gid = 17111 AND pd.gender = 0;count
------- 1
(1 row)
globalmatch=# select count(*) from relationship_wanted rw
globalmatch-# where rw.gid = 17111 AND rw.gender = 0;count
------- 5
(1 row)
globalmatch=#
So that inflates the number of rows coming out of the join by 5.
regards, tom lane
On Sat, 12 May 2001, Tom Lane wrote: > Ah, I see it: your join against relationship_wanted isn't unique. > > globalmatch=# select count(*) from personal_data pd > globalmatch-# where pd.gid = 17111 AND pd.gender = 0; > count > ------- > 1 > (1 row) > > globalmatch=# select count(*) from relationship_wanted rw > globalmatch-# where rw.gid = 17111 AND rw.gender = 0; > count > ------- > 5 > (1 row) > > globalmatch=# > > So that inflates the number of rows coming out of the join by 5. Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I doubt doubt your analysis, I just want to understand why ...
On Sat, 12 May 2001, Tom Lane wrote: > The Hermit Hacker <scrappy@hub.org> writes: > >> So that inflates the number of rows coming out of the join by 5. > > > Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I > > doubt doubt your analysis, I just want to understand why ... > > You had DISTINCT on your query, which hid the duplicated rows from you. > But that happens *after* aggregate processing, so it doesn't hide the > dups from COUNT(). Ahhhh, okay, that makes sense ... thanks for taking the time to check it for me ... and explaining what I was missing ...
The Hermit Hacker <scrappy@hub.org> writes:
>> So that inflates the number of rows coming out of the join by 5.
> Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I
> doubt doubt your analysis, I just want to understand why ...
You had DISTINCT on your query, which hid the duplicated rows from you.
But that happens *after* aggregate processing, so it doesn't hide the
dups from COUNT().
regards, tom lane