Обсуждение: Tricky join question
Hi all
I have a join problem that seems to be too difficult for me to solve:
I have:
table person
id integer,
name varchar(32)
data:
1,"Jack"
2,"Jill"
3,"Bob"
table course
id integer,
name varchar(32)
data:
1,"SQL Beginner"
2,"SQL Advanced"
table person_course
person_id number,
course_id number
data:
(currently empty)
Now, I would like to know for every person the courses they have taken.
In mysql, the following statement:
SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;
will get me the following result:
+---+-----------------------------------------+----------+
| id| name | person_id|
+---+-----------------------------------------+----------+
| 1 | SQL Beginner | |
| 2 | SQL Advanced | |
+---+-----------------------------------------+----------+
Can I get Postgres to give me the same result somehow? The above
statement will return nothing at all under postgres.
Bye
Tim
am Fri, dem 22.12.2006, um 12:12:06 +0100 mailte Tim Tassonis folgendes: > Hi all > > I have a join problem that seems to be too difficult for me to solve: > > I have: > > table person > id integer, > name varchar(32) > > data: > > 1,"Jack" > 2,"Jill" > 3,"Bob" > > > table course > id integer, > name varchar(32) > > data: > > 1,"SQL Beginner" > 2,"SQL Advanced" > > > > table person_course > person_id number, > course_id number > > data: First, you should use referential integrity: test=# create table person(id int primary key, name text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "person_pkey" for table "person" CREATE TABLE test=# create table course(id int primary key, name text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "course_pkey" for table "course" CREATE TABLE test=# create table person_course(person_id int references person, course_id int references course ); CREATE TABLE > > (currently empty) Okay, i insert some data: test=# insert into person_course values (1,1); INSERT 0 1 test=# insert into person_course values (3,1); INSERT 0 1 test=# insert into person_course values (3,2); INSERT 0 1 > > > Now, I would like to know for every person the courses they have taken. Similar to your result: test=# select b.id, b.name, c.id from person_course a left join course b on a.course_id=b.id left join person c on a.person_id=c.id; id | name | id ----+--------------+---- 1 | SQL Beginner | 1 1 | SQL Beginner | 3 2 | SQL Advanced | 3 (3 rows) In my opinion better: test=# select c.id, c.name, b.name from person_course a left join course b on a.course_id=b.id left join person c on a.person_id=c.id; id | name | name ----+------+-------------- 1 | Jack | SQL Beginner 3 | Bob | SQL Beginner 3 | Bob | SQL Advanced (3 rows) Please, read more about referential integrity. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47215, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
Hi Andreas
> First, you should use referential integrity:
I do, that is not the point. It was a simplified data model. Of course I
have primary keys and stuff, but they don't affect join behaviour at all.
>
> test=# create table person(id int primary key, name text);
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "person_pkey" for table "person"
> CREATE TABLE
> test=# create table course(id int primary key, name text);
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "course_pkey" for table "course"
> CREATE TABLE
> test=# create table person_course(person_id int references person, course_id int references course );
> CREATE TABLE
>
Well, you shouldn't, thats not my problem.
>
>
>
>>
>> Now, I would like to know for every person the courses they have taken.
>
> Similar to your result:
>
> test=# select b.id, b.name, c.id from person_course a left join course b on a.course_id=b.id left join person c on
a.person_id=c.id;
> id | name | id
> ----+--------------+----
> 1 | SQL Beginner | 1
> 1 | SQL Beginner | 3
> 2 | SQL Advanced | 3
> (3 rows)
This is absolutely not what I want. I want a row for every person and
every course, regardless whether the person has taken the course or not.
If the person has not taken the course, I want a null value in the
person id column:
SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;
+---+-----------------------------------------+----------+
| id| name | person_id|
+---+-----------------------------------------+----------+
| 1 | SQL Beginner | |
| 2 | SQL Advanced | |
+---+-----------------------------------------+----------+
Note here that I restrict my select to the person with the ID 2. Since
this person has not taken any course, the person_id is null. If I leave
the restriction on the person, I get person times courses rows, the
person_id only filled when a person has actually taken a course.
With the rows you added person_course and without restrictin to a
specific person, the result of your query should be:
+---+-----------------------------------------+------------+
| id| name | person_id |
+---+-----------------------------------------+------------+
| 1 | SQL Beginner | 1 |
| 1 | SQL Beginner | |
| 1 | SQL Beginner | 3 |
| 2 | SQL Advanced | |
| 2 | SQL Advanced | |
| 2 | SQL Advanced | 3 |
+---+-----------------------------------------+------------+
In mysql, you get this with the following clause:
SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
order by 1;
> Please, read more about referential integrity.
Thanks, but I already have read a lot about it 14 years ago.
Bye
Tim
Hi,
Without restriction you're getting:
On Fri, Dec 22, 2006 at 02:55:56PM +0100, Tim Tassonis wrote:
> +---+-----------------------------------------+------------+
> | id| name | person_id |
> +---+-----------------------------------------+------------+
> | 1 | SQL Beginner | 1 |
> | 1 | SQL Beginner | |
> | 1 | SQL Beginner | 3 |
> | 2 | SQL Advanced | |
> | 2 | SQL Advanced | |
> | 2 | SQL Advanced | 3 |
> +---+-----------------------------------------+------------+
There are no rows in the table with person_id=2, so PostgreSQL is
returning the correct result (no rows). Seems like a bug in MySQL.
> In mysql, you get this with the following clause:
>
> SELECT c.id, c.name, pc.person_id
> FROM person as p
> left outer join person_course as pc on p.id = pc.person_id
> right outer join course as c on pc.course_id = c.id
> order by 1;
I think what you want is to apply to restriction on person earlier,
maybe:
SELECT c.id, c.name, pc.person_id
FROM person as p
left outer join person_course as pc on (p.id = pc.person_id and p.id = 2)
right outer join course as c on pc.course_id = c.id
order by 1;
Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Вложения
Tim Tassonis wrote: > Hi Andreas > >> First, you should use referential integrity: > > > I do, that is not the point. It was a simplified data model. Of course I > have primary keys and stuff, but they don't affect join behaviour at all. > >> >> test=# create table person(id int primary key, name text); >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "person_pkey" for table "person" >> CREATE TABLE >> test=# create table course(id int primary key, name text); >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "course_pkey" for table "course" >> CREATE TABLE >> test=# create table person_course(person_id int references person, >> course_id int references course ); >> CREATE TABLE >> > > Well, you shouldn't, thats not my problem. > >> >> >> >>> >>> Now, I would like to know for every person the courses they have taken. >> >> >> Similar to your result: >> >> test=# select b.id, b.name, c.id from person_course a left join course >> b on a.course_id=b.id left join person c on a.person_id=c.id; >> id | name | id >> ----+--------------+---- >> 1 | SQL Beginner | 1 >> 1 | SQL Beginner | 3 >> 2 | SQL Advanced | 3 >> (3 rows) > > > This is absolutely not what I want. I want a row for every person and > every course, regardless whether the person has taken the course or not. > If the person has not taken the course, I want a null value in the > person id column: > > SELECT c.id, c.name, pc.person_id > FROM person as p > left outer join person_course as pc on p.id = pc.person_id > right outer join course as c on pc.course_id = c.id > where p.id = 2 order by 1; > > +---+-----------------------------------------+----------+ > | id| name | person_id| > +---+-----------------------------------------+----------+ > | 1 | SQL Beginner | | > | 2 | SQL Advanced | | > +---+-----------------------------------------+----------+ > > Note here that I restrict my select to the person with the ID 2. Since > this person has not taken any course, the person_id is null. If I leave > the restriction on the person, I get person times courses rows, the > person_id only filled when a person has actually taken a course. > > With the rows you added person_course and without restrictin to a > specific person, the result of your query should be: > > > +---+-----------------------------------------+------------+ > | id| name | person_id | > +---+-----------------------------------------+------------+ > | 1 | SQL Beginner | 1 | > | 1 | SQL Beginner | | > | 1 | SQL Beginner | 3 | > | 2 | SQL Advanced | | > | 2 | SQL Advanced | | > | 2 | SQL Advanced | 3 | > +---+-----------------------------------------+------------+ > > In mysql, you get this with the following clause: > > SELECT c.id, c.name, pc.person_id > FROM person as p > left outer join person_course as pc on p.id = pc.person_id > right outer join course as c on pc.course_id = c.id > order by 1; > >> Please, read more about referential integrity. > > > Thanks, but I already have read a lot about it 14 years ago. > > Bye > Tim > In all your long years of experience, perhaps you haven't come across this? http://catb.org/~esr/faqs/smart-questions.html If you're going to ask a question here the least you could do is meet us half-way. b
Tim Tassonis <timtas@cubic.ch> writes:
> In mysql, the following statement:
> SELECT c.id, c.name, pc.person_id
> FROM person as p
> left outer join person_course as pc on p.id = pc.person_id
> right outer join course as c on pc.course_id = c.id
> where p.id = 2 order by 1;
> will get me the following result:
> +---+-----------------------------------------+----------+
> | id| name | person_id|
> +---+-----------------------------------------+----------+
> | 1 | SQL Beginner | |
> | 2 | SQL Advanced | |
> +---+-----------------------------------------+----------+
Really? It would be unbelievably broken if so, but a quick experiment
with mysql 5.0.27 says they return an empty set same as us.
You *would* get that answer without the WHERE clause, but neither of
those rows meet the WHERE. Look at the complete join output:
regression=# SELECT *
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
right outer join course as c on pc.course_id = c.id
;
id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+--------------
| | | | 1 | SQL Beginner
| | | | 2 | SQL Advanced
(2 rows)
The person-left-join-person_course join produces rows, but none of them
can match course during the right join, so they don't get through.
I think what you want might be a full join for the second step:
regression=# SELECT *
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
full outer join course as c on pc.course_id = c.id
;
id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+--------------
1 | Jack | | | |
2 | Jill | | | |
3 | Bob | | | |
| | | | 1 | SQL Beginner
| | | | 2 | SQL Advanced
(5 rows)
regression=# SELECT *
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
full outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;
id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+------
2 | Jill | | | |
(1 row)
regression=# insert into person_course values(2,2);
INSERT 0 1
regression=# SELECT *
FROM person as p
left outer join person_course as pc on p.id = pc.person_id
full outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;
id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+--------------
2 | Jill | 2 | 2 | 2 | SQL Advanced
(1 row)
BTW, I tried to duplicate this in mysql and was surprised to find that
5.0.27 doesn't seem to support full join at all :-(
regards, tom lane
Tim Tassonis <timtas@cubic.ch> schrieb: > This is absolutely not what I want. I want a row for every person and every > course, regardless whether the person has taken the course or not. If the > person has not taken the course, I want a null value in the person id > column: test=# select c.id, c.name, pc.person_id from course c cross join person p left outer join person_course pc on (p.id,c.id)=(pc.person_id,pc.course_id); id | name | person_id ----+--------------+----------- 1 | SQL Beginner | 1 1 | SQL Beginner | 1 | SQL Beginner | 3 2 | SQL Advanced | 2 | SQL Advanced | 2 | SQL Advanced | 3 (6 rows) But i think, this is a little bit stupid, because row 4 and 5 are the same. Perhaps this would be better: test=# select c.id, c.name, p.id as person, pc.person_id from course c cross join person p left outer join person_course pc on (p.id,c.id)=(pc.person_id,pc.course_id); id | name | person | person_id ----+--------------+--------+----------- 1 | SQL Beginner | 1 | 1 1 | SQL Beginner | 2 | 1 | SQL Beginner | 3 | 3 2 | SQL Advanced | 1 | 2 | SQL Advanced | 2 | 2 | SQL Advanced | 3 | 3 (6 rows) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Thanks to you all for your replies. I was able to solve my problem after
some more reading in the manual:
select c.id, c.name, pc.person_id
from person as p
cross join course as c
left outer join person_course as pc
on (p.id = pc.person_id and c.id = pc.course_id)
where p.id = 2;
A few remarks to the answers:
I seem to have been misunderstood in a way that people understood that I
implied that mysql is right and postgres is wrong. This was in no way my
opinion. I just reported what results I got under mysql. I very much
prefer Postgres over mysql and never questioned postgres' correctness.
To brian:
>>> Please, read more about referential integrity.
>>
>>
>> Thanks, but I already have read a lot about it 14 years ago.
>>
>> Bye
>> Tim
>>
>
> In all your long years of experience, perhaps you haven't come across this?
>
> http://catb.org/~esr/faqs/smart-questions.html
>
> If you're going to ask a question here the least you could do is meet us half-way.
>
I think I asked quite politely, did not blame anybody else and just
described my problem. I agree, I felt a bit insulted when being told to
read about referential integrity, because that clearly had nothing to do
with my question and I do know about it. I'm coming from Oracle and not
from mysql.
And thanks Andreas, I just saw your latest post which has the same
solution as I got in it.
Bye
Tim