Обсуждение: Weird (?) problem with order of conditions in SELECT
Hello all,
I have encountered a weird problem I can't seem to understand. It involves
a correlated subquery, where the rows returned seem to depend upon the order
I specify my conditions. I can't see why the order should be important
(except maybe for performance)
To demonstrate I created a toy data set and query (note, both are completely
contrived so neither really 'makes sense' beyond this context). I present
first two queries which return different results, but differ *only in the
order I specify conditions*, along with the results of the queries. (The
toy data set is at the very end of the email)
SELECT c.score FROM c
WHERE c.score >= (SELECT MAX(score) AS score FROM c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id AND c.b_id = b.b_id)
AND a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id;
score
-------
2500
(1 row)
* NOTE, this query returns only one row.
SELECT c.score FROM c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id
AND c.score >= (SELECT MAX(score) AS score FROM c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id);
score
-------
100
2500
(2 rows)
* NOTE, now TWO rows are returned, even though all that has happened is I've
changed the order of the conditions.
What's going on here ? Why should the order of conditions be important ??
Any insight would be greatly appreciated.
Thanks,
terry
CREATE TABLE people (name text not null PRIMARY KEY, age int);
insert into people values ('nugget', 33);
insert into people values ('lisa', 32);
insert into people values ('larry', 28);
CREATE TABLE a (a_id INT NOT NULL PRIMARY KEY, job TEXT NOT NULL, name TEXT
NOT NULL, FOREIGN KEY (name) REFERENCES people(name));
INSERT INTO a VALUES (1, 'programmer', 'nugget');
INSERT INTO a VALUES (2, 'programmer', 'lisa');
INSERT INTO a VALUES (2, 'secretary', 'lisa');
INSERT INTO a VALUES (3, 'secretary', 'lisa');
INSERT INTO a VALUES (4, 'student', 'larry');
INSERT INTO a VALUES (5, 'programmer', 'nugget');
CREATE TABLE b (b_id INT NOT NULL PRIMARY KEY, a_id INT NOT NULL, FOREIGN
KEY (a_id) REFERENCES a(a_id));
INSERT INTO b (a_id, b_id) VALUES (1,1);
INSERT INTO b (a_id, b_id) VALUES (2,2);
INSERT INTO b (a_id, b_id) VALUES (3,3);
INSERT INTO b (a_id, b_id) VALUES (4,4);
INSERT INTO b (a_id, b_id) VALUES (5,5);
CREATE TABLE c (c_id INT NOT NULL PRIMARY KEY, b_id INT NOT NULL, score INT
NOT NULL, FOREIGN KEY (b_id) REFERENCES b(b_id));
INSERT INTO c (b_id, c_id, score) VALUES (1,1,100);
INSERT INTO c (b_id, c_id, score) VALUES (2,2,400);
INSERT INTO c (b_id, c_id, score) VALUES (3,3,900);
INSERT INTO c (b_id, c_id, score) VALUES (4,4,1600);
INSERT INTO c (b_id, c_id,score) VALUES (5,5,2500);
"Mark, Terry" <tmark@amgen.com> writes:
> I have encountered a weird problem I can't seem to understand. It involves
> a correlated subquery, where the rows returned seem to depend upon the order
> I specify my conditions. I can't see why the order should be important
> (except maybe for performance)
> SELECT c.score FROM c
> WHERE c.score >= (SELECT MAX(score) AS score FROM c
> WHERE a.name='nugget'
> AND a.job='programmer'
> AND a.a_id=b.a_id AND c.b_id = b.b_id)
> AND a.name='nugget'
> AND a.job='programmer'
> AND a.a_id=b.a_id
> AND c.b_id = b.b_id;
> SELECT c.score FROM c
> WHERE a.name='nugget'
> AND a.job='programmer'
> AND a.a_id=b.a_id
> AND c.b_id = b.b_id
> AND c.score >= (SELECT MAX(score) AS score FROM c
> WHERE a.name='nugget'
> AND a.job='programmer'
> AND a.a_id=b.a_id
> AND c.b_id = b.b_id);
This is a little less mysterious if you run it under 7.1, because 7.1
emits some warning notices:
NOTICE: Adding missing FROM-clause entry in subquery for table "a"
NOTICE: Adding missing FROM-clause entry in subquery for table "b"
NOTICE: Adding missing FROM-clause entry for table "a"
NOTICE: Adding missing FROM-clause entry for table "b"
score
-------
2500
(1 row)
NOTICE: Adding missing FROM-clause entry for table "a"
NOTICE: Adding missing FROM-clause entry for table "b"
score
-------
100
2500
(2 rows)
From this we can infer that Postgres is actually interpreting the first
query as
SELECT c.score FROM a,b,c
WHERE c.score >= (SELECT MAX(score) AS score FROM a,b,c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id AND c.b_id = b.b_id)
AND a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id;
whereas the second one is being interpreted as
SELECT c.score FROM a,b,c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id
AND c.score >= (SELECT MAX(score) AS score FROM c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id);
That is, in the second case the sub-select's references to A and B are
being taken as outer references to the current A and B rows of the outer
query, whereas in the first case the sub-select is interpreted as a
completely independent query.
I am not sure which interpretation you were actually intending.
This example shows one reason why the "implicit FROM item" feature of
Postgres is confusing and has come to be deprecated: it's not always
clear which FROM list an implicit item should be added to. We've
started to emit a warning about use of this feature in 7.1, and perhaps
someday it will be removed entirely.
regards, tom lane
Many thanks to Tom for his reply. My intention was to have the subquery
treated as a completely independent query.
Should I understand, then, that by explicitly naming all the involved table
portions in the subequery, that the subquery is guaranteed to be treated
independently ?
terry
-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, March 20, 2001 11:42 AM
To: Mark, Terry
Cc: pgsql-novice@postgresql.org
Subject: Re: [NOVICE] Weird (?) problem with order of conditions in
SELECT
"Mark, Terry" <tmark@amgen.com> writes:
> I have encountered a weird problem I can't seem to understand. It
involves
> a correlated subquery, where the rows returned seem to depend upon the
order
> I specify my conditions. I can't see why the order should be important
> (except maybe for performance)
> SELECT c.score FROM c
> WHERE c.score >= (SELECT MAX(score) AS score FROM c
> WHERE a.name='nugget'
> AND a.job='programmer'
> AND a.a_id=b.a_id AND c.b_id = b.b_id)
> AND a.name='nugget'
> AND a.job='programmer'
> AND a.a_id=b.a_id
> AND c.b_id = b.b_id;
> SELECT c.score FROM c
> WHERE a.name='nugget'
> AND a.job='programmer'
> AND a.a_id=b.a_id
> AND c.b_id = b.b_id
> AND c.score >= (SELECT MAX(score) AS score FROM c
> WHERE a.name='nugget'
> AND a.job='programmer'
> AND a.a_id=b.a_id
> AND c.b_id = b.b_id);
This is a little less mysterious if you run it under 7.1, because 7.1
emits some warning notices:
NOTICE: Adding missing FROM-clause entry in subquery for table "a"
NOTICE: Adding missing FROM-clause entry in subquery for table "b"
NOTICE: Adding missing FROM-clause entry for table "a"
NOTICE: Adding missing FROM-clause entry for table "b"
score
-------
2500
(1 row)
NOTICE: Adding missing FROM-clause entry for table "a"
NOTICE: Adding missing FROM-clause entry for table "b"
score
-------
100
2500
(2 rows)
From this we can infer that Postgres is actually interpreting the first
query as
SELECT c.score FROM a,b,c
WHERE c.score >= (SELECT MAX(score) AS score FROM a,b,c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id AND c.b_id = b.b_id)
AND a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id;
whereas the second one is being interpreted as
SELECT c.score FROM a,b,c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id
AND c.score >= (SELECT MAX(score) AS score FROM c
WHERE a.name='nugget'
AND a.job='programmer'
AND a.a_id=b.a_id
AND c.b_id = b.b_id);
That is, in the second case the sub-select's references to A and B are
being taken as outer references to the current A and B rows of the outer
query, whereas in the first case the sub-select is interpreted as a
completely independent query.
I am not sure which interpretation you were actually intending.
This example shows one reason why the "implicit FROM item" feature of
Postgres is confusing and has come to be deprecated: it's not always
clear which FROM list an implicit item should be added to. We've
started to emit a warning about use of this feature in 7.1, and perhaps
someday it will be removed entirely.
regards, tom lane