Обсуждение: [GENERAL] why isn't this subquery wrong?

Поиск
Список
Период
Сортировка

[GENERAL] why isn't this subquery wrong?

От
jonathan vanasco
Дата:

I ran into an issue while changing a database schema around.  Some queries still worked, even though I didn't expect them to.

Can anyone explain to me why the following is valid (running 9.6) ?

schema
CREATE TEMPORARY TABLE example_a__data (
foo_id INT,
bar_id INT
);
CREATE TEMPORARY TABLE example_a__rollup_source (
id int primary key,
name varchar(64),
foo_id INT,
check_bool BOOLEAN
);
CREATE TEMPORARY TABLE example_a__rollup AS
SELECT id, name, foo_id
FROM example_a__rollup_source
WHERE check_bool IS TRUE
;

query:
SELECT foo_id
FROM example_a__data
WHERE foo_id IN (SELECT bar_id FROM example_a__rollup)
;

a raw select of `SELECT bar_id FROM example_a__rollup;` will cause an error because bar_id doesn't exist

postgres doesn't raise an error because example_a__data does have a bar_id -- but example_a__rollup doesn't and there's no explicit correlation in the query.

can someone explain why this happens?  i'm guessing there is a good reason -- but I'm unfamiliar with the type of implicit join/queries this behavior is enabling.


Re: [GENERAL] why isn't this subquery wrong?

От
Steve Crawford
Дата:
On Thu, Apr 20, 2017 at 3:17 PM, jonathan vanasco <postgres@2xlp.com> wrote:

I ran into an issue while changing a database schema around.  Some queries still worked, even though I didn't expect them to.

Can anyone explain to me why the following is valid (running 9.6) ?

schema
CREATE TEMPORARY TABLE example_a__data (
foo_id INT,
bar_id INT
);
CREATE TEMPORARY TABLE example_a__rollup_source (
id int primary key,
name varchar(64),
foo_id INT,
check_bool BOOLEAN
);
CREATE TEMPORARY TABLE example_a__rollup AS
SELECT id, name, foo_id
FROM example_a__rollup_source
WHERE check_bool IS TRUE
;

query:
SELECT foo_id
FROM example_a__data
WHERE foo_id IN (SELECT bar_id FROM example_a__rollup)
;

a raw select of `SELECT bar_id FROM example_a__rollup;` will cause an error because bar_id doesn't exist

postgres doesn't raise an error because example_a__data does have a bar_id -- but example_a__rollup doesn't and there's no explicit correlation in the query.

can someone explain why this happens?  i'm guessing there is a good reason -- but I'm unfamiliar with the type of implicit join/queries this behavior is enabling.



There is no requirement in this query that bar_id be in the example_a__rollup table and since it is only in one table it is unambiguous so the server doesn't complain.

It may be explanatory to add a couple records to your example_a_rollup table:

insert into example_a__data values (3,4),(5,6);

Then run a simple select showing what the where clause would see:

SELECT,
   foo_id, 
   (SELECT bar_id FROM example_a__rollup) 
FROM
    example_a__data 
;

 foo_id | bar_id                                                                                           --------+-------- 
     3 | 
     5 |      

bar_id is null because there are no rows in example_a_rollup.

Now add a single record to example_a_rollup:

insert into example_a__rollup (id) values (10);

Rerun the query and you will get:

 foo_id | bar_id  
--------+--------
     3 |      4
     5 |      6


If you add another record to example_a__rollup and run it and you will get:

ERROR:  more than one row returned by a subquery used as an expression


Although the subquery won't work as an expression it would still work in a the where clause but I doubt it will return what you desire. Unfortunately there are lots of ways to write syntactically correct but logically flawed statements.


Cheers,
Steve

Re: [GENERAL] why isn't this subquery wrong?

От
"David G. Johnston"
Дата:
On Thu, Apr 20, 2017 at 3:17 PM, jonathan vanasco <postgres@2xlp.com> wrote:
postgres doesn't raise an error because example_a__data does have a bar_id -- but example_a__rollup doesn't and there's no explicit correlation in the query.


​Subqueries can see all columns of the parent.  When the subquery actually uses one of them it is called a "correlated subquery".

See another recent question on this topic for more details:


Unfortunately at present everyone here seems content answering this on the list semi-frequently instead of updating the docs and/or FAQ section of our wiki.  The topic is covered (I think...) but not easily found (I failed to just now...) - and harder when you don't know what to call it in the first place.

David J.

Re: [GENERAL] why isn't this subquery wrong?

От
Tom Lane
Дата:
jonathan vanasco <postgres@2xlp.com> writes:
> Can anyone explain to me why the following is valid (running 9.6) ?

> SELECT foo_id
>     FROM example_a__data
>     WHERE foo_id IN (SELECT bar_id FROM example_a__rollup)
>     ;

Per the SQL standard, bar_id is interpreted as an "outer reference"
to example_a__data.bar_id.  This trips people up all the time, but
(a) it's required by spec and (b) there are cases where it's really
hard to do what you want without an outer reference.

Cautious SQL programmers qualify all references inside sub-selects
to avoid getting caught by this accidentally.

            regards, tom lane


Re: [GENERAL] why isn't this subquery wrong?

От
jonathan vanasco
Дата:
thanks all!

On Apr 20, 2017, at 6:42 PM, David G. Johnston wrote:

​Subqueries can see all columns of the parent.  When the subquery actually uses one of them it is called a "correlated subquery".

i thought a correlated subquery had to note that table/alias, not a raw column.   I guess i've just been adhering to good form.


On Apr 20, 2017, at 6:43 PM, Tom Lane wrote:

Cautious SQL programmers qualify all references inside sub-selects to avoid getting caught by this accidentally.

is there a syntax to qualify a reference to lock a subquery to the current scope (disable looking at the parents)?  that's how I got caught on this by accident.

Re: [GENERAL] why isn't this subquery wrong?

От
Steve Crawford
Дата:


On Thu, Apr 20, 2017 at 3:56 PM, jonathan vanasco <postgres@2xlp.com> wrote:
thanks all!

On Apr 20, 2017, at 6:42 PM, David G. Johnston wrote:

​Subqueries can see all columns of the parent.  When the subquery actually uses one of them it is called a "correlated subquery".

i thought a correlated subquery had to note that table/alias, not a raw column.   I guess i've just been adhering to good form.


On Apr 20, 2017, at 6:43 PM, Tom Lane wrote:

Cautious SQL programmers qualify all references inside sub-selects to avoid getting caught by this accidentally.

is there a syntax to qualify a reference to lock a subquery to the current scope (disable looking at the parents)?  that's how I got caught on this by accident.

Like Tom said, "qualify all references":

...(SELECT example_a__rollup.bar_id FROM example_a__rollup)...

Or shortened with alises:

...(SELECT x.bar_id FROM example_a__rollup x)...

Cheers,
Steve

Re: [GENERAL] why isn't this subquery wrong?

От
"David G. Johnston"
Дата:
On Thu, Apr 20, 2017 at 3:17 PM, jonathan vanasco <postgres@2xlp.com> wrote:

SELECT foo_id
FROM example_a__data
WHERE foo_id IN (SELECT bar_id FROM example_a__rollup)
;


​Or write it the idiomatic way (i.e., as a proper semi-join):

​SELECT foo_id
FROM example_a__data
WHERE EXISTS (SELECT 1 
                                 FROM example_a__rollup
                              WHERE example_a__rollup.bar_id = example_a__data.foo_id)

IME its harder to forget the table prefix when using this form.

David J.

Re: [GENERAL] why isn't this subquery wrong?

От
"David G. Johnston"
Дата:
On Thu, Apr 20, 2017 at 3:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
jonathan vanasco <postgres@2xlp.com> writes:
> Can anyone explain to me why the following is valid (running 9.6) ?

> SELECT foo_id
>       FROM example_a__data
>       WHERE foo_id IN (SELECT bar_id FROM example_a__rollup)
>       ;

Per the SQL standard, bar_id is interpreted as an "outer reference"
to example_a__data.bar_id.  This trips people up all the time, but
(a) it's required by spec and (b) there are cases where it's really
hard to do what you want without an outer reference.

Cautious SQL programmers qualify all references inside sub-selects
to avoid getting caught by this accidentally.

​I added an FAQ entry to the wiki​ for this question:


David J.