Обсуждение: BUG #9007: List comparison
The following bug has been logged on the website:
Bug reference: 9007
Logged by: Stefan Kirchev
Email address: stefan.kirchev@gmail.com
PostgreSQL version: 9.1.0
Operating system: Linux Ubuntu Server
Description:
Using two tables to extract differences fails to show any result.
Table `table1` has two column of type integer, table `table2` has the same
structure. Using the following query fails to show the expected result:
select * from table1 where (c1, c2) not in (select c1, c2 from table2);
Adding a dummy condition in the internal query helps in getting results:
select * from table1 where (c1, c2) not in (select c1, c2 from table2 where
c1 <> 0);
Consider c1 does not nave values equal to 0:
pnp=# select c1, c2 from table2 where c1 = 0;
c1 | c2
----+----
(0 rows)
Tested on versions 8.4 and 9.1. Here are the query plans on v8.4:
pnp=# explain select * from table1 where (c1, c2) not in (select c1, c2 from
table2 where c1 <> 0);
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on table1 (cost=290.31..681.84 rows=9951 width=8)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on table2 (cost=0.00..253.58 rows=14692 width=8)
Filter: (c1 <> 0)
(5 rows)
pnp=# explain select * from table1 where (c1, c2) not in (select c1, c2 from
table2);
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on table1 (cost=253.57..645.11 rows=9951 width=8)
Filter: (NOT (hashed subplan))
SubPlan
-> Seq Scan on table2 (cost=0.00..216.66 rows=14766 width=8)
(4 rows)
pnp=#
In the query plan the rows shown are half of the real table records:
pnp=# select count(*) from table1;
count
-------
20880
(1 row)
pnp=# select count(*) from table2;
count
-------
15557
(1 row)
pnp=#
Is that a buffers size issue? The server is pretty powerful and handles a
much bigger tables of about 100GB without loosing breath.
On Tue, Jan 28, 2014 at 10:56 AM, <stefan.kirchev@gmail.com> wrote:
> select * from table1 where (c1, c2) not in (select c1, c2 from table2);
Note that if there are any (NULL, NULL) values in table2 then NOT IN
*always* returns zero results. This counter-intuitive behavior is
mandated by the SQL specification and also makes the NOT IN clause
hard to optimize. It's not a bug, however.
It's almost always faster and more foolproof to convert them to NOT
EXISTS clauses like this:
select * from table1 where not exists
(select * from table2 where (table1.c1, table1.c2) = (table2.c1,
table2.c2));
db=# create table table1 (c1, c2) as values (1, 1);
db=# create table table2 (c1, c2) as values (2, 2);
db=# select * from table1 where (c1, c2) not in (select c1, c2 from table2);
c1 | c2
----+----
1 | 1
(1 row)
db=# insert into table2 values(null, null);
db=# select * from table1 where (c1, c2) not in (select c1, c2 from table2);
c1 | c2
----+----
(0 rows)
db=# select * from table1 where not exists (select * from table2 where
(table1.c1, table1.c2) = (table2.c1, table2.c2));
c1 | c2
----+----
1 | 1
(1 row)
> Is that a buffers size issue?
No, tuning parameters should never affect the correctness of returned
results, only time.
Regards,
Marti