> Thanks for the report, but I think your subject line is a little
> exaggerated.
Sorry for that :-)
> I'd not actually thought about expanding this to IN and NOT IN, but
> likely it would be possible, providing NULLs could be handled
> correctly too. Was this the optimisation you think is missing?
No - let's consider a real-world example where the problem becomes more
evident. Assume we have the following two tables:
CREATE TABLE book (id SERIAL NOT NULL, name VARCHAR, authorId SERIAL,
CONSTRAINT book_pkey PRIMARY KEY (id));
CREATE TABLE author (id SERIAL NOT NULL, surname VARCHAR, CONSTRAINT
author_pkey PRIMARY KEY (id));
All relevant columns are indexed. The book table contains 1.3 M rows.
Now let's say we want to find books by either name or by author name. A
simple problem, actually. The following query is used (which could of
course also be rewritten to use a JOIN instead, but that's not the point
here). We find out that it has an unexpected and unacceptably long
execution time:
EXPLAIN SELECT * FROM book WHERE name = 'Harry Potter' OR authorId IN
(SELECT id FROM author WHERE surname = 'Rowling');
QUERY PLAN
-------------------------------------------------------------------------------------------
Seq Scan on book (cost=13.68..26792.88 rows=576709 width=40)
Filter: (((name)::text = 'Harry Potter'::text) OR (hashed SubPlan 1))
SubPlan 1
-> Bitmap Heap Scan on author (cost=4.20..13.67 rows=6 width=4)
Recheck Cond: ((surname)::text = 'Rowling'::text)
-> Bitmap Index Scan on author_surname_index
(cost=0.00..4.20 rows=6 width=0)
Index Cond: ((surname)::text = 'Rowling'::text)
In contrast to that, consider the query plan for the following logically
equivalent query, which executes in just a few milliseconds (!):
EXPLAIN SELECT * FROM book WHERE name = 'Harry Potter' UNION SELECT *
FROM book WHERE authorId IN (SELECT id FROM author WHERE surname =
'Rowling');
QUERY PLAN
------------------------------------------------------------------------------------------------------------
HashAggregate (cost=454.87..455.99 rows=112 width=29)
Group Key: book.id, book.name, book.authorid
-> Append (cost=0.43..454.03 rows=112 width=29)
-> Index Scan using book_name_index on book
(cost=0.43..16.48 rows=3 width=29)
Index Cond: ((name)::text = 'Harry Potter'::text)
-> Nested Loop (cost=4.63..436.43 rows=109 width=29)
-> Bitmap Heap Scan on author (cost=4.20..13.67 rows=6
width=4)
Recheck Cond: ((surname)::text = 'Rowling'::text)
-> Bitmap Index Scan on author_surname_index
(cost=0.00..4.20 rows=6 width=0)
Index Cond: ((surname)::text = 'Rowling'::text)
-> Index Scan using book_authorid_index on book book_1
(cost=0.43..70.28 rows=18 width=29)
Index Cond: (authorid = author.id)