On Mon, Oct 5, 2009 at 2:52 PM, Matthew Wakeling <matthew@flymine.org> wrote:
mnw21-modmine-r13features-copy=# select count(*) from project; count ------- 10 (1 row)
mnw21-modmine-r13features-copy=# select count(*) from intermineobject; count ---------- 26344616 (1 row)
mnw21-modmine-r13features-copy=# \d intermineobject; Table "public.intermineobject" Column | Type | Modifiers --------+---------+----------- object | text | id | integer | not null class | text | Indexes: "intermineobject_pkey" UNIQUE, btree (id)
mnw21-modmine-r13features-copy=# explain select * from project where id NOT IN (SELECT id FROM intermineobject); QUERY PLAN ------------------------------------------------------------------------------------ Seq Scan on project (cost=1476573.93..1476575.05 rows=5 width=183) Filter: (NOT (hashed SubPlan 1)) SubPlan 1 -> Seq Scan on intermineobject (cost=0.00..1410720.74 rows=26341274 width=4) (4 rows)
This query plan seems to me to be a little slow. Surely it could iterate through the ten project rows and perform ten index lookups in the big table?
try using join instead of 'not in'..
select p.* from project p left join intermineobject i on i.id=p.id where i.id is null;