Slow subquery on large dataset
От | bob_bamber@hotmail.com (Bob) |
---|---|
Тема | Slow subquery on large dataset |
Дата | |
Msg-id | b598ae2.0402191407.c1548ea@posting.google.com обсуждение исходный текст |
Ответы |
Re: Slow subquery on large dataset
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-novice |
Hi, I'm having some performance issues when querying a couple of tables containing a large amount of data. Here's the schema: CREATE TABLE capacity_data ( data_id SERIAL, data TEXT, modified TIMESTAMPTZ DEFAULT NOW(), modified_by INTEGER NOT NULL, CONSTRAINT capacity_data_pk PRIMARY KEY (data_id), CONSTRAINT capacity_data_modified_by_fk FOREIGN KEY (modified_by) REFERENCES editors(editor_id) ); CREATE TABLE capacities ( CREATE TABLE capacities ( room_id BIGINT NOT NULL, capacity_type_id BIGINT NOT NULL, data_id BIGINT NOT NULL, modified TIMESTAMPTZ DEFAULT NOW(), modified_by INTEGER NOT NULL, CONSTRAINT capacities_pk PRIMARY KEY (room_id, data_id), CONSTRAINT capacities_room_id_fk FOREIGN KEY (room_id) REFERENCES meeting_rooms(room_id) ON DELETE CASCADE, CONSTRAINT capacities_capacity_type_id_fk FOREIGN KEY (capacity_type_id) REFERENCES capacity_types(capacity_type_id) ON DELETE CASCADE, CONSTRAINT capacities_data_id_fk FOREIGN KEY (data_id) REFERENCES capacity_data(data_id) ON DELETE CASCADE, CONSTRAINT capacities_modified_by_fk FOREIGN KEY (modified_by) REFERENCES editors(editor_id) ); data_id BIGINT NOT NULL, modified TIMESTAMPTZ DEFAULT NOW(), modified_by INTEGER NOT NULL, CONSTRAINT capacities_pk PRIMARY KEY (room_id, data_id), CONSTRAINT capacities_room_id_fk FOREIGN KEY (room_id) REFERENCES meeting_rooms(room_id) ON DELETE CASCADE, CONSTRAINT capacities_capacity_type_id_fk FOREIGN KEY (capacity_type_id) REFERENCES capacity_types(capacity_type_id) ON DELETE CASCADE, CONSTRAINT capacities_data_id_fk FOREIGN KEY (data_id) REFERENCES capacity_data(data_id) ON DELETE CASCADE, CONSTRAINT capacities_modified_by_fk FOREIGN KEY (modified_by) REFERENCES editors(editor_id) ); I'm using a subquery to find all the capacity_data.data_id's that are not in capacities: foo=# SELECT data_id FROM capacity_data WHERE data_id NOT IN (SELECT data_id FROM capacities); However, I have over 15,000 records in capacity_data. Here is the query plan: QUERY PLAN ------------------------------------------------------------------------ Seq Scan on capacity_data (cost=0.00..2086295.56 rows=7538 width=4) Filter: (subplan) SubPlan -> Seq Scan on capacities (cost=0.00..276.75 rows=15075 width=8) (4 rows) A little on the slow side! I have indexes on data_id in both tables (in capacity_data it's the primary key) how can I use them to quickly acheive what I want? Thanks in advance, Bob.
В списке pgsql-novice по дате отправления: