Re: Delete performance
От | Phillip Sitbon |
---|---|
Тема | Re: Delete performance |
Дата | |
Msg-id | 536685ea0906151134o53c5627ep36167f0511c08953@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Delete performance (Jana <jana.vasseru@gmail.com>) |
Ответы |
Re: Delete performance
(Tom Lane <tgl@sss.pgh.pa.us>)
|
Список | pgsql-novice |
I've had to do this quite a bit, and here's how I usually go about it: DELETE FROM data_structures_items WHERE NOT EXISTS (SELECT 1 FROM data_structures WHERE id_structure=id_data_structure LIMIT 1); Even when the item in the subquery is a primary key, I find it semantically informative to always use "LIMIT 1". Looking closer at your structure, this looks a lot like a problem I have had in the past with almost the exact same scenario. What I ended up doing beyond the above suggested query was to create a temporary table. Since you're lucky enough to be working with a primary key in the large table (I wasn't), you can copy all of the IDs into temp table. Then delete from the temp table anything that appears in your data_structures IDs. I'm not sure if it's necessary, but you could also create an index on the temp table. So your query becomes: DELETE FROM data_structures_items WHERE EXISTS (SELECT 1 FROM tmp_not_in_data_structures WHERE id_structure=id_data_structure LIMIT 1); This should help if the set of IDs to delete is smaller than the entire set of IDs. It works for me, but I can't be sure it'll help you - my main goal was to minimize the condition checking overhead for each row of the big table. Also, I'm not sure if it's valid/possible/better, but instead of EXISTS, "IS NOT NULL" might work here as well. If your indexes are rather large (and therefore may not fit into memory), you might consider partitioning. Along with the above solution, I use partitions and operate on them in parallel from a Python client. Cheers, Phillip 6/15 Jana <jana.vasseru@gmail.com>: > On Sun, 14 Jun 2009 18:12:50 +0200, Frank Bax <fbax@sympatico.ca> wrote: > >> Jana wrote: >>> >>> Hello, >>> i have a table with about 250m records from which i want to delete thoose >>> not contained in other table. I used this SQL query: >>> DELETE FROM data_structures_items WHERE id_data_structure NOT IN ( >>> SELECT id_structure FROM data_structures); >> >> >> DELETE FROM data_structures_items, data_structures WHERE >> data_structures_items.id_data_structure = data_structures.id_data_structure >> AND data_structures_items.id_data_structure IS NULL; >> > > Thanks for answer, this however is not a valid syntax (at least according to > manual, and my 8.3) > version. What could be done is > > DELETE FROM data_structures_items USING data_structures > WHERE > data_structures_items.id_data_structure=data_structures.id_structure > AND something > > but the problem is in that "something". I cannot write > data_structures_items.id_data_structure = > data_structures.id_data_structure AND > data_structures_items.id_data_structure IS NULL; > > because it a) doesn't make sense (column can't be null and equal to > something at the same time), b) doesn't select what i want (rows whoose > id_data_structure is NOT in the data_structures table, i'm pretty sure it > is a number ). I can't join tables in DELETE command, and i can't think of a > way doing it with WHERE > > Regards, > Jana > > -- > Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-novice >
В списке pgsql-novice по дате отправления: