Discussion: Re: How can I prevent duplicate inserts by multiple concurrent threads?
Any suggestions or tips? Am I at least thinking along the right lines? Should I be getting a table lock instead? What is the recommended way to prevent duplicate inserts in a concurrent situation like this?

Sean Davis wrote:

Is there a reason that you cannot use a UNIQUE constraint?
Right, maybe that's what I should be doing. The reason I didn't do that initially was because creating new data can be *very* expensive. We have about 100 different chunks, and if, say, 10 threads hit the database and each thread re-creates 100 chunks, we will end up trying to save 100*10 (1,000) chunks, 900 of which will rightly be rejected because of the uniqueness constraint, so the time spent creating those chunks would have been wasted, to the annoyance of the clients.
It would be nice to see if that data already exists before we try to re-insert it... but how? The "select for update" doesn't seem to be working as I hoped.
Best,
--j
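One common way to reconcile "expensive to build" with "constraint as the safety net" is to do a cheap existence check first and skip the expensive generation when the key is already present, while still keeping the UNIQUE constraint as the backstop for the narrow window where two threads both see "missing". A sketch under assumed names (again SQLite and a hypothetical `chunks` table, only for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (chunk_key TEXT UNIQUE, payload TEXT)")

def ensure_chunk(key, build_payload):
    # Cheap existence check first, so we usually skip the expensive build.
    row = conn.execute(
        "SELECT 1 FROM chunks WHERE chunk_key = ?", (key,)
    ).fetchone()
    if row is not None:
        return "already there"
    payload = build_payload()  # the expensive part
    try:
        with conn:
            conn.execute("INSERT INTO chunks VALUES (?, ?)", (key, payload))
        return "inserted"
    except sqlite3.IntegrityError:
        # Another worker won the race between our SELECT and our INSERT.
        # The constraint, not the check, is what guarantees uniqueness.
        return "lost race"

print(ensure_chunk("c1", lambda: "expensive data"))  # inserted
print(ensure_chunk("c1", lambda: "expensive data"))  # already there
```

The check is an optimization only: in the worst case a few threads still build the same chunk, but the constraint keeps the data correct.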
--As of January 7, 2015 3:01:42 PM -0500, Hursh Jain is alleged to have said:

> Right, maybe that's what I should be doing. The reason I didn't do that
> initially was because creating new data can be *very* expensive. We have
> about 100 different chunks, and if, say, 10 threads hit the database and
> each thread re-creates 100 chunks, we will end up trying to save 100*10
> (1,000) chunks, 900 of which will rightly be rejected because of the
> uniqueness constraint, so the time spent creating those chunks would
> have been wasted, to the annoyance of the clients.
>
> It would be nice to see if that data already exists before we try to
> re-insert it... but how? The "select for update" doesn't seem to be
> working as I hoped.

--As for the rest, it is mine.

It won't completely solve the issue, but take a look at savepoints and transactions. My flow would probably be:

Start transaction
Select row to see if data exists
If so, skip
If not, create data and insert
End transaction

If you can generate the key without generating the data, I'd probably do:

Start transaction
Select row to see if data exists
If not, insert key; else end transaction
Set savepoint
Create data; update to key, rolling back to the savepoint and restarting if there is an issue
End transaction

(Though the 'select row' and 'insert key' of the second could probably be combined into just 'insert key'.)

Each transaction will get you an internally consistent view of the database, so the select will run against the same dataset you eventually insert into.

My other question from your comments above is whether this should all actually be in *one* table. If you are collecting a lot of different information from different sources, it might work better to have tables holding that data and just reference them from this table, which would be quicker and cheaper to update.
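The second flow described above (reserve the key cheaply up front, then build the expensive data under a savepoint) can be sketched roughly as follows. SQLite is again used only for illustration, the schema is hypothetical, and the retry policy is left as a stub:

```python
import sqlite3

# isolation_level=None gives us manual transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE chunks (chunk_key TEXT PRIMARY KEY, payload TEXT)")

def reserve_and_build(key, build_payload):
    conn.execute("BEGIN")
    try:
        # Step 1: reserve the key before doing any expensive work.
        conn.execute("INSERT INTO chunks (chunk_key) VALUES (?)", (key,))
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # someone else already owns this key
        return False
    conn.execute("SAVEPOINT build")
    try:
        # Step 2: the expensive part, protected by the savepoint.
        conn.execute(
            "UPDATE chunks SET payload = ? WHERE chunk_key = ?",
            (build_payload(), key),
        )
        conn.execute("RELEASE build")
    except Exception:
        # Undo the failed build but keep the reserved key; a real
        # implementation would restart the build here, as suggested above.
        conn.execute("ROLLBACK TO build")
        conn.execute("RELEASE build")
    conn.execute("COMMIT")
    return True

print(reserve_and_build("c1", lambda: "data"))  # True
print(reserve_and_build("c1", lambda: "data"))  # False: key already reserved
```

The cheap key insert acts as the duplicate gate, so only the winning thread ever pays for the expensive data generation.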
(I obviously don't know the situation - I acknowledge that you might not be able to do that for any of a hundred reasons, or that the gathering and correlating might be generating something more unique to each row. It's just a thought from the simplified description you've given.)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author. Unless otherwise noted, you are
expressly allowed to retransmit, quote, or otherwise use the contents
for non-commercial purposes. This copyright will expire 5 years after
the author's death, or in 30 years, whichever is longer, unless such a
period is in excess of local copyright law.
---------------------------------------------------------------