How can I prevent duplicate inserts by multiple concurrent threads?

From: Hursh Jain
Subject: How can I prevent duplicate inserts by multiple concurrent threads?
Msg-id: 54AD891D.9000508@beesell.com
List: pgsql-novice
I am running Postgres 9.3.x and am feeling a little confused about
transactions, isolation levels, and locks as they pertain to preventing
duplicate concurrent *inserts* of the same data. No updates, only inserts.

I am using JDBC and using the following code pattern, invoked from
multiple front-end web server threads:

getData()
  1...get JDBC connection (using default transaction isolation).
  2. START TRANSACTION
  3....some prelim processing...
  4. for a bunch of data chunks (say 100 different chunks),
   4a...check if the data chunk already exists in DB?
      QUERY: select count(*) from table where data_chunk_id = 123
   4b...if it does not exist (count is 0), create data and insert it into table
   4c...else if it already exists (count > 0), retrieve the existing data from table
  5. Add up all the chunks, and create a JSON string for the client
  6. COMMIT TRANSACTION
  7. write JSON to client
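
For concreteness, steps 4a-4c per chunk currently look roughly like
this in my JDBC code (chunk_table, its columns, and createChunkData()
are simplified stand-ins for the real names):

    // Sketch of steps 4a-4c; conn is the step-1 Connection with
    // autocommit off. Names here are placeholders, not the real schema.
    try (PreparedStatement check = conn.prepareStatement(
            "select count(*) from chunk_table where data_chunk_id = ?")) {
        check.setLong(1, chunkId);
        try (ResultSet rs = check.executeQuery()) {
            rs.next();
            if (rs.getLong(1) == 0) {
                // 4b: not in DB yet -> create and insert
                try (PreparedStatement ins = conn.prepareStatement(
                        "insert into chunk_table (data_chunk_id, data)"
                        + " values (?, ?)")) {
                    ins.setLong(1, chunkId);
                    ins.setString(2, createChunkData(chunkId));
                    ins.executeUpdate();
                }
            } else {
                // 4c: already in DB -> select the existing data instead
            }
        }
    }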

We really only need to create and save these chunks the first time they
are requested, and thereafter send only the already-created ones. The
issue is that concurrent threads can be running inside getData() the
very first time the data is requested (the initial request, like
subsequent requests, can come from multiple clients at the same time).

So, then:

- Thread #1 reaches step 4a, sees no data, and inserts the data (step 4b).
- Before Thread #1 commits (step 6), Thread #2 reaches step 4a as well.
- Since Thread #1 has not committed yet, Thread #2 does not see any data
in the table and also inserts the data.
- Both threads insert the data and I get duplicate rows (sometimes
N duplicates, for N threads).

Any suggestions or tips? Am I at least thinking along the right lines?
Should I be getting a table lock instead? What is the recommended way
to prevent duplicate inserts in a concurrent situation like this?
(Again, there are no updates, only one-time inserts if the data does
not already exist.)
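
If a table lock is the way to go, I assume it would be something like
the following at the top of the transaction (EXCLUSIVE mode conflicts
with the ROW EXCLUSIVE lock that INSERT takes, so other inserters would
queue up behind it while plain SELECTs still run; chunk_table is a
placeholder name again):

    // Hypothetical table-lock variant; must run inside the transaction,
    // and the lock is held until COMMIT/ROLLBACK.
    try (Statement st = conn.createStatement()) {
        st.execute("lock table chunk_table in exclusive mode");
    }
    // ...then run the step-4 check/insert loop without interleaving...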

I'm thinking I could do this:

Option (A)
getData()
   1...get JDBC connection (using default transaction isolation).
   2. START TRANSACTION
   3. ==> SAVEPOINT A
   4. for a bunch of data chunks (say 100 different chunks),
    4a...check if this data chunk already exists in DB, get ROW LOCK
      QUERY: select data_chunk_id from table where data_chunk_id = 123 FOR UPDATE
    4b...if no row is returned, create data and insert it into table
    4c...else use the already-retrieved row
    4d...RELEASE SAVEPOINT A
   5. Add up all the chunks, and create a JSON string for the client
   6. COMMIT TRANSACTION
   7. write JSON to client
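
Step 4a of Option (A) would then be something like this, although one
thing I'm unsure about: FOR UPDATE can only lock rows that already
exist, so if the row isn't there yet, nothing gets locked and two
threads could presumably still both fall into 4b (placeholder names
again):

    // Option (A) step 4a sketch: row lock via SELECT ... FOR UPDATE.
    try (PreparedStatement sel = conn.prepareStatement(
            "select data_chunk_id, data from chunk_table"
            + " where data_chunk_id = ? for update")) {
        sel.setLong(1, chunkId);
        try (ResultSet rs = sel.executeQuery()) {
            if (rs.next()) {
                // 4c: row exists and is now locked; reuse rs.getString("data")
            } else {
                // 4b: no row -> insert; note nothing was locked in this case
            }
        }
    }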

Another option

Option (B)
getData()
   1...get JDBC connection (using default transaction isolation).
   2. START TRANSACTION ISOLATION LEVEL SERIALIZABLE
   3. ==> SAVEPOINT A
   4. for a bunch of data chunks (say 100 different chunks),
    4a...always create data and insert it into table
    4b...if a duplicate-key error occurs, ROLLBACK TO SAVEPOINT A (and
reuse the existing row); else RELEASE SAVEPOINT A
   5. Add up all the chunks, and create a JSON string for the client
   6. COMMIT TRANSACTION
   7. write JSON to client
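
Per chunk, I imagine Option (B) would look roughly like this in JDBC,
assuming a unique constraint on data_chunk_id (see below) and checking
for SQLState 23505, i.e. unique_violation (names are placeholders):

    // Option (B) sketch per chunk: always insert, recover on duplicate.
    Savepoint sp = conn.setSavepoint();
    try {
        try (PreparedStatement ins = conn.prepareStatement(
                "insert into chunk_table (data_chunk_id, data)"
                + " values (?, ?)")) {
            ins.setLong(1, chunkId);
            ins.setString(2, createChunkData(chunkId));
            ins.executeUpdate();
        }
        conn.releaseSavepoint(sp);
    } catch (SQLException e) {
        if ("23505".equals(e.getSQLState())) {   // unique_violation
            conn.rollback(sp);                   // another thread got there first
            // ...select the existing row for this chunkId instead...
        } else {
            throw e;
        }
    }

As far as I understand, under the default READ COMMITTED level the
second inserter blocks on the first one's uncommitted row and then gets
23505 once the first commits, so no duplicates can slip through.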

This does imply I need to add a uniqueness constraint on data_chunk_id
(which I can; right now the table has no constraints).
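
The constraint itself would presumably just be (placeholder names again):

   QUERY: alter table chunk_table add unique (data_chunk_id)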

Any feedback appreciated...

Best,
--j

