Re: Deduplication and transaction isolation level

From Merlin Moncure
Subject Re: Deduplication and transaction isolation level
Msg-id CAHyXU0zfsrnvXavLvgTUszm65AHwYwmaokEns16DWPfBiqizwg@mail.gmail.com
In reply to Deduplication and transaction isolation level  (François Beausoleil <francois@teksol.info>)
Responses Re: Deduplication and transaction isolation level  (François Beausoleil <francois@teksol.info>)
Re: Deduplication and transaction isolation level  (Steven Schlansker <steven@likeness.com>)
List pgsql-general
On Tue, Sep 24, 2013 at 10:19 PM, François Beausoleil
<francois@teksol.info> wrote:
> Hi all!
>
> I import many, many rows of data into a table, from three or more
> computers, 4 times per hour. I have a primary key, and the query I use
> to import the data is supposed to dedup before inserting, but I still
> get primary key violations.
>
> The import process is:
>
> * Load CSV data into temp table
> * INSERT INTO dest SELECT DISTINCT (pkey) FROM temp WHERE NOT EXISTS(temp.pkey = dest.pkey)
>
> I assumed (erroneously) that this would guarantee no duplicate data
> could make it into the database. The primary key violations are
> proving me wrong.

Right.  Transactions A and B are interleaved: they both run the same
check against the same id at the same time.  Both checks pass because
neither transaction has committed yet, so neither can see the other's
pending row.  This problem is not solvable by adjusting the isolation
level.
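
To make the interleaving concrete, here is a rough sketch of two sessions
racing on one key.  It is not a single runnable script: the statements
marked [A] and [B] would be typed into two separate connections in the
order shown.  The table dest(pkey) is taken from the original post; the
value 42, the integer column type, and the constraint name dest_pkey
(PostgreSQL's default name for a primary key on dest) are assumptions for
illustration.

```sql
-- Assumed setup, run once:
--   CREATE TABLE dest (pkey integer PRIMARY KEY);

-- [A]
BEGIN;
-- [B]
BEGIN;

-- [A] The check finds no row 42, so the row is inserted (uncommitted).
INSERT INTO dest (pkey)
SELECT 42
WHERE NOT EXISTS (SELECT 1 FROM dest WHERE pkey = 42);

-- [B] Same statement. A's uncommitted row is invisible here, so the
--     NOT EXISTS check also passes. The insert then waits on the
--     unique index until A finishes.
INSERT INTO dest (pkey)
SELECT 42
WHERE NOT EXISTS (SELECT 1 FROM dest WHERE pkey = 42);

-- [A]
COMMIT;

-- [B] is now released and fails with something like:
--     ERROR:  duplicate key value violates unique constraint "dest_pkey"
ROLLBACK;
```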

Typical solutions might be to:
A. Lock the table while inserting
B. Retry the transaction following an error.
C. Import the records to a staging table, then do the
deduplication check when moving them from the staging table.
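
As a minimal sketch of option A combined with a staging table, something
along these lines might work.  It assumes the default READ COMMITTED
isolation level, an integer key, and a hypothetical temp table named
import_tmp; only dest and pkey come from the original post.

```sql
BEGIN;

-- Hypothetical staging table; its name and column type are assumptions.
CREATE TEMP TABLE import_tmp (pkey integer) ON COMMIT DROP;

-- Load the CSV into import_tmp here, e.g. with COPY or psql's \copy.
-- Loading before taking the lock keeps the lock short.

-- SHARE ROW EXCLUSIVE conflicts with itself and with the ROW EXCLUSIVE
-- lock taken by INSERT, so only one importer at a time gets past this
-- point; the others wait until this transaction commits or rolls back.
LOCK TABLE dest IN SHARE ROW EXCLUSIVE MODE;

-- Under READ COMMITTED this statement takes a fresh snapshot, so it
-- sees rows committed by importers that held the lock earlier.
INSERT INTO dest (pkey)
SELECT DISTINCT t.pkey
FROM import_tmp AS t
WHERE NOT EXISTS (SELECT 1 FROM dest AS d WHERE d.pkey = t.pkey);

COMMIT;
```

The lock mode is a judgment call: it still lets plain SELECTs read dest
while the import runs, but it blocks every other writer, so it trades
concurrency for simplicity.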

merlin

