Обсуждение: BUG #18279: Duplicate key violation and Deadlock when using ON CONFLICT/DO UPDATE with multiple unique indexes
BUG #18279: Duplicate key violation and Deadlock when using ON CONFLICT/DO UPDATE with multiple unique indexes
От
PG Bug reporting form
Дата:
The following bug has been logged on the website: Bug reference: 18279 Logged by: Yeongeun Noh Email address: nye7181@gmail.com PostgreSQL version: 16.1 Operating system: MacOS 13.1 Description: I have a table with a primary key and a unique index. First, I ran an INSERT query and then several INSERT INTO ON CONFLICT DO UPDATE queries in parallel. This will almost always result in Duplicate key violations, and sometimes Deadlocks. When I removed the unique index, there was no Duplicate key violation or Deadlock. I have two questions regarding this situation. 1. Since I had already inserted a row, I expected the INSERT INTO ON CONFLICT DO UPDATE queries to only update it. However, if there is a unique index, it seems to try to insert rather than update. 2. Each transaction executes only one query for one row. How does a deadlock occur? Here's the reproduction code. ```sql CREATE TABLE email_stats ( id varchar NOT NULL PRIMARY KEY, user_id integer NOT NULL, date date NOT NULL, count integer NOT NULL ); CREATE UNIQUE INDEX user_id_date ON email_stats(user_id, date); ``` ```sh #!/bin/bash CONNECTION_COUNT=500 QUERY_COUNT_PER_CONNECTION=10 INSERT_QUERY="INSERT INTO email_stats (id, user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1);" ON_CONFLICT_QUERY="INSERT INTO email_stats (id, user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count = email_stats.count + 1;" perform() { echo "Start" insert_first insert_multiple } insert_first() { psql -h localhost -p 5432 -U channel -d channel -c "$INSERT_QUERY" } insert_multiple() { echo "creating connections..." for ((i = 0; i < $CONNECTION_COUNT; i++)); do ( echo "inserting records for connection $i..." for ((j = 0; j < $QUERY_COUNT_PER_CONNECTION; j++)); do psql -h localhost -p 5432 -U channel -d channel -c "$ON_CONFLICT_QUERY" 1> /dev/null done ) & done wait } perform ``` Here's the server error log. ``` 2024-01-10 06:56:07.473 UTC [4331] ERROR: 23505: duplicate key value violates unique constraint "user_id_date" 2024-01-10 06:56:07.473 UTC [4331] DETAIL: Key (user_id, date)=(1, 2024-01-01) already exists. 2024-01-10 06:56:07.473 UTC [4331] LOCATION: _bt_check_unique, nbtinsert.c:666 2024-01-10 06:56:07.473 UTC [4331] STATEMENT: INSERT INTO email_stats (id, user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count = email_stats.count + 1; 2024-01-10 06:56:07.645 UTC [4542] ERROR: 23505: duplicate key value violates unique constraint "user_id_date" 2024-01-10 06:56:07.645 UTC [4542] DETAIL: Key (user_id, date)=(1, 2024-01-01) already exists. 2024-01-10 06:56:07.645 UTC [4542] LOCATION: _bt_check_unique, nbtinsert.c:666 2024-01-10 06:56:07.645 UTC [4542] STATEMENT: INSERT INTO email_stats (id, user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count = email_stats.count + 1; 2024-01-10 06:56:11.346 UTC [5064] ERROR: 40P01: deadlock detected 2024-01-10 06:56:11.346 UTC [5064] DETAIL: Process 5064 waits for ShareLock on transaction 6520; blocked by process 5277. Process 5277 waits for ShareLock on speculative token 1 of transaction 6386; blocked by process 5064. Process 5064: INSERT INTO email_stats (id, user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count = email_stats.count + 1; Process 5277: INSERT INTO email_stats (id, user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count = email_stats.count + 1; 2024-01-10 06:56:11.346 UTC [5064] HINT: See server log for query details. 2024-01-10 06:56:11.346 UTC [5064] CONTEXT: while inserting index tuple (34,3) in relation "user_id_date" 2024-01-10 06:56:11.346 UTC [5064] LOCATION: DeadLockReport, deadlock.c:1130 2024-01-10 06:56:11.346 UTC [5064] STATEMENT: INSERT INTO email_stats (id, user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count = email_stats.count + 1; ```
On Wed, Jan 10, 2024 at 4:28 AM PG Bug reporting form <noreply@postgresql.org> wrote: > I have a table with a primary key and a unique index. > First, I ran an INSERT query and then several INSERT INTO ON CONFLICT DO > UPDATE queries in parallel. > This will almost always result in Duplicate key violations, and sometimes > Deadlocks. > When I removed the unique index, there was no Duplicate key violation or > Deadlock. This is not a bug. The statements you've shown use "ON CONFLICT (id) DO UPDATE", which will make the primary key the arbiter index -- the index used for upserting. The duplicate violation errors relate to the unique index, though. In general the implementation only uses one arbiter index to decide whether to insert or to update each row (barring certain edge cases where the table has multiple equivalent/redundant unique indexes). It's possible that you'll get the behavior you expect by specifying "ON CONFLICT (user_id, date) DO UPDATE" instead -- though I can't be sure of that. -- Peter Geoghegan
Thanks for the answer.
If the primary key was inferred as the arbiter index, shouldn't it be UPDATE because the arbiter index was conflicted?
And when parallelism is low, the duplicate violations don't occur.
When I ran ON CONFLICT (user_id, date) DO UPDATE as you suggested, a duplicate violation on primary key occurred.
However, the deadlock seems to disappear.
```
ERROR: duplicate key value violates unique constraint "email_stats_pkey"
DETAIL: Key (id)=(2024-01-01_1) already exists.
```
If the primary key was inferred as the arbiter index, shouldn't it be UPDATE because the arbiter index was conflicted?
And when parallelism is low, the duplicate violations don't occur.
When I ran ON CONFLICT (user_id, date) DO UPDATE as you suggested, a duplicate violation on primary key occurred.
However, the deadlock seems to disappear.
```
ERROR: duplicate key value violates unique constraint "email_stats_pkey"
DETAIL: Key (id)=(2024-01-01_1) already exists.
```
2024년 1월 11일 (목) 오전 12:12, Peter Geoghegan <pg@bowt.ie>님이 작성:
On Wed, Jan 10, 2024 at 4:28 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> I have a table with a primary key and a unique index.
> First, I ran an INSERT query and then several INSERT INTO ON CONFLICT DO
> UPDATE queries in parallel.
> This will almost always result in Duplicate key violations, and sometimes
> Deadlocks.
> When I removed the unique index, there was no Duplicate key violation or
> Deadlock.
This is not a bug. The statements you've shown use "ON CONFLICT (id)
DO UPDATE", which will make the primary key the arbiter index -- the
index used for upserting. The duplicate violation errors relate to the
unique index, though.
In general the implementation only uses one arbiter index to decide
whether to insert or to update each row (barring certain edge cases
where the table has multiple equivalent/redundant unique indexes).
It's possible that you'll get the behavior you expect by specifying
"ON CONFLICT (user_id, date) DO UPDATE" instead -- though I can't be
sure of that.
--
Peter Geoghegan