Обсуждение: BUG #18279: Duplicate key violation and Deadlock when using ON CONFLICT/DO UPDATE with multiple unique indexes

Поиск
Список
Период
Сортировка
The following bug has been logged on the website:

Bug reference:      18279
Logged by:          Yeongeun Noh
Email address:      nye7181@gmail.com
PostgreSQL version: 16.1
Operating system:   MacOS 13.1
Description:

I have a table with a primary key and a unique index.
First, I ran an INSERT query and then several INSERT INTO ON CONFLICT DO
UPDATE queries in parallel.
This will almost always result in Duplicate key violations, and sometimes
Deadlocks.
When I removed the unique index, there was no Duplicate key violation or
Deadlock.

I have two questions regarding this situation.

1. Since I had already inserted a row, I expected the INSERT INTO ON
CONFLICT DO UPDATE queries to only update it.
However, if there is a unique index, it seems to try to insert rather than
update.
2. Each transaction executes only one query for one row. How does a deadlock
occur?

Here's the reproduction code.
```sql
CREATE TABLE email_stats (
  id varchar NOT NULL PRIMARY KEY,
  user_id integer NOT NULL,
  date date NOT NULL,
  count integer NOT NULL
);
CREATE UNIQUE INDEX user_id_date ON email_stats(user_id, date);
```

```sh
#!/bin/bash
CONNECTION_COUNT=500
QUERY_COUNT_PER_CONNECTION=10
INSERT_QUERY="INSERT INTO email_stats (id, user_id, date, count) VALUES
('2024-01-01_1', 1, '2024-01-01', 1);"
ON_CONFLICT_QUERY="INSERT INTO email_stats (id, user_id, date, count) VALUES
('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count =
email_stats.count + 1;"

perform() {
  echo "Start"
  insert_first
  insert_multiple
}
insert_first() {
  psql -h localhost -p 5432 -U channel -d channel -c "$INSERT_QUERY"
}
insert_multiple() {
  echo "creating connections..."
  for ((i = 0; i < $CONNECTION_COUNT; i++)); do
    (
      echo "inserting records for connection $i..."
      for ((j = 0; j < $QUERY_COUNT_PER_CONNECTION; j++)); do
        psql -h localhost -p 5432 -U channel -d channel -c
"$ON_CONFLICT_QUERY" 1> /dev/null
      done
    ) &
  done
  wait
}

perform
```

Here's the server error log.
```
2024-01-10 06:56:07.473 UTC [4331] ERROR: 23505: duplicate key value
violates unique constraint "user_id_date"
2024-01-10 06:56:07.473 UTC [4331] DETAIL: Key (user_id, date)=(1,
2024-01-01) already exists.
2024-01-10 06:56:07.473 UTC [4331] LOCATION: _bt_check_unique,
nbtinsert.c:666
2024-01-10 06:56:07.473 UTC [4331] STATEMENT: INSERT INTO email_stats (id,
user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON
CONFLICT (id) DO UPDATE SET count = email_stats.count + 1;
2024-01-10 06:56:07.645 UTC [4542] ERROR: 23505: duplicate key value
violates unique constraint "user_id_date"
2024-01-10 06:56:07.645 UTC [4542] DETAIL: Key (user_id, date)=(1,
2024-01-01) already exists.
2024-01-10 06:56:07.645 UTC [4542] LOCATION: _bt_check_unique,
nbtinsert.c:666
2024-01-10 06:56:07.645 UTC [4542] STATEMENT: INSERT INTO email_stats (id,
user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON
CONFLICT (id) DO UPDATE SET count = email_stats.count + 1;
2024-01-10 06:56:11.346 UTC [5064] ERROR: 40P01: deadlock detected
2024-01-10 06:56:11.346 UTC [5064] DETAIL: Process 5064 waits for ShareLock
on transaction 6520; blocked by process 5277.
Process 5277 waits for ShareLock on speculative token 1 of transaction 6386;
blocked by process 5064.
Process 5064: INSERT INTO email_stats (id, user_id, date, count) VALUES
('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count =
email_stats.count + 1;
Process 5277: INSERT INTO email_stats (id, user_id, date, count) VALUES
('2024-01-01_1', 1, '2024-01-01', 1) ON CONFLICT (id) DO UPDATE SET count =
email_stats.count + 1;
2024-01-10 06:56:11.346 UTC [5064] HINT: See server log for query details.
2024-01-10 06:56:11.346 UTC [5064] CONTEXT: while inserting index tuple
(34,3) in relation "user_id_date"
2024-01-10 06:56:11.346 UTC [5064] LOCATION: DeadLockReport,
deadlock.c:1130
2024-01-10 06:56:11.346 UTC [5064] STATEMENT: INSERT INTO email_stats (id,
user_id, date, count) VALUES ('2024-01-01_1', 1, '2024-01-01', 1) ON
CONFLICT (id) DO UPDATE SET count = email_stats.count + 1;
```


On Wed, Jan 10, 2024 at 4:28 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> I have a table with a primary key and a unique index.
> First, I ran an INSERT query and then several INSERT INTO ON CONFLICT DO
> UPDATE queries in parallel.
> This will almost always result in Duplicate key violations, and sometimes
> Deadlocks.
> When I removed the unique index, there was no Duplicate key violation or
> Deadlock.

This is not a bug. The statements you've shown use "ON CONFLICT (id)
DO UPDATE", which will make the primary key the arbiter index -- the
index used for upserting. The duplicate violation errors relate to the
unique index, though.

In general the implementation only uses one arbiter index to decide
whether to insert or to update each row (barring certain edge cases
where the table has multiple equivalent/redundant unique indexes).
It's possible that you'll get the behavior you expect by specifying
"ON CONFLICT (user_id, date) DO UPDATE" instead -- though I can't be
sure of that.

--
Peter Geoghegan



Thanks for the answer.

If the primary key was inferred as the arbiter index, shouldn't it be UPDATE because the arbiter index was conflicted?
And when parallelism is low, the duplicate violations don't occur.

When I ran ON CONFLICT (user_id, date) DO UPDATE as you suggested, a duplicate violation on primary key occurred.
However, the deadlock seems to disappear.
```
ERROR:  duplicate key value violates unique constraint "email_stats_pkey"
DETAIL:  Key (id)=(2024-01-01_1) already exists.
```

2024년 1월 11일 (목) 오전 12:12, Peter Geoghegan <pg@bowt.ie>님이 작성:
On Wed, Jan 10, 2024 at 4:28 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
> I have a table with a primary key and a unique index.
> First, I ran an INSERT query and then several INSERT INTO ON CONFLICT DO
> UPDATE queries in parallel.
> This will almost always result in Duplicate key violations, and sometimes
> Deadlocks.
> When I removed the unique index, there was no Duplicate key violation or
> Deadlock.

This is not a bug. The statements you've shown use "ON CONFLICT (id)
DO UPDATE", which will make the primary key the arbiter index -- the
index used for upserting. The duplicate violation errors relate to the
unique index, though.

In general the implementation only uses one arbiter index to decide
whether to insert or to update each row (barring certain edge cases
where the table has multiple equivalent/redundant unique indexes).
It's possible that you'll get the behavior you expect by specifying
"ON CONFLICT (user_id, date) DO UPDATE" instead -- though I can't be
sure of that.

--
Peter Geoghegan