Обсуждение: BUG #15556: Duplicate key violations even when using ON CONFLICT DOUPDATE
BUG #15556: Duplicate key violations even when using ON CONFLICT DOUPDATE
От
 
		    	PG Bug reporting form
		    Дата:
		        The following bug has been logged on the website:
Bug reference:      15556
Logged by:          Feike Steenbergen
Email address:      feikesteenbergen@gmail.com
PostgreSQL version: 11.1
Operating system:   CentOS Linux release 7.5.1804 (Core)
Description:
On a table that has a primary key and multiple unique indexes, we have
recently
seen a duplicate key violation, even though we use ON CONFLICT DO UPDATE.
The values for the separate columns are dependent on each other, therefore
a
unique violation on one of the columns should also violate the other
indexes.
Below a self reproducing testcase for this behaviour.
I would have expected the statement to either insert or update, but
receiving
the duplicate key violation is not something that I expected.
Should I expect a duplicate key violation for the given table and
statements?
regards,
Feike
Reproducing testcase
#!/bin/bash
# within 1 second, I have multiple of these errors:
# ERROR:  duplicate key value violates unique constraint "bugtest_pkey"
# DETAIL:  Key (i)=(76) already exists.
PARALLEL=10
TIMEOUT=1
TMPFILE=$(mktemp)
trap 'rm -f ${TMPFILE}' EXIT QUIT INT TERM
psql -AtXq <<__EOT__
CREATE OR REPLACE FUNCTION slow_integer(i integer)
RETURNS integer LANGUAGE sql
AS 'SELECT i FROM pg_sleep(random()/1000000);'
IMMUTABLE;
DROP TABLE IF EXISTS bugtest;
CREATE TABLE bugtest(
    i int primary key,
    j int not null,
    k int not null,
    counter bigint not null default 1
);
CREATE UNIQUE INDEX ON bugtest (slow_integer(j));
CREATE UNIQUE INDEX ON bugtest (slow_integer(k));
__EOT__
cat > "${TMPFILE}" <<__EOT__
INSERT INTO bugtest(i, j, k)
SELECT i, i, i
  FROM CAST(random()*100 AS int) AS sub(i)
ON CONFLICT (slow_integer(k))
DO UPDATE
SET counter=bugtest.counter+1;
\watch 0.0000001
__EOT__
seq ${PARALLEL} | xargs --max-args=1 --max-procs=${PARALLEL} \
    timeout ${TIMEOUT} psql -AtXq -f "${TMPFILE}" --set
			
		Re: BUG #15556: Duplicate key violations even when using ON CONFLICTDO UPDATE
От
 
		    	Peter Geoghegan
		    Дата:
		        On Mon, Dec 17, 2018 at 1:55 AM PG Bug reporting form <noreply@postgresql.org> wrote: > Below a self reproducing testcase for this behaviour. > > I would have expected the statement to either insert or update, but > receiving > the duplicate key violation is not something that I expected. > > Should I expect a duplicate key violation for the given table and > statements? The problem is that unique index inference isn't sophisticated enough to recognize that the primary key ought to be inferred alongside the two other unique indexes, which are expression indexes. This is hardly surprising -- why would an expression index need to be created that was exactly equivalent to the primary key? Actually, it's impossible to make this work in principle, because nothing obligates you to insert the same thing into column i as column j or k -- you've merely done it that way this one time. Inference is clever enough to not differentiate based on irrelevant factors like column ordering among available, equivalent unique indexes. It's already very sophisticated for a mechanism that's just there to handle edge cases. I don't see a need for it to become more sophisticated. -- Peter Geoghegan
Re: BUG #15556: Duplicate key violations even when using ON CONFLICTDO UPDATE
От
 
		    	Peter Geoghegan
		    Дата:
		        On Mon, Dec 17, 2018 at 11:08 AM Peter Geoghegan <pg@bowt.ie> wrote: > The problem is that unique index inference isn't sophisticated enough > to recognize that the primary key ought to be inferred alongside the > two other unique indexes, which are expression indexes. This is hardly > surprising -- why would an expression index need to be created that > was exactly equivalent to the primary key? By the way, EXPLAIN ANALYZE INSERT ... ON CONFLICT will actually show you which unique indexes/constraints have been inferred from the target columns/expressions that appear in parenthesis. If two unique indexes use different columns, or are otherwise equivalent based on convention rather than on the semantics, then they're definitely not going to be recognized as equivalent by the inference process. You're not supposed to be able to UPDATE on a conflict on more than one unique index, really. The general idea with inference is to avoid unpleasant surprises when there are two indexes that enforce basically the same constraint, such as when a bloated unique index is replaced by creating a new index with CREATE UNIQUE INDEX CONCURRENTLY, before the original is dropped. That's what I meant about this being an edge case -- this hardly ever happens. So, yes, you can have multiple unique indexes inferred, but it doesn't matter which one you take the alternative UPDATE path on, because the rules of inference ensure that it cannot matter. We can inferred multiple indexes precisely because they'll all have the same conflicts. I got asked about multiple inference specifications in one statement quite a few times back when ON CONFLICT originally went in. That's not how it's supposed to be used -- what happens when *both* constraints are violated at once, in different ways? Just use multiple INSERT ... ON CONFLICT statements instead. -- Peter Geoghegan