Обсуждение: DELETE deletes more than one rows when LIMIT is used in the USING clause
Hello everyone,
Let me first share some information regarding my setup:
Platform: Ubuntu 20.04 - x86_64 (on wsl2)
Kernel version: 5.10.102.1-microsoft-standard-WSL2
GCC version: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
PostgreSQL version: 14.3: compiled from the source using following flags:
--enable-debug
--enable-depend
--enable-cassert
CFLAGS=-ggdb -O0 -g -fno-omit-frame-pointer
--with-openssl
--with-libxml
--with-icu
--with-libxslt
With the following table schema, DELETE deletes more than one rows unexpectedly when LIMIT is specified in the USING clause of the DELETE statement.
Surprisingly, this happens only when there is a primary key based on some columns used for filtering in DELETE statement:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table (int_col_1 int, int_col_2 int, int_col_3 int);
INSERT INTO my_table VALUES (1, 1, 100), (1, 2, 101);
BEGIN;
-- deletes 1 tuple, correct
DELETE FROM my_table d
USING (
SELECT int_col_3
FROM my_table
WHERE my_table.int_col_1 = 1
LIMIT 1 FOR UPDATE OF my_table
) s
WHERE d.int_col_3 = s.int_col_3 AND d.int_col_1 = 1;
ROLLBACK;
-- now define the pkey and let’s see what happens with the same DELETE statement
ALTER TABLE my_table ADD CONSTRAINT my_table_pkey PRIMARY KEY (int_col_1, int_col_2);
BEGIN;
-- deletes 2 tuples, incorrect
DELETE FROM my_table d
USING (
SELECT int_col_3
FROM my_table
WHERE my_table.int_col_1 = 1
LIMIT 1 FOR UPDATE OF my_table
) s
WHERE d.int_col_3 = s.int_col_3 AND d.int_col_1 = 1;
ROLLBACK;
Onur Tirtir <Onur.Tirtir@microsoft.com> writes: > With the following table schema, DELETE deletes more than one rows unexpectedly when LIMIT is specified in the USING clauseof the DELETE statement. I do not see a bug here. Your query is fundamentally unstable: > DELETE FROM my_table d > USING ( > SELECT int_col_3 > FROM my_table > WHERE my_table.int_col_1 = 1 > LIMIT 1 FOR UPDATE OF my_table > ) s > WHERE d.int_col_3 = s.int_col_3 AND d.int_col_1 = 1; The SELECT ... LIMIT query is underspecified, in that there are multiple rows it could choose to return. Worse, because it's FOR UPDATE, its results are affected by whatever the DELETE may have already done: it won't return an already-deleted row. Thus, the query's behavior changes depending on whether the planner happens to put the sub-select on the inside or the outside of the nestloop join with "d". I think the CREATE INDEX command is just incidentally causing a statistics update that switches the preferred plan shape. Perhaps there's an argument that the planner should understand that the sub-select's results are volatile and avoid executing it more than once. But we haven't felt a need for such a restriction in the past, and this example does nothing to convince me that one is needed now. It looks like a mighty ugly, brute-force hack; surely there's a simpler and better solution to whatever your actual need is. If you really do want to build a query that works this way, the approved solution for ensuring the sub-select executes just once is to put it in WITH ... AS MATERIALIZED. regards, tom lane