Обсуждение: DELETE deletes more than one rows when LIMIT is used in the USING clause

Поиск
Список
Период
Сортировка

DELETE deletes more than one rows when LIMIT is used in the USING clause

От
Onur Tirtir
Дата:

Hello everyone,

 

Let me first share some information regarding my setup:

Platform: Ubuntu 20.04 - x86_64 (on wsl2)

Kernel version: 5.10.102.1-microsoft-standard-WSL2

GCC version: gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

PostgreSQL version: 14.3: compiled from the source using following flags:

--enable-debug

--enable-depend

--enable-cassert

CFLAGS=-ggdb -O0 -g -fno-omit-frame-pointer

--with-openssl

--with-libxml

--with-icu

--with-libxslt

 

With the following table schema, DELETE deletes more than one rows unexpectedly when LIMIT is specified in the USING clause of the DELETE statement.

Surprisingly, this happens only when there is a primary key based on some columns used for filtering in DELETE statement:

DROP TABLE IF EXISTS my_table;

CREATE TABLE my_table (int_col_1 int, int_col_2 int, int_col_3 int);

INSERT INTO my_table VALUES (1, 1, 100), (1, 2, 101);

 

BEGIN;

    -- deletes 1 tuple, correct

    DELETE FROM my_table d

    USING (

        SELECT int_col_3

        FROM my_table

        WHERE my_table.int_col_1 = 1

        LIMIT 1 FOR UPDATE OF my_table

    ) s

    WHERE d.int_col_3 = s.int_col_3 AND d.int_col_1 = 1;

ROLLBACK;

 

-- now define the pkey and let’s see what happens with the same DELETE statement

ALTER TABLE my_table ADD CONSTRAINT my_table_pkey PRIMARY KEY (int_col_1, int_col_2);

 

BEGIN;

    -- deletes 2 tuples, incorrect

    DELETE FROM my_table d

    USING (

        SELECT int_col_3

        FROM my_table

        WHERE my_table.int_col_1 = 1

        LIMIT 1 FOR UPDATE OF my_table

    ) s

    WHERE d.int_col_3 = s.int_col_3 AND d.int_col_1 = 1;

ROLLBACK;

Re: DELETE deletes more than one rows when LIMIT is used in the USING clause

От
Tom Lane
Дата:
Onur Tirtir <Onur.Tirtir@microsoft.com> writes:
> With the following table schema, DELETE deletes more than one rows unexpectedly when LIMIT is specified in the USING
clauseof the DELETE statement. 

I do not see a bug here.  Your query is fundamentally unstable:

>     DELETE FROM my_table d
>     USING (
>         SELECT int_col_3
>         FROM my_table
>         WHERE my_table.int_col_1 = 1
>         LIMIT 1 FOR UPDATE OF my_table
>     ) s
>     WHERE d.int_col_3 = s.int_col_3 AND d.int_col_1 = 1;

The SELECT ... LIMIT query is underspecified, in that there are multiple
rows it could choose to return.  Worse, because it's FOR UPDATE, its
results are affected by whatever the DELETE may have already done: it
won't return an already-deleted row.  Thus, the query's behavior changes
depending on whether the planner happens to put the sub-select on the
inside or the outside of the nestloop join with "d".  I think the CREATE
INDEX command is just incidentally causing a statistics update that
switches the preferred plan shape.

Perhaps there's an argument that the planner should understand that the
sub-select's results are volatile and avoid executing it more than once.
But we haven't felt a need for such a restriction in the past, and this
example does nothing to convince me that one is needed now.  It looks
like a mighty ugly, brute-force hack; surely there's a simpler and better
solution to whatever your actual need is.

If you really do want to build a query that works this way, the approved
solution for ensuring the sub-select executes just once is to put it
in WITH ... AS MATERIALIZED.

            regards, tom lane