Обсуждение: Unnecessary FOR UPDATE lock instead of possible FOR NO KEY UPDATE lock in an UPSERT statement

Поиск
Список
Период
Сортировка

Unnecessary FOR UPDATE lock instead of possible FOR NO KEY UPDATE lock in an UPSERT statement

От
lights go out
Дата:
Hi, 

Since Postgres 9.3, UPDATE statements generally try to grab FOR NO KEY UPDATE locks, 
unless columns participating in UNIQUE indexes also were modified:

>commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
>Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
>Date: Wed Jan 23 12:04:59 2013 -0300
>...
>UPDATE commands that do not modify the values stored in the columns that are 
>part of the key of the tuple now grab a SELECT FOR NO KEY UPDATE lock on the tuple, 
>allowing them to proceed concurrently with tuple locks of the FOR KEY SHARE variety.

If we do an UPDATE statement and rewrite part of the key like "set a = a" 
Postgres still grabs a FOR NO KEY UPDATE lock.

But if we do an UPSERT statement and rewrite part of the tuple key with the same value
in the ON CONFLICT clause, Postgres grabs a more aggressive FOR UPDATE lock. 

(I couldn't provide a self-contained script, because we need to monitor row locks
in a separate connection). See below:

create extension if not exists pgrowlocks ;
create sequence if not exists books_id_seq ;

create table if not exists books (
    id integer primary key default nextval('books_id_seq'),
    provider_id int not null,
    external_id text not null,
    some_data text,
    unique(provider_id, external_id) -- this is important
);

truncate books;
insert into books (provider_id, external_id) values (1, 'ABC');


-- Case 1 (passed): update uniq key to the same value;
-- expect No Key Update lock, because they key hasn't changed
begin;
update books set (provider_id, external_id) = (provider_id, external_id);

-- (run this query in a different connection)
SELECT p.* FROM books AS t, pgrowlocks('books') AS p WHERE p.locked_row = t.ctid;
/*
 locked_row | locker  | multi |   xids    |       modes       | pids
------------+---------+-------+-----------+-------------------+------
 (0,1)      | 2103114 | f     | {2103114} | {"No Key Update"} | {0}
*/
rollback;


-- Case 2 (passed): update uniq key to a different value;
-- expect For Update lock, because the uniq key has changed
begin;
update books set (provider_id, external_id) = (2, 'ABC');

-- (run this query in a different connection)
SELECT p.* FROM books AS t, pgrowlocks('books') AS p WHERE p.locked_row = t.ctid;
/*
 locked_row | locker  | multi |   xids    |  modes   | pids
------------+---------+-------+-----------+----------+------
 (0,1)      | 2103117 | f     | {2103117} | {Update} | {0}
*/
rollback;


-- Case 3 (failed): update uniq key to the same value, but use the UPSERT query, 
-- expect No Key Update lock because the key hasn't changed, 
-- but we actually get For Update lock
begin;
insert into books (provider_id, external_id) 
select 1, 'ABC' 
on conflict (provider_id, external_id) 
do update set (provider_id, external_id) = (excluded.provider_id, excluded.external_id);

-- (run this query in a different connection)
SELECT p.* FROM books AS t, pgrowlocks('books') AS p WHERE p.locked_row = t.ctid;
/*
 locked_row | locker  | multi |   xids    |  modes   | pids
------------+---------+-------+-----------+----------+------
 (0,1)      | 2103119 | f     | {2103119} | {Update} | {0}
*/
rollback;

-- Case 4, same UPSERT, but don't update uniq key in the ON CONFLICT clause, 
-- expect No Key Update lock because the key hasn't changed
begin;
insert into books (provider_id, external_id, some_data) 
select 1, 'ABC', 'some data' 
on conflict (provider_id, external_id) 
do update set some_data = excluded.some_data;

-- (run this query in a different connection)
SELECT p.* FROM books AS t, pgrowlocks('books') AS p WHERE p.locked_row = t.ctid;
/*
 locked_row | locker  | multi |   xids    |       modes       | pids
------------+---------+-------+-----------+-------------------+------
 (0,1)      | 2103122 | f     | {2103122} | {"No Key Update"} | {0}
*/
rollback;

So in case 3 we rewrite the key part of the tuple with the same value 
and get the stronger FOR UPDATE lock.
While in case 1 we do the same but acquire a softer FOR NO KEY UPDATE lock
which is the correct behavior since we haven't actually changed the key.

This is inconsistent.
I expect UPSERT query in case 3 to grab a FOR NO KEY UPDATE lock 
because SET (a, b) = (excluded.a, excluded.b) is not actually modifying the key.

Unnecessarily stronger locks increase lock contentions, 
for example blocking FK constraint checks.

Postgres version: PostgreSQL 12.4 on x86_64-apple-darwin16.7.0, compiled by Apple LLVM version 8.1.0 (clang-802.0.42), 64-bit
Also reproduced with Postgresql 12.2 on a Linux box.

Best regards,
Ivan

Re: Unnecessary FOR UPDATE lock instead of possible FOR NO KEY UPDATE lock in an UPSERT statement

От
Peter Geoghegan
Дата:
On Fri, Oct 2, 2020 at 5:14 PM lights go out <enderstd@gmail.com> wrote:
> So in case 3 we rewrite the key part of the tuple with the same value
> and get the stronger FOR UPDATE lock.
> While in case 1 we do the same but acquire a softer FOR NO KEY UPDATE lock
> which is the correct behavior since we haven't actually changed the key.
>
> This is inconsistent.

It's not inconsistent. The excluded.* pseudo row is not the same thing
as the target row. I believe that you would see behavior consistent
with the plain UPDATE case (i.e. tuple lock strength "No Key Update")
you changed the ON CONFLICT ... DO UPDATE to update the target row
using the target row itself (in the UPDATE's target list). You can use
AS to create an alias for the target table so its row can be updated
using itself. This is what you actually did with your UPDATE example.

Of course, that isn't very helpful -- nobody writes queries like your
UPDATE example for obvious reasons. But the fact remains: there is no
inconsistency between UPDATE and ON CONFLICT ... DO UPDATE in evidence
here.

> I expect UPSERT query in case 3 to grab a FOR NO KEY UPDATE lock
> because SET (a, b) = (excluded.a, excluded.b) is not actually modifying the key.
>
> Unnecessarily stronger locks increase lock contentions,
> for example blocking FK constraint checks.

You're assuming that it's impossible for two equal values to be
distinct from each other. But that is possible, at least in some
cases. For example, with a unique index on a numeric column the value
'5.00' is visibly distinct from '5', but the values are nevertheless
equal. You could therefore have a conflict in which
exluded.numeric_col and target.numeric_col are visibly different, in a
way that the user might care about.

It would be possible in principle to optimize this case. But there are
numerous hard problems to solve to do so (not just the one I
mentioned), so I wouldn't expect that to happen. I would just rewrite
the query.

-- 
Peter Geoghegan