Обсуждение: BUG #4925: "select ... for update" doesn't affect rows from sub-query

Поиск
Список
Период
Сортировка

BUG #4925: "select ... for update" doesn't affect rows from sub-query

От
"Steve Caligo"
Дата:
The following bug has been logged online:

Bug reference:      4925
Logged by:          Steve Caligo
Email address:      steve.caligo@ctie.etat.lu
PostgreSQL version: 8.3.7 and 8.4.0
Operating system:   Archlinux and Gentoo 8.3.7, Gentoo 8.4.0
Description:        "select ... for update" doesn't affect rows from
sub-query
Details:

While trying to guarantee data consistency when doing concurrent processing,
I stumbled upon your cautions mentionned in your documentation (especially
"limit" ... "for update"):
http://www.postgresql.org/docs/8.4/static/sql-select.html

I tried working around this limitation and the statement on the same page
seemed promising to me:
"If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects
all tables used in the view or sub-query."

But unfortunately the latter statement doesn't seem to be true and the
subquery isn't protected by row locks, as the following examples show.


1) INITIAL SITUATION

Create a simple table with some data. No constraints, no indexes, just the
bare minimum:

create table test (
    id integer,
    name varchar(10),
    c integer
);

insert into test values
    (1, 'test1', 0),
    (2, 'test2', 0),
    (3, 'test3', 0),
    (4, 'test4', 0),
    (5, 'test5', 0)
;


2) FIRST TRY, USING "UPDATE WHERE ID = ( SELECT )"
The goal is to have two clients set their unique ID to a single/different
row from the table. First, using "limit" in a slightly different way:

1=> begin transaction;
2=> begin transaction;

1=> update test set c = 1 where id = (
    select id from test where c = 0 order by id limit 1
)
; -- updates row id=1
2=> update test set c = 2 where id = (
    select id from test where c = 0 order by id limit 1
)
; -- forced to wait on lock

1=> commit; -- client #2 continues
1=> select * from test where id = 1;
 id | name  | c
----+-------+---
  1 | test1 | 1
(1 row)

2=> commit;
2=> select * from test where id = 1;
 id | name  | c
----+-------+---
  1 | test1 | 2
(1 row)

1=> select * from test where id = 1;
 id | name  | c
----+-------+---
  1 | test1 | 2
(1 row)


Conclusion: didn't work.
Probably because the subquery is executed before the update and not affected
by row locking.

Expected behaviour: one client update one row to c=1 and the other client
updates a different row to c=2.


3) SECOND TRY, PROTECTING THE UPDATE BY AN ADDITIONAL "SELECT ... FOR
UPDATE" AND AVOIDING THE "LIMIT"

1=> begin transaction;
2=> begin transaction;
1=> select id from test where id = (
    select min(id) from test where c = 0
) for update;
 id
----
  2
(1 row)

2=> select id from test where id = (
    select min(id) from test where c = 0
) for update; -- forced to wait on lock

1=> update test set c = 1 where id = 2;
1=> commit; -- client #2 continues:

2=> -- client #2 outputs:
 id
----
  2
(1 row)

2=> select * from test where id = 2;
 id | name  | c
----+-------+---
  2 | test2 | 1
(1 row)
2=> -- now this isn't what we initially asked for, let's just repeat the
query once more:

2=> select id from test where id = ( select min(id) from test where c = 0 )
for update;
 id
----
  3
(1 row)


Conclusion: didn't work.
The situation one ends up in is one that contradicts your above statement,
but also seemingly violates the "I" in ACID. In 3), client #2 is clearly
affected by the actions of client #1. While serialized transactions or full
table locks would avoid this race condition, it either requires large
changes in the application or impacts performance during contention.

Adding an additional "c = 0" to the main query of 3) of course suppresses
the row from the concurrent update, but it shouldn't have been returned with
a value of "c = 1" because the transaction #2 started prior to the update
statement of #1.

Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query

От
Robert Haas
Дата:
On Thu, Jul 16, 2009 at 12:34 PM, Steve Caligo<steve.caligo@ctie.etat.lu> w=
rote:
>
> The following bug has been logged online:
>
> Bug reference: =A0 =A0 =A04925
> Logged by: =A0 =A0 =A0 =A0 =A0Steve Caligo
> Email address: =A0 =A0 =A0steve.caligo@ctie.etat.lu
> PostgreSQL version: 8.3.7 and 8.4.0
> Operating system: =A0 Archlinux and Gentoo 8.3.7, Gentoo 8.4.0
> Description: =A0 =A0 =A0 =A0"select ... for update" doesn't affect rows f=
rom
> sub-query
> Details:
>
> While trying to guarantee data consistency when doing concurrent processi=
ng,
> I stumbled upon your cautions mentionned in your documentation (especially
> "limit" ... "for update"):
> http://www.postgresql.org/docs/8.4/static/sql-select.html
>
> I tried working around this limitation and the statement on the same page
> seemed promising to me:
> "If FOR UPDATE or FOR SHARE is applied to a view or sub-query, it affects
> all tables used in the view or sub-query."
>
> But unfortunately the latter statement doesn't seem to be true and the
> subquery isn't protected by row locks, as the following examples show.
>
> 1) INITIAL SITUATION
>
> Create a simple table with some data. No constraints, no indexes, just the
> bare minimum:
>
> create table test (
> =A0 =A0 =A0 =A0id integer,
> =A0 =A0 =A0 =A0name varchar(10),
> =A0 =A0 =A0 =A0c integer
> );
>
> insert into test values
> =A0 =A0 =A0 =A0(1, 'test1', 0),
> =A0 =A0 =A0 =A0(2, 'test2', 0),
> =A0 =A0 =A0 =A0(3, 'test3', 0),
> =A0 =A0 =A0 =A0(4, 'test4', 0),
> =A0 =A0 =A0 =A0(5, 'test5', 0)
> ;
>
>
> 2) FIRST TRY, USING "UPDATE WHERE ID =3D ( SELECT )"
> The goal is to have two clients set their unique ID to a single/different
> row from the table. First, using "limit" in a slightly different way:
>
> 1=3D> begin transaction;
> 2=3D> begin transaction;
>
> 1=3D> update test set c =3D 1 where id =3D (
> =A0 =A0 =A0 =A0select id from test where c =3D 0 order by id limit 1
> )
> ; -- updates row id=3D1
> 2=3D> update test set c =3D 2 where id =3D (
> =A0 =A0 =A0 =A0select id from test where c =3D 0 order by id limit 1
> )
> ; -- forced to wait on lock
>
> 1=3D> commit; -- client #2 continues
> 1=3D> select * from test where id =3D 1;
> =A0id | name =A0| c
> ----+-------+---
> =A01 | test1 | 1
> (1 row)
>
> 2=3D> commit;
> 2=3D> select * from test where id =3D 1;
> =A0id | name =A0| c
> ----+-------+---
> =A01 | test1 | 2
> (1 row)
>
> 1=3D> select * from test where id =3D 1;
> =A0id | name =A0| c
> ----+-------+---
> =A01 | test1 | 2
> (1 row)
>
>
> Conclusion: didn't work.
> Probably because the subquery is executed before the update and not affec=
ted
> by row locking.
>
> Expected behaviour: one client update one row to c=3D1 and the other clie=
nt
> updates a different row to c=3D2.

This is pretty clearly NOT the situation described in the
documentation.  There is no FOR UPDATE or FOR SHARE anywhere in this
query.  You could argue that we should treat an UPDATE statement as
applying an implicit FOR UPDATE to any subqueries found therein, but
that has nothing to do with whether the current behavior matches the
documentation; it's a discussion of whether the current behavior is
good or bad.

> 3) SECOND TRY, PROTECTING THE UPDATE BY AN ADDITIONAL "SELECT ... FOR
> UPDATE" AND AVOIDING THE "LIMIT"
>
> 1=3D> begin transaction;
> 2=3D> begin transaction;
> 1=3D> select id from test where id =3D (
> =A0 =A0 =A0 =A0select min(id) from test where c =3D 0
> ) for update;
> =A0id
> ----
> =A02
> (1 row)
>
> 2=3D> select id from test where id =3D (
> =A0 =A0 =A0 =A0select min(id) from test where c =3D 0
> ) for update; -- forced to wait on lock
>
> 1=3D> update test set c =3D 1 where id =3D 2;
> 1=3D> commit; -- client #2 continues:
> 2=3D> -- client #2 outputs:
> =A0id
> ----
> =A02
> (1 row)
>
> 2=3D> select * from test where id =3D 2;
> =A0id | name =A0| c
> ----+-------+---
> =A02 | test2 | 1
> (1 row)
> 2=3D> -- now this isn't what we initially asked for, let's just repeat the
> query once more:
>
> 2=3D> select id from test where id =3D ( select min(id) from test where c=
 =3D 0 )
> for update;
> =A0id
> ----
> =A03
> (1 row)
>
>
> Conclusion: didn't work.
> The situation one ends up in is one that contradicts your above statement,
> but also seemingly violates the "I" in ACID. In 3), client #2 is clearly
> affected by the actions of client #1. While serialized transactions or fu=
ll
> table locks would avoid this race condition, it either requires large
> changes in the application or impacts performance during contention.

This is pretty weird behavior, and I am among those who think it
sucks.  But it is documented.

http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALI=
ZABILITY

An interesting fact is that if you stick another "for update" into the
subquery here, the command will fail utterly, with the following error
message:

ERROR:  SELECT FOR UPDATE/SHARE is not allowed with aggregate functions

> Adding an additional "c =3D 0" to the main query of 3) of course suppress=
es
> the row from the concurrent update, but it shouldn't have been returned w=
ith
> a value of "c =3D 1" because the transaction #2 started prior to the upda=
te
> statement of #1.

...Robert

Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query

От
Robert Haas
Дата:
On Thu, Jul 30, 2009 at 1:40 AM, Steve Caligo<Steve.Caligo@ctie.etat.lu> wr=
ote:
> Allowing FOR UPDATE in sub-queries has been rejected in this same thread,
> you've also mentioned it your previous reply.
> http://archives.postgresql.org/pgsql-bugs/2004-10/msg00150.php

Um, I didn't write that email.  That was Tom Lane.

>> This is pretty weird behavior, and I am among those who think it
>> sucks. =A0But it is documented.
>>
>> http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERI=
ALIZABILITY
>
> This behavior may be consistent with the transaction isolation levels
> PostgreSQL provides (read committed/serializable, while one would need
> true repeatable read in this case), it is a huge drawback when working
> with the database, as one has to think about potential collateral damage
> with every single SELECT...FOR UPDATE / UPDATE query one writes,
> especially the complex ones.
>
>
>> An interesting fact is that if you stick another "for update" into the
>> subquery here, the command will fail utterly, with the following error
>> message:
>>
>> ERROR: =A0SELECT FOR UPDATE/SHARE is not allowed with aggregate functions
>
> See my second link above, it works as designed.
>
> The document states that the FOR UPDATE is applied to the view or
> sub-query, which I assume means: "applied to a SELECT containing a
> sub-query or view". As such, I'm expecting a single statement to be an
> atomic operation, i.e. always works on the same data snapshot, independent
> of the transaction isolation level.
>
> This atomicity has to be enforced through a full table lock (which is
> often the easiest to implement, but also the most expensive
> efficiency-wise) or serialization.
>
> Side note: a cross-database test showed that Oracle, DB2 nor MySQL (with
> InnoDB storage engine) run the query as true repeatable read, whether one
> makes use of the "skip locked rows" (when available) or not. One always
> ends up with two different rows being updated.

Sure.  I mean, I understand your frustration here, but this started
out as a documentation complaint.  I think the current behavior is
documented reasonably OK; the problem is that the behavior is pretty
weird.  I'd be all in favor of fixing it, but I'm not sure what would
be involved in that or what the trade-offs would be.  I suspect if it
were easy it would have been done long ago; you're not the first
person to complain about it.

...Robert

Re: BUG #4925: "select ... for update" doesn't affect rows from sub-query

От
"Steve Caligo"
Дата:
> On Thu, Jul 16, 2009 at 12:34 PM, Steve Caligo<steve.caligo@ctie.etat.lu>
> wrote:

>> 2) FIRST TRY, USING "UPDATE WHERE ID = ( SELECT )"
>> The goal is to have two clients set their unique ID to a
>> single/different
>> row from the table. First, using "limit" in a slightly different way:

> This is pretty clearly NOT the situation described in the
> documentation.  There is no FOR UPDATE or FOR SHARE anywhere in this
> query.  You could argue that we should treat an UPDATE statement as
> applying an implicit FOR UPDATE to any subqueries found therein, but
> that has nothing to do with whether the current behavior matches the
> documentation; it's a discussion of whether the current behavior is
> good or bad.

Should there really be a difference between doing an UPDATE or just
requesting rows to be locked for a later modification (update or deletion,
whatever) through SELECT ... FOR UPDATE?

I'm aware that there's no FOR UPDATE in that query, as its direct use is
discouraged in the documentation (SELECT ... LIMIT ... FOR UPDATE) and
reported as "odd behavior" previously as well:
http://archives.postgresql.org/pgsql-bugs/2004-10/msg00138.php

Yes, I'm considering this behavior bad, as I'd expect the UPDATE statement
to lock either the whole table (if there's no other way) or only the rows
returned by the sub-SELECT, but as far as I can see, such row-level
locking can't be achieved in PostgreSQL but through the use of
serialization.

Allowing FOR UPDATE in sub-queries has been rejected in this same thread,
you've also mentioned it your previous reply.
http://archives.postgresql.org/pgsql-bugs/2004-10/msg00150.php


> This is pretty weird behavior, and I am among those who think it
> sucks.  But it is documented.
>
> http://www.postgresql.org/docs/8.4/static/transaction-iso.html#MVCC-SERIALIZABILITY

This behavior may be consistent with the transaction isolation levels
PostgreSQL provides (read committed/serializable, while one would need
true repeatable read in this case), it is a huge drawback when working
with the database, as one has to think about potential collateral damage
with every single SELECT...FOR UPDATE / UPDATE query one writes,
especially the complex ones.


> An interesting fact is that if you stick another "for update" into the
> subquery here, the command will fail utterly, with the following error
> message:
>
> ERROR:  SELECT FOR UPDATE/SHARE is not allowed with aggregate functions

See my second link above, it works as designed.

The document states that the FOR UPDATE is applied to the view or
sub-query, which I assume means: "applied to a SELECT containing a
sub-query or view". As such, I'm expecting a single statement to be an
atomic operation, i.e. always works on the same data snapshot, independent
of the transaction isolation level.

This atomicity has to be enforced through a full table lock (which is
often the easiest to implement, but also the most expensive
efficiency-wise) or serialization.

Side note: a cross-database test showed that Oracle, DB2 nor MySQL (with
InnoDB storage engine) run the query as true repeatable read, whether one
makes use of the "skip locked rows" (when available) or not. One always
ends up with two different rows being updated.

Best regards,

Steve Caligo