Обсуждение: SELECT FOR UPDATE and LIMIT 1 behave oddly
Guys,
Summary:  SELECT FOR UPDATE and LIMIT behave oddly when combined
Affects: 7.4.3 (not tested yet on other versions)
Severity:  Annoyance
Description:
If you attempt to lock a row "off the top" of a table by using SELECT ... FOR
UPDATE LIMIT 1, any blocked transaction will have no rows returned when the
lock ends.   This is counter-intuitive and wierd.   It is easily worked
around, though, since the LIMIT 1 is really superfluous; possibly we don't
want to fix it, just put a warning in the docs.
Test Case:
primer=# create table some_que ( sequence int, done boolean );
CREATE TABLE
primer=# insert into some_que values ( 1, false );
primer=# insert into some_que values ( 2, false );
primer=# insert into some_que values ( 3, false );
primer=# insert into some_que values ( 4, false );
primer=# insert into some_que values ( 5, false );
primer=# insert into some_que values ( 6, false );
TRANSACTION A:
primer=# begin;
BEGIN
primer=# select * from some_que where done = false order by sequence limit 1
for update;
 sequence | done
----------+------
        1 | f
TRANSACTION B:
primer=# begin;
BEGIN
primer=# select * from some_que where done = false order by sequence limit 1
for update;
TRANSACTION A:
primer=# update some_que set done = true where sequence = 1;
UPDATE 1
primer=# commit;
COMMIT
TRANSACTION B:
 sequence | done
----------+------
(0 rows)
... as you can see, it falsely reports no rows.
			
		Josh Berkus <josh@agliodbs.com> writes:
> Summary:  SELECT FOR UPDATE and LIMIT behave oddly when combined
The FOR UPDATE part executes after the LIMIT part.  Arguably this is a
bad thing, but I'm concerned about the compatibility issues if we change
it.
            regards, tom lane
			
		Tom, > The FOR UPDATE part executes after the LIMIT part. Arguably this is a > bad thing, but I'm concerned about the compatibility issues if we change > it. In that case, maybe I should do a doc patch warning people not to combine them? Hmmm .... come to think of it, is there any easy way to query "give me the first row which is not locked"? If I tie pg_locks to a query, will I get wierd effects? Just musing .... -- Josh Berkus Aglio Database Solutions San Francisco
On Thu, 2004-10-14 at 14:02, Tom Lane wrote: > The FOR UPDATE part executes after the LIMIT part. Arguably this is a > bad thing, but I'm concerned about the compatibility issues if we change > it. I agree backward compat is a concern, but it seems pretty clear to me that this is not the optimal behavior. If there are any people who actually need the old behavior, they can nest the FOR UPDATE in a FROM-clause subselect: SELECT * FROM foo FOR UPDATE LIMIT 5; -- used to lock the whole table SELECT * FROM (SELECT * FROM foo FOR UPDATE) x LIMIT 5; -- will always do so Would it be sufficient to put a large warning in the "incompatibilities" section of the release notes? -Neil
Neil Conway <neilc@samurai.com> writes:
> I agree backward compat is a concern, but it seems pretty clear to me
> that this is not the optimal behavior. If there are any people who
> actually need the old behavior, they can nest the FOR UPDATE in a
> FROM-clause subselect:
> SELECT * FROM foo FOR UPDATE LIMIT 5; -- used to lock the whole table
> SELECT * FROM (SELECT * FROM foo FOR UPDATE) x LIMIT 5; -- will always
> do so
Allowing FOR UPDATE in sub-selects opens a can of worms that I do not
think we'll be able to re-can (at least not without the proverbial
larger size of can).  The fundamental question about the above construct
is: exactly which rows did it lock?  And what's your proof that that set
is what it *should* have locked?  What if some of the locked rows didn't
get returned to the client?  Even if LIMIT happens to work in a
convenient way, allowing FOR UPDATE inside a subselect would expose us
to a lot of other cases (joins and aggregates for instance) that I don't
believe we can guarantee pleasant behavior for.
My recollection is that the original FOR UPDATE and LIMIT behaviors were
both implemented at the top level in execMain.c, and at that time LIMIT
effectively executed after FOR UPDATE.  We later pushed LIMIT down to
become a plan node, which was a good idea in every respect except that
it changed the order of application of these two behaviors.  I'm afraid
of the semantic consequences of pushing down FOR UPDATE into a plan node
however.  Maybe it can be made to work but I think a lot of very careful
thought will need to go into it.
            regards, tom lane
			
		On Fri, 2004-10-15 at 14:22, Tom Lane wrote:
> Allowing FOR UPDATE in sub-selects opens a can of worms that I do not
> think we'll be able to re-can (at least not without the proverbial
> larger size of can).
Ah, I see. I had tried some trivial queries to determine if we supported
FOR UPDATE in subqueries, such as:
select * from def, abc, (select * from abc for update) x;
But of course a more careful examination shows that we don't (I'd guess
the planner is transforming the above subquery into a join). I think it
would make sense to reject the above query for the sake of consistency.
It seems that would be easy to do by rejecting FOR UPDATE of subqueries
in the analysis phase, rather than going to the trouble of explicitly
allowing them (see analyze.c circa line 2753) and then rejecting them in
the planner.
BTW, FOR UPDATE's interaction with LIMIT is not undocumented -- we
actually document the opposite of what we implement. From the SELECT ref
page:
        FOR UPDATE may appear before LIMIT for compatibility with
        PostgreSQL versions before 7.3. It effectively executes after
        LIMIT, however, and so that is the recommended place to write
        it.
> The fundamental question about the above construct is: exactly which
> rows did it lock?
I'm not sure I understand. The rows the query locks should be the result
set of the subquery. Also, I think it only makes sense to allow FOR
UPDATE in FROM-clause subselects, and it would also be reasonable to
disable some subquery optimizations (e.g. subquery pullup) when FOR
UPDATE is specified.
> What if some of the locked rows didn't get returned to the client?
In the case of SELECT ... FOR UPDATE LIMIT x, exactly the same condition
applies: some number of locked rows will not be returned to the client.
-Neil
			
		Neil Conway <neilc@samurai.com> writes:
> On Fri, 2004-10-15 at 14:22, Tom Lane wrote:
>> What if some of the locked rows didn't get returned to the client?
> In the case of SELECT ... FOR UPDATE LIMIT x, exactly the same condition
> applies: some number of locked rows will not be returned to the client.
Au contraire: every row that gets locked will be returned to the client.
The gripe at hand is that the number of such rows may be smaller than
the client wished, because the LIMIT step is applied before we do the
FOR UPDATE step (which has not only the effect of locking selected rows,
but of disregarding rows that were deleted since our query snapshot).
            regards, tom lane
			
		On Fri, 2004-10-15 at 15:30, Tom Lane wrote: > Au contraire: every row that gets locked will be returned to the client. > The gripe at hand is that the number of such rows may be smaller than > the client wished, because the LIMIT step is applied before we do the > FOR UPDATE step Ah, my apologies -- I misunderstood. Clearly not enough coffee this morning :-) Sorry for the noise. -Neil
Tom, Neil, > > Au contraire: every row that gets locked will be returned to the client. > > The gripe at hand is that the number of such rows may be smaller than > > the client wished, because the LIMIT step is applied before we do the > > FOR UPDATE step As I said, I think this can be taken care of with a doc patch. The truth is that FOR UPDATE LIMIT is not really terribly useful (it will still block outer queries to that table with the same LIMIT clause, so why not lock the whole table?). I propose that I add this sentence to the Docs: -------------- Please not that, since LIMIT is applied before FOR UPDATE, rows which disappear from the target set while waiting for a lock may result in less than LIMIT # of rows being returned. This can result in unintuitive behavior, so FOR UPDATE and LIMIT should only be combined after significant testing. --------------- Here's a question, though, for my education: It's possible to query "Please lock the first row which is not already locked" by including pg_locks, pg_class and xmax in the query set. Tom warned that this could result in a race condition. If the query-and-lock were a single statement, how would a race condition result? How could I test for it? -- Josh Berkus Aglio Database Solutions San Francisco
On Fri, 2004-10-15 at 17:09, Josh Berkus wrote:
>   I propose that I add this sentence to the Docs:
>
> --------------
> Please not that, since LIMIT is applied before FOR UPDATE, rows which
         ^^^
I assume this should be "note".  It took me a little time to parse your
plaintive appeal correctly. :-)
> disappear from the target set while waiting for a lock may result in less
> than LIMIT # of rows being returned.   This can result in unintuitive
> behavior, so FOR UPDATE and LIMIT should only be combined after significant
> testing.
> ---------------
--
Oliver Elphick                                          olly@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA  92C8 39E7 280E 3631 3F0E  1EC0 5664 7A2F A543 10EA
                 ========================================
     "But be ye doers of the word, and not hearers only,
      deceiving your own selves."              James 1:22
			
		Bruce, Ah, yes, which reminds me I need to generate that doc patch. > I am wondering if a documentation warning about the use of FOR UPDATE > and LIMIT is a good idea. If we can't be sure the LIMIT will return a > guaranteed number of rows, should we just disallow that combination? I > realize such a case is rare. Should we emit a warning when it happens? Well, limit+for update can be useful under some circumstances, as long as you understand its limitations. We found a workaround. So I'd oppose disallowing it. -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus wrote: > Tom, Neil, > > > > Au contraire: every row that gets locked will be returned to the client. > > > The gripe at hand is that the number of such rows may be smaller than > > > the client wished, because the LIMIT step is applied before we do the > > > FOR UPDATE step > > As I said, I think this can be taken care of with a doc patch. The truth is > that FOR UPDATE LIMIT is not really terribly useful (it will still block > outer queries to that table with the same LIMIT clause, so why not lock the > whole table?). I propose that I add this sentence to the Docs: > > -------------- > Please not that, since LIMIT is applied before FOR UPDATE, rows which > disappear from the target set while waiting for a lock may result in less > than LIMIT # of rows being returned. This can result in unintuitive > behavior, so FOR UPDATE and LIMIT should only be combined after significant > testing. > --------------- > > Here's a question, though, for my education: It's possible to query "Please > lock the first row which is not already locked" by including pg_locks, > pg_class and xmax in the query set. Tom warned that this could result in a > race condition. If the query-and-lock were a single statement, how would a > race condition result? How could I test for it? I am wondering if a documentation warning about the use of FOR UPDATE and LIMIT is a good idea. If we can't be sure the LIMIT will return a guaranteed number of rows, should we just disallow that combination? I realize such a case is rare. Should we emit a warning when it happens? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
Andrea, > i'm sorry for the curiosity.... but > could you share, if it's possible, this workaround? ;) > (if it's not the one you describe at the beginning thread > e.g. don't use LIMIT 1) Well, we actually roped in the pg_locks view to do a "SELECT the first row not already locked for update". Then added some code on the client end for error handling, like race conditions and no rows being returned, both of which happen in production. -- --Josh Josh Berkus Aglio Database Solutions San Francisco
hi! Josh Berkus wrote: > Bruce, > [snip] > > > Well, limit+for update can be useful under some circumstances, as long as you > understand its limitations. We found a workaround. i'm sorry for the curiosity.... but could you share, if it's possible, this workaround? ;) andrea
tnks for the hint ;) I'll try something similar here. Andrea Josh Berkus wrote: > Andrea, > > >>i'm sorry for the curiosity.... but >>could you share, if it's possible, this workaround? ;) >>(if it's not the one you describe at the beginning thread >> e.g. don't use LIMIT 1) > > > Well, we actually roped in the pg_locks view to do a "SELECT the first row not > already locked for update". Then added some code on the client end for > error handling, like race conditions and no rows being returned, both of > which happen in production. >
I have documented the possible problem with LIMIT and FOR UPDATE. I also remove the mention that FOR UPDATE can appear before LIMIT for pre-7.3 compatibility. Patch applied to CVS HEAD only. --------------------------------------------------------------------------- Josh Berkus wrote: > Tom, Neil, > > > > Au contraire: every row that gets locked will be returned to the client. > > > The gripe at hand is that the number of such rows may be smaller than > > > the client wished, because the LIMIT step is applied before we do the > > > FOR UPDATE step > > As I said, I think this can be taken care of with a doc patch. The truth is > that FOR UPDATE LIMIT is not really terribly useful (it will still block > outer queries to that table with the same LIMIT clause, so why not lock the > whole table?). I propose that I add this sentence to the Docs: > > -------------- > Please not that, since LIMIT is applied before FOR UPDATE, rows which > disappear from the target set while waiting for a lock may result in less > than LIMIT # of rows being returned. This can result in unintuitive > behavior, so FOR UPDATE and LIMIT should only be combined after significant > testing. > --------------- > > Here's a question, though, for my education: It's possible to query "Please > lock the first row which is not already locked" by including pg_locks, > pg_class and xmax in the query set. Tom warned that this could result in a > race condition. If the query-and-lock were a single statement, how would a > race condition result? How could I test for it? > > -- > Josh Berkus > Aglio Database Solutions > San Francisco > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 Index: doc/src/sgml/ref/select.sgml =================================================================== RCS file: /cvsroot/pgsql/doc/src/sgml/ref/select.sgml,v retrieving revision 1.83 diff -c -c -r1.83 select.sgml *** doc/src/sgml/ref/select.sgml 8 Apr 2005 00:59:58 -0000 1.83 --- doc/src/sgml/ref/select.sgml 22 Apr 2005 04:15:06 -0000 *************** *** 830,840 **** </para> <para> ! <literal>FOR UPDATE</literal> may appear before ! <literal>LIMIT</literal> for compatibility with ! <productname>PostgreSQL</productname> versions before 7.3. It ! effectively executes after <literal>LIMIT</literal>, however, and ! so that is the recommended place to write it. </para> </refsect2> </refsect1> --- 830,842 ---- </para> <para> ! It is possible for a <command>SELECT</> command using both ! <literal>LIMIT</literal> and <literal>FOR UPDATE</literal> ! clauses to return fewer rows than specified by <literal>LIMIT</literal>. ! This is because <literal>LIMIT</> selects a number of rows, ! but might then block requesting a <literal>FOR UPDATE</literal> lock. ! Once the <literal>SELECT</> unblocks, the query qualifiation might not ! be met and the row not be returned by <literal>SELECT</>. </para> </refsect2> </refsect1>