Thread: BUG #1512: Assertion failure (lock.c:1537) with SELECT FOR UPDATE and savepoints

BUG #1512: Assertion failure (lock.c:1537) with SELECT FOR UPDATE and savepoints

From: "Stephen Clouse"
Date:

The following bug has been logged online:

Bug reference:      1512
Logged by:          Stephen Clouse
Email address:      stephenc@theiqgroup.com
PostgreSQL version: 8.0.1
Operating system:   Fedora Core 3
Description:        Assertion failure (lock.c:1537) with SELECT FOR UPDATE and savepoints
Details:

You need two psql sessions going to reproduce this.  Start with this very
simple schema:

CREATE TABLE foo (bar NUMERIC);
INSERT INTO foo VALUES (1);

Now, start session 1:

> BEGIN;
> SELECT * FROM foo WHERE bar = 1 FOR UPDATE;

 bar
-----
   1
(1 row)

Switch to session 2:

> BEGIN;
> SAVEPOINT foo;
> SELECT * FROM foo WHERE bar = 1 FOR UPDATE;
(Abort this with Ctrl-C)
Cancel request sent
ERROR:  canceling query due to user request
> ROLLBACK TO SAVEPOINT foo;

Back to session 1:

> ROLLBACK;

Session 1's backend will now die horribly and trigger a server reset.

Log shows the following as the cause of the server abort:

TRAP: FailedAssertion("!(SHMQueueEmpty(&(lock->procLocks)))", File:
"lock.c", Line: 1537)


I have not achieved guru status with the PostgreSQL code yet, otherwise I'd
send a patch along with this.

Re: BUG #1512: Assertion failure (lock.c:1537) with SELECT FOR UPDATE and savepoints

From: Michael Fuhr
Date:

On Tue, Mar 01, 2005 at 02:04:30AM +0000, Stephen Clouse wrote:

> TRAP: FailedAssertion("!(SHMQueueEmpty(&(lock->procLocks)))", File:
> "lock.c", Line: 1537)

I can duplicate this on FreeBSD 4.11-STABLE with the latest code
from REL8_0_STABLE (--enable-cassert required during build).  Here's
a stack trace:

#0  0x284997ac in kill () from /usr/lib/libc.so.4
#1  0x284db0a6 in abort () from /usr/lib/libc.so.4
#2  0x81ed93b in ExceptionalCondition () at assert.c:51
#3  0x8183191 in LockReleaseAll (lockmethodid=1, allxids=1 '\001') at lock.c:1537
#4  0x8183dbd in ProcReleaseLocks (isCommit=0) at proc.c:439
#5  0x81ffe69 in ResourceOwnerReleaseInternal (owner=0x82fefbc, phase=RESOURCE_RELEASE_LOCKS, isCommit=0 '\000', isTopLevel=1 '\001') at resowner.c:252
#6  0x81ffd12 in ResourceOwnerRelease (owner=0x82fefbc, phase=RESOURCE_RELEASE_LOCKS, isCommit=0 '\000', isTopLevel=1 '\001') at resowner.c:160
#7  0x809bedd in AbortTransaction () at xact.c:1694
#8  0x809c141 in CommitTransactionCommand () at xact.c:1906
#9  0x818ae7e in finish_xact_command () at postgres.c:1843
#10 0x8189da4 in exec_simple_query (query_string=0x836d01c "ROLLBACK;") at postgres.c:965
#11 0x818c2ab in PostgresMain (argc=4, argv=0x82fd274, username=0x82fd24c "mfuhr") at postgres.c:3007
#12 0x8163f41 in BackendRun (port=0x8313600) at postmaster.c:2816
#13 0x8163742 in BackendStartup (port=0x8313600) at postmaster.c:2452
#14 0x8161c9e in ServerLoop () at postmaster.c:1199
#15 0x81616a6 in PostmasterMain (argc=3, argv=0xbfbffc88) at postmaster.c:918
#16 0x8132b15 in main (argc=3, argv=0xbfbffc88) at main.c:268

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/

Re: BUG #1512: Assertion failure (lock.c:1537) with SELECT FOR UPDATE and savepoints

From: Tom Lane
Date:

"Stephen Clouse" <stephenc@theiqgroup.com> writes:
> Description:        Assertion failure (lock.c:1537) with SELECT FOR UPDATE

It looks to me like the problem is that RemoveFromWaitQueue() is too
lazy.  Its comments say

 * NB: this does not remove the process' proclock object, nor the lock object,
 * even though their counts might now have gone to zero.  That will happen
 * during a subsequent LockReleaseAll call, which we expect will happen
 * during transaction cleanup.    (Removal of a proc from its wait queue by
 * this routine can only happen if we are aborting the transaction.)

but of course LockReleaseAll is not called until ROLLBACK.  I think the
scenario is:

* Query cancel in session 2 takes that session out of the wait queue for
session 1's transaction ID lock, but because of the lazy behavior described
above, it leaves a proclock entry with count zero attached to the lock.

* Rollback in session 1 tries to remove the transaction ID lock,
and gets unhappy because there is still a proclock attached to it.
(A commit in session 1 fails the same way.)
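
To make the scenario above concrete, here is a toy model in C. It is only
a sketch under stated assumptions, not the code in lock.c: the types
ToyLock and ToyProcLock and the toy_* functions are hypothetical names
invented for illustration, and the model tracks nothing but a per-session
count, a per-lock count, and the list of entries attached to the lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not PostgreSQL's LOCK/PROCLOCK. */
typedef struct ToyProcLock {
    int count;                    /* requests this session has on the lock */
    struct ToyProcLock *next;     /* next entry attached to the same lock */
} ToyProcLock;

typedef struct ToyLock {
    int nRequested;               /* total requests across all sessions */
    ToyProcLock *procLocks;       /* entries attached to this lock */
} ToyLock;

/* Attach a session's entry and count one request (held or waiting). */
static void toy_request(ToyLock *lock, ToyProcLock *pl)
{
    pl->count = 1;
    pl->next = lock->procLocks;
    lock->procLocks = pl;
    lock->nRequested++;
}

/* Unlink an entry from the lock's list, if it is present. */
static void toy_detach(ToyLock *lock, ToyProcLock *pl)
{
    ToyProcLock **p = &lock->procLocks;

    while (*p != NULL && *p != pl)
        p = &(*p)->next;
    if (*p != NULL)
        *p = pl->next;
}

/* Lazy cancel: drop the counts but leave the zero-count entry attached. */
static void toy_cancel_wait_lazy(ToyLock *lock, ToyProcLock *pl)
{
    assert(pl->count > 0);
    pl->count--;
    lock->nRequested--;
}

/*
 * Release everything a session has on the lock.  Returns false when the
 * lock's count reaches zero while other entries are still attached,
 * which is the state the failed assertion complains about.
 */
static bool toy_release_all(ToyLock *lock, ToyProcLock *pl)
{
    lock->nRequested -= pl->count;
    pl->count = 0;
    toy_detach(lock, pl);
    if (lock->nRequested == 0)
        return lock->procLocks == NULL;
    return true;
}
```

In this model, if sessions 1 and 2 each call toy_request(), session 2 then
calls toy_cancel_wait_lazy(), and session 1 calls toy_release_all(), the
last call returns false: the lock's count has reached zero while session
2's zero-count entry is still attached.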

In reality this code has been broken right along, but until 8.0 there
was only a very narrow window for failure --- session 1 would have to
try to release the lock between RemoveFromWaitQueue and LockReleaseAll
in session 2's transaction abort sequence.

ISTM we have to fix RemoveFromWaitQueue to remove the proclock object
immediately if its count has gone to zero.  It should be impossible
for the lock's count to have gone to zero (that would imply no one
else holds the lock, so we couldn't be waiting on it) so an Assert
is sufficient for that part.
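
As a rough illustration of this proposal, the following toy model in C
removes a session's entry as soon as its count drops to zero. It is not
the actual patch: FixedLock, FixedProcLock, and the fixed_* functions are
hypothetical names, and the model keeps only a per-session count, a
per-lock count, and the list of attached entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical types, not PostgreSQL's LOCK/PROCLOCK. */
typedef struct FixedProcLock {
    int count;                    /* requests this session has on the lock */
    struct FixedProcLock *next;   /* next entry attached to the same lock */
} FixedProcLock;

typedef struct FixedLock {
    int nRequested;               /* total requests across all sessions */
    FixedProcLock *procLocks;     /* entries attached to this lock */
} FixedLock;

/* Attach a session's entry and count one request (held or waiting). */
static void fixed_request(FixedLock *lock, FixedProcLock *pl)
{
    pl->count = 1;
    pl->next = lock->procLocks;
    lock->procLocks = pl;
    lock->nRequested++;
}

/* Unlink an entry from the lock's list, if it is present. */
static void fixed_detach(FixedLock *lock, FixedProcLock *pl)
{
    FixedProcLock **p = &lock->procLocks;

    while (*p != NULL && *p != pl)
        p = &(*p)->next;
    if (*p != NULL)
        *p = pl->next;
}

/* Eager cancel: drop the counts and unlink the entry at once if unused. */
static void fixed_cancel_wait(FixedLock *lock, FixedProcLock *pl)
{
    assert(pl->count > 0);
    pl->count--;
    lock->nRequested--;
    /* Someone else must still hold the lock, or we would not have waited. */
    assert(lock->nRequested > 0);
    if (pl->count == 0)
        fixed_detach(lock, pl);
}

/*
 * Release everything a session has on the lock.  Returns true when the
 * lock is left consistent: either still requested by someone, or with
 * no entries attached so that it could be discarded.
 */
static bool fixed_release_all(FixedLock *lock, FixedProcLock *pl)
{
    lock->nRequested -= pl->count;
    pl->count = 0;
    fixed_detach(lock, pl);
    if (lock->nRequested == 0)
        return lock->procLocks == NULL;
    return true;
}
```

With the eager variant, the same sequence (both sessions request, session
2 cancels its wait, session 1 releases) leaves the list empty, so
fixed_release_all() returns true.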

Comments?

            regards, tom lane

Re: BUG #1512: Assertion failure (lock.c:1537) with SELECT FOR UPDATE and savepoints

From: Tom Lane
Date:

I wrote:
> ISTM we have to fix RemoveFromWaitQueue to remove the proclock object
> immediately if its count has gone to zero.  It should be impossible
> for the lock's count to have gone to zero (that would imply no one
> else holds the lock, so we couldn't be waiting on it) so an Assert
> is sufficient for that part.

I've applied a patch along these lines; it seems to make the problem
go away.

            regards, tom lane