Обсуждение: Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

Поиск
Список
Период
Сортировка

Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

От
Brett Hoerner
Дата:
Hi,

I currently have a simple queue written ontop of Postgres.  Jobs are
inserted and workers periodically check for jobs they can do, do them,
and then delete the rows.  pg_try_advisory_lock is used to (attempt
to) stop two workers from doing the same job.

(I'm working on moving to a "real" messaging queue right now, this is
more a point of curiosity and education now.)

Here is my queue table,

CREATE TABLE queue (
    id serial NOT NULL PRIMARY KEY,
    rcvd timestamp with time zone,
    sent timestamp with time zone,
    host character varying(32),
    job character varying(32),
    arg text
);

Here is an example query,

SELECT q.*
FROM (SELECT id, job, arg
      FROM queue
      WHERE job = 'foo' OR job = 'bar'
      OFFSET 0) AS q
WHERE pg_try_advisory_lock(1, q.id)
LIMIT 10

(For information on OFFSET 0 see:
http://blog.endpoint.com/2009/04/offset-0-ftw.html)

Now if I have two workers running I will periodically see that each
worker gets a row with the same q.id (and thus does the work).  How is
that possible?  The outer query seemingly does a WHERE on an
advisory_lock.

Does anyone have any ideas?  Am I grossly misusing advisory_locks?

Thanks,
Brett

Re: Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

От
Craig Ringer
Дата:
On 21/07/10 07:27, Brett Hoerner wrote:

> Here is an example query,
>
> SELECT q.*
> FROM (SELECT id, job, arg
>       FROM queue
>       WHERE job = 'foo' OR job = 'bar'
>       OFFSET 0) AS q
> WHERE pg_try_advisory_lock(1, q.id)
> LIMIT 10
>
> (For information on OFFSET 0 see:
> http://blog.endpoint.com/2009/04/offset-0-ftw.html)
>
> Now if I have two workers running I will periodically see that each
> worker gets a row with the same q.id (and thus does the work).  How is
> that possible?  The outer query seemingly does a WHERE on an
> advisory_lock.
>
> Does anyone have any ideas?  Am I grossly misusing advisory_locks?

You kick off two queries at once. Both have subqueries that grab a set
of id,job,arg . There's no exclusion at this stage, so they can easily
both land up with some or all of the same results.

THEN you filter the result. The filter will drop the result list to
empty if it can't acquire the lock. Under what circumstances can it not
acquire the lock? If another transaction holds it.

The first transaction might have grabbe the data, acquired the lock,
done its processing, and committed/rolled back to *release* the lock
before the second transaction gets around to checking the lock. In this
case, the second transaction will happily acquire the lock.

Classic race condition.

You should probably use one of the existing queuing mechanisms rather
than rolling your own, because building a high-performance, reliable
queueing mechanism is surprisingly hard to build. A search of the
archives here will turn up several options. I've noticed that PGQ from
Skytools gets mentioned a lot.

--
Craig Ringer

Re: Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

От
Merlin Moncure
Дата:
On Tue, Jul 20, 2010 at 10:52 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
> On 21/07/10 07:27, Brett Hoerner wrote:
>
>> Here is an example query,
>>
>> SELECT q.*
>> FROM (SELECT id, job, arg
>>       FROM queue
>>       WHERE job = 'foo' OR job = 'bar'
>>       OFFSET 0) AS q
>> WHERE pg_try_advisory_lock(1, q.id)
>> LIMIT 10
>>
>> (For information on OFFSET 0 see:
>> http://blog.endpoint.com/2009/04/offset-0-ftw.html)
>>
>> Now if I have two workers running I will periodically see that each
>> worker gets a row with the same q.id (and thus does the work).  How is
>> that possible?  The outer query seemingly does a WHERE on an
>> advisory_lock.
>>
>> Does anyone have any ideas?  Am I grossly misusing advisory_locks?
>
> You kick off two queries at once. Both have subqueries that grab a set
> of id,job,arg . There's no exclusion at this stage, so they can easily
> both land up with some or all of the same results.
>
> THEN you filter the result. The filter will drop the result list to
> empty if it can't acquire the lock. Under what circumstances can it not
> acquire the lock? If another transaction holds it.
>
> The first transaction might have grabbe the data, acquired the lock,
> done its processing, and committed/rolled back to *release* the lock
> before the second transaction gets around to checking the lock. In this
> case, the second transaction will happily acquire the lock.
>
> Classic race condition.
>
> You should probably use one of the existing queuing mechanisms rather
> than rolling your own, because building a high-performance, reliable
> queueing mechanism is surprisingly hard to build. A search of the
> archives here will turn up several options. I've noticed that PGQ from
> Skytools gets mentioned a lot.

I generally agree with your statements, but there is one correction to
make: advisory locks are not released at end of transaction.  You
don't have to worry about the race as long as you make sure the record
'get' and 'lock' are done in the same operation.  This is indeed quite
tricky to get right, but I'm interested on academic grounds; I want to
know if the issue is the lock itself or the post lock handling.  OP:
can we also see how the lock is released?

merlin

Re: Problem with pg_try_advisory_lock and two connections (seemingly) getting the same lock

От
Craig Ringer
Дата:
On 24/07/10 00:00, Merlin Moncure wrote:

> I generally agree with your statements, but there is one correction to
> make: advisory locks are not released at end of transaction.

Argh. Good point. Every other kind of lock is, but advisory locks are
only released when the connection is closed or the lock is explicitly
released.

Thanks for the correction.

--
Craig Ringer