Обсуждение: deadlock debug methodology question

Поиск
Список
Период
Сортировка

deadlock debug methodology question

От
"antiochus antiochus"
Дата:
Hi All,
 
I have a deadlock situation, two transactions waiting on each other to complete. Based on the details below, would anyone have recommendations for me, please?
 
Regards,
 
A.
 
 
I am using:
rpm -qa|grep postgres
compat-postgresql-libs-4-2PGDG.rhel5_x86_64
postgresql-server-8.2.6-1PGDG.rhel5
postgresql-8.2.6-1PGDG.rhel5
postgresql-devel-8.2.6-1PGDG.rhel5
postgresql-libs-8.2.6-1PGDG.rhel5
 
I set 'deadlock_timeout = 1h' in order to have time to inspect pg_locks.

The locks are:
 
db0=# select * from pg_locks where not granted;
   locktype    | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid  |   mode    | granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+-----------+---------
 transactionid |          |          |      |       |          1407 |         |       |          |        1404 | 8303 | ShareLock | f
 transactionid |          |          |      |       |          1404 |         |       |          |        1407 | 8277 | ShareLock | f
(2 rows)
 
Each transaction seems to be waiting on a row-level lock the other has acquired. The tuples are:
 
db0=# select * from pg_locks where locktype='tuple';
 locktype | database | relation | page | tuple | transactionid | classid | objid | objsubid | transaction | pid  |     mode      | granted
----------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+---------------+---------
 tuple    |    16384 |    16576 |   38 |     6 |               |         |       |          |        1407 | 8277 | ExclusiveLock | t
 tuple    |    16384 |    16576 |   38 |     5 |               |         |       |          |        1404 | 8303 | ShareLock     | t
(2 rows)

The corresponding rows are:
 
db0=# select id from tt where ctid = '(38,6)';
 id 
-----
 600
(1 row)
db0=# select id from tt where ctid = '(38,5)';
 id 
-----
 611
(1 row)

Note that the id column is defined as 'id serial primary key'.
 
The two queries in effect in each transaction are found using:
 
select current_query from pg_stat_activity where procpid = 8303;
select current_query from pg_stat_activity where procpid = 8277;
 
Careful inspection of these (unfortunately complex) queries seems to indicate row-level locks are acquired in consistent order, assuming that any command of the type
 
update tt where ....
 
will always lock rows in a consistent order (can someone confirm that it is necessarily the case).

Therefore, it is not clear to me how this deadlock situation arises.

Does anyone have a recommendation?
 

Re: deadlock debug methodology question

От
Bill Moran
Дата:
In response to "antiochus antiochus" <antiochus.usa@gmail.com>:
>
> I have a deadlock situation, two transactions waiting on each other to
> complete. Based on the details below, would anyone have recommendations for
> me, please?

I have a theory on deadlocks, and that theory is that it's damn near
impossible to track them all down, so your best bet is to wrap all
SQL calls in a function that detects deadlock and sleep/retries.

[snip]

> Careful inspection of these (unfortunately complex) queries seems to
> indicate row-level locks are acquired in consistent order, assuming that any
> command of the type
>
> update tt where ....
>
> will always lock rows in a consistent order (can someone confirm that it is
> necessarily the case).

I believe that assertion is incorrect.  Without seeing your entire
query, I can only speculate, but unless you have an explicit ordering
clause, there's no guarantee what order rows will be accessed in.

Try putting an explicit ORDER BY in the queries and see if the problem
goes away.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: deadlock debug methodology question

От
"antiochus antiochus"
Дата:
Thanks for your responses, I appreciate the help.
 
I gather from this that 2 transactions concurrently running the exact same:
 
    update table tt where ...

could end up in deadlock because it is not garanteed row-level locks are taken in a consistent order in an update.
 
One possibility might then seem to do something like:
 
    update table tt where ID in (select ID from tt where ... order by ID asc for update);
 
but unfortunately 'for update' is not allows in subqueries. Therefore, one could do:
 
    select ID from tt where ... order by ID asc for update;
    update table tt where ...;
 
However, in read committed mode, it is not garanteed that the subset of rows selected with the two 'where' tests will be the same...
 
Therefore, it seems impossible to solve this issue without using some form of explicit locking.
 
Does this make sense to you?
 
 
 

 
On Thu, May 22, 2008 at 2:57 PM, Bill Moran <wmoran@collaborativefusion.com> wrote:
In response to "antiochus antiochus" <antiochus.usa@gmail.com>:
>
> I have a deadlock situation, two transactions waiting on each other to
> complete. Based on the details below, would anyone have recommendations for
> me, please?

I have a theory on deadlocks, and that theory is that it's damn near
impossible to track them all down, so your best bet is to wrap all
SQL calls in a function that detects deadlock and sleep/retries.

[snip]

> Careful inspection of these (unfortunately complex) queries seems to
> indicate row-level locks are acquired in consistent order, assuming that any
> command of the type
>
> update tt where ....
>
> will always lock rows in a consistent order (can someone confirm that it is
> necessarily the case).

I believe that assertion is incorrect.  Without seeing your entire
query, I can only speculate, but unless you have an explicit ordering
clause, there's no guarantee what order rows will be accessed in.

Try putting an explicit ORDER BY in the queries and see if the problem
goes away.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: deadlock debug methodology question

От
Bill Moran
Дата:
Please don't top-post.  I've attempted to reconstruct the conversation
flow.

In response to "antiochus antiochus" <antiochus.usa@gmail.com>:
>
> On Thu, May 22, 2008 at 2:57 PM, Bill Moran <wmoran@collaborativefusion.com>
> wrote:
>
> > In response to "antiochus antiochus" <antiochus.usa@gmail.com>:
> > >
> > > I have a deadlock situation, two transactions waiting on each other to
> > > complete. Based on the details below, would anyone have recommendations
> > for
> > > me, please?
> >
> > I have a theory on deadlocks, and that theory is that it's damn near
> > impossible to track them all down, so your best bet is to wrap all
> > SQL calls in a function that detects deadlock and sleep/retries.
>
> One possibility might then seem to do something like:
>
>     update table tt where ID in (select ID from tt where ... order by ID asc
> for update);
>
> but unfortunately 'for update' is not allows in subqueries. Therefore, one
> could do:
>
>     select ID from tt where ... order by ID asc for update;
>     update table tt where ...;
>
> However, in read committed mode, it is not guaranteed that the subset of rows
> selected with the two 'where' tests will be the same...

I can see two solutions:

BEGIN;
SET TRANSACTION SERIALIZABLE
select ID from tt where ... order by ID asc for update;
update table tt where ...;
COMMIT;

or

BEGIN;
LOCK TABLE tt IN SHARE MODE;
select ID from tt where ... order by ID asc for update;
update table tt where ...;
COMMIT;

Depending on exactly what you need to accomplish.

> > [snip]
> >
> > > Careful inspection of these (unfortunately complex) queries seems to
> > > indicate row-level locks are acquired in consistent order, assuming that
> > any
> > > command of the type
> > >
> > > update tt where ....
> > >
> > > will always lock rows in a consistent order (can someone confirm that it
> > is
> > > necessarily the case).
> >
> > I believe that assertion is incorrect.  Without seeing your entire
> > query, I can only speculate, but unless you have an explicit ordering
> > clause, there's no guarantee what order rows will be accessed in.
> >
> > Try putting an explicit ORDER BY in the queries and see if the problem
> > goes away.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: deadlock debug methodology question

От
"antiochus antiochus"
Дата:


On Thu, May 22, 2008 at 4:20 PM, Bill Moran <wmoran@collaborativefusion.com> wrote:

Please don't top-post.  I've attempted to reconstruct the conversation
flow.

In response to "antiochus antiochus" <antiochus.usa@gmail.com>:
>
> On Thu, May 22, 2008 at 2:57 PM, Bill Moran <wmoran@collaborativefusion.com>
> wrote:
>
> > In response to "antiochus antiochus" <antiochus.usa@gmail.com>:
> > >
> > > I have a deadlock situation, two transactions waiting on each other to
> > > complete. Based on the details below, would anyone have recommendations
> > for
> > > me, please?
> >
> > I have a theory on deadlocks, and that theory is that it's damn near
> > impossible to track them all down, so your best bet is to wrap all
> > SQL calls in a function that detects deadlock and sleep/retries.
>
> One possibility might then seem to do something like:
>
>     update table tt where ID in (select ID from tt where ... order by ID asc
> for update);
>
> but unfortunately 'for update' is not allows in subqueries. Therefore, one
> could do:
>
>     select ID from tt where ... order by ID asc for update;
>     update table tt where ...;
>
> However, in read committed mode, it is not guaranteed that the subset of rows
> selected with the two 'where' tests will be the same...

I can see two solutions:

BEGIN;
SET TRANSACTION SERIALIZABLE
select ID from tt where ... order by ID asc for update;
update table tt where ...;
COMMIT;

or

BEGIN;
LOCK TABLE tt IN SHARE MODE;
select ID from tt where ... order by ID asc for update;
update table tt where ...;
COMMIT;

Depending on exactly what you need to accomplish.

> > [snip]
> >
> > > Careful inspection of these (unfortunately complex) queries seems to
> > > indicate row-level locks are acquired in consistent order, assuming that
> > any
> > > command of the type
> > >
> > > update tt where ....
> > >
> > > will always lock rows in a consistent order (can someone confirm that it
> > is
> > > necessarily the case).
> >
> > I believe that assertion is incorrect.  Without seeing your entire
> > query, I can only speculate, but unless you have an explicit ordering
> > clause, there's no guarantee what order rows will be accessed in.
> >
> > Try putting an explicit ORDER BY in the queries and see if the problem
> > goes away.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

 
 
Thanks, Bill.
 
For the past fews days, I've been making the application work using your second suggestion (what I meant by "some form of explicit locking"). It does works.
 
However, it is the first explicit lock used in this growing application and I was trying to not have any. The rows of this table are, from a business logic perspective, partitioned into distinct sets (think one set per customer) and a transaction would never involve rows across these sets. So in the original design, concurrency is great across customer requests. By using the table-level lock, writing transactions targeting these non-overlapping per-customer sets end up waiting on each other...
 
If that is the best we can do, that is it. However, I do not understand why acquiring row-level locks in consistent order does not seem to be enough, or if that is so, why my methodology to enforce this ordering is flawed. Note that I also use foreign keys and I am aware of the fact that constraints are verified (and lock acquired) after row insertion and this is taken into account as well. I could have a "per-customer" lock, to improve concurrency across customers while avoiding deadlocks, or use seralizable transactions, but I was wondering if a methodology to acomplish what I was originally trying to do has been documented anywhere.
 
Thanks a lot for your suggestions.
 
Regards,
 
A.
 

Re: deadlock debug methodology question

От
Bill Moran
Дата:
In response to "antiochus antiochus" <antiochus.usa@gmail.com>:

> On Thu, May 22, 2008 at 4:20 PM, Bill Moran <wmoran@collaborativefusion.com>
> wrote:
> >
> > In response to "antiochus antiochus" <antiochus.usa@gmail.com>:
> > >
> > > On Thu, May 22, 2008 at 2:57 PM, Bill Moran <
> > wmoran@collaborativefusion.com>
> > > wrote:
> > >
> > > > In response to "antiochus antiochus" <antiochus.usa@gmail.com>:
> > > > >
> > > > > I have a deadlock situation, two transactions waiting on each other
> > to
> > > > > complete. Based on the details below, would anyone have
> > recommendations
> > > > for
> > > > > me, please?
> > > >
> > > > I have a theory on deadlocks, and that theory is that it's damn near
> > > > impossible to track them all down, so your best bet is to wrap all
> > > > SQL calls in a function that detects deadlock and sleep/retries.

[snip]

> For the past fews days, I've been making the application work using your
> second suggestion (what I meant by "some form of explicit locking"). It does
> works.
>
> However, it is the first explicit lock used in this growing application and
> I was trying to not have any. The rows of this table are, from a business
> logic perspective, partitioned into distinct sets (think one set per
> customer) and a transaction would never involve rows across these sets. So
> in the original design, concurrency is great across customer requests.
> By using the table-level lock, writing transactions targeting these
> non-overlapping per-customer sets end up waiting on each other...
>
> If that is the best we can do, that is it. However, I do not understand why
> acquiring row-level locks in consistent order does not seem to be enough, or
> if that is so, why my methodology to enforce this ordering is flawed. Note
> that I also use foreign keys and I am aware of the fact that constraints are
> verified (and lock acquired) after row insertion and this is taken into
> account as well. I could have a "per-customer" lock, to improve concurrency
> across customers while avoiding deadlocks, or use seralizable transactions,
> but I was wondering if a methodology to accomplish what I was originally
> trying to do has been documented anywhere.

I seem to remember us determining that you were _not_ acquiring row-level
locks in a consistent order.  Did I miss something?  AFAIK, if you can do
so in a consistent order, you will never hit deadlocks.

Also, I stand by my original statement that deadlocks are nearly impossible
to 100% avoid and they're easy to recover from, so it makes sense to
program your application to detect them and retry appropriately.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023