Discussion: Deadlock Problem
Hello,
I am running into more and more deadlock problems. We are
programming a web application with multiple users editing content at
the same time. Multiple times a day PostgreSQL runs into a deadlock,
which can only be resolved by killing some of the clients. Here is an
example from the pg_locks table:
SELECT l.mode, l.granted, l.pid, l.transaction, d.datname, r.relname
FROM pg_locks l, pg_database d, pg_class r WHERE d.oid=l.database AND
r.oid=l.relation;
       mode       | granted |  pid  | transaction |      datname       |         relname
------------------+---------+-------+-------------+--------------------+--------------------------
 AccessShareLock  | t       | 12708 |             | p247_website_1_1_0 | pg_locks
 AccessShareLock  | t       | 12708 |             | p247_website_1_1_0 | pg_class
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | upd_dependency_id
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | preset
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | preset
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | file
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | node_pkey
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | descr_node
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | systemuser_usergroup_rel
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | permgroup_permission_rel
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | usergroup_permgroup_rel
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | account_permission_grant
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | permission
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | account_permgroup_grant
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | environment
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | environment
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | systemuser
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | account
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | account
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | nodetype
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | nodetype
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | upd_templatefile
 RowExclusiveLock | t       | 12726 |             | p247_website_1_1_0 | upd_template
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | upd_template
 RowExclusiveLock | t       | 12714 |             | p247_website_1_1_0 | upd_template
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | upd_template
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | upd_pagevalue
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | descr_upd_page
 RowExclusiveLock | t       | 12714 |             | p247_website_1_1_0 | upd_page
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | upd_page
 RowExclusiveLock | t       | 12726 |             | p247_website_1_1_0 | upd_page
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | upd_page
 RowExclusiveLock | t       | 12714 |             | p247_website_1_1_0 | upd_dependency
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | upd_dependency
 RowExclusiveLock | t       | 12726 |             | p247_website_1_1_0 | node
 AccessShareLock  | t       | 12726 |             | p247_website_1_1_0 | node
 RowExclusiveLock | t       | 12714 |             | p247_website_1_1_0 | node
 RowShareLock     | t       | 12714 |             | p247_website_1_1_0 | node
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | node
 AccessShareLock  | t       | 12714 |             | p247_website_1_1_0 | descr_node_en
(40 rows)
I expected deadlocks to be reported by PostgreSQL with an error message
after the configured timeout, but this one just hangs all the clients.
We are running PostgreSQL 7.3.6. Client software is written in Perl
using the DBI interface.
I would really appreciate any comments on where to search for the problem.
Matthias Schmitt
magic moving pixel s.a. Phone: +352 54 75 75 - 0
Technoport Schlassgoart Fax : +352 54 75 75 - 54
66, rue de Luxembourg URL : http://www.mmp.lu
L-4221 Esch-sur-Alzette
Matthias Schmitt <freak002@mmp.lu> writes:
> I am running into more and more deadlock problems. We are
> programming a web application with multiple users editing content at
> the same time. Multiple times a day PostgreSQL runs into a deadlock,
> which can only be resolved by killing some of the clients. Here is an
> example from the pg_locks table:
All of the rows you showed us have granted=t. No blockage is evident,
let alone any deadlock.
regards, tom lane
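[Editorial note: a backend that is genuinely blocked shows up in pg_locks
with granted = f. A minimal sketch of a query that lists only the waiting
requests (the left join to pg_class is just for readability, since
transaction-level locks have a NULL relation):

  -- show only lock requests that have not yet been granted,
  -- i.e. backends that are actually waiting
  SELECT l.pid, l.mode, r.relname
  FROM pg_locks l LEFT JOIN pg_class r ON r.oid = l.relation
  WHERE NOT l.granted;

If this returns no rows, nothing is waiting on a lock at all.]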
> Matthias Schmitt <freak002@mmp.lu> writes:
>> I am running into more and more deadlock problems. We are
>> programming a web application with multiple users editing content at
>> the same time. Multiple times a day PostgreSQL runs into a deadlock,
>> which can only be resolved by killing some of the clients. Here is an
>> example from the pg_locks table:
>
> All of the rows you showed us have granted=t. No blockage is evident,
> let alone any deadlock.
Hello,
we tried to narrow down the possible sources of the error, so we logged
the last statements sent to the database and were able to reproduce the
problem with psql alone. I did the following in two psql shells:
shell no 1:
CREATE TABLE the_test (
id int4 PRIMARY KEY,
name varchar(32)
);
insert into the_test values (1, 'hello world');
begin;
update the_test set name = 'still alive' where id = 1;
To keep the transaction open I did not issue any commit or rollback
command.
shell no 2:
begin;
update the_test set name = 'still alive' where id = 1;
The second shell now hangs forever. The pg_locks table shows:
select * from pg_locks;
relation | database | transaction | pid | mode | granted
----------+----------+-------------+-------+------------------+---------
1980976 | 1980969 | | 16034 | AccessShareLock | t
16757 | 1 | | 16100 | AccessShareLock | t
| | 762472 | 16036 | ExclusiveLock | t
| | 762473 | 16034 | ExclusiveLock | t
| | 762472 | 16034 | ShareLock | f
1980974 | 1980969 | | 16036 | AccessShareLock | t
1980974 | 1980969 | | 16036 | RowExclusiveLock | t
| | 762478 | 16100 | ExclusiveLock | t
1980974 | 1980969 | | 16034 | AccessShareLock | t
1980974 | 1980969 | | 16034 | RowExclusiveLock | t
(10 rows)
In our applications it is possible that multiple records in different
tables are updated in different orders, depending on the task at hand.
Shouldn't a time-out error resolve those problems?
Thank you.
Matthias Schmitt
magic moving pixel s.a. Phone: +352 54 75 75 - 0
Technoport Schlassgoart Fax : +352 54 75 75 - 54
66, rue de Luxembourg URL : http://www.mmp.lu
L-4221 Esch-sur-Alzette
On Tue, Mar 16, 2004 at 09:50:43AM +0100, Matthias Schmitt wrote:
> begin;
> update the_test set name = 'still alive' where id = 1;
>
> To keep the transaction open I did not issue any commit or rollback
> command.
>
> shell no 2:
>
> begin;
> update the_test set name = 'still alive' where id = 1;
>
> The second shell hangs now forever. The pg_locks table shows:
First, you haven't tried to COMMIT anywhere. Nothing will happen in
that case. For all you know, one of these is going to ROLLBACK and
resolve the lock on its own. The second one you issue is just going
to sit there, sure.
You actually haven't described a deadlock here. This is just a
straightforward wait-and-see lock for transaction 2: that update
statement will indeed wait until it sees what transaction 1 has done.
You'd only get a deadlock if transaction 2 had first done something
that T1 _next_ had to depend on. You really need two conflicting
lock waits for a deadlock to happen.
A
--
Andrew Sullivan | ajs@crankycanuck.ca
The fact that technology doesn't work is no bar to success in the marketplace.
--Philip Greenspun
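[Editorial note: to make the distinction concrete, here is a minimal sketch
of a genuine deadlock built on the same test table. It assumes a second row
has been inserted; unlike the plain lock wait above, PostgreSQL detects the
cycle after deadlock_timeout (1 second by default) and aborts one transaction:

  -- assumes: INSERT INTO the_test VALUES (2, 'second row');

  -- shell no 1:
  BEGIN;
  UPDATE the_test SET name = 'one' WHERE id = 1;   -- locks row 1

  -- shell no 2:
  BEGIN;
  UPDATE the_test SET name = 'two' WHERE id = 2;   -- locks row 2

  -- shell no 1:
  UPDATE the_test SET name = 'one' WHERE id = 2;   -- blocks, waiting for shell 2

  -- shell no 2:
  UPDATE the_test SET name = 'two' WHERE id = 1;   -- closes the cycle; after
                                                   -- deadlock_timeout one of the
                                                   -- two transactions is aborted:
                                                   -- ERROR:  deadlock detected

Only this kind of circular wait triggers the deadlock detector; a single
transaction waiting on another is left to wait indefinitely.]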
Matthias Schmitt <freak002@mmp.lu> writes:
> I did the following in two psql shell environments:
> shell no 1:
> CREATE TABLE the_test (
> id int4 PRIMARY KEY,
> name varchar(32)
> );
> insert into the_test values (1, 'hello world');
> begin;
> update the_test set name = 'still alive' where id = 1;
> To keep the transaction open I did not issue any commit or rollback
> command.
> shell no 2:
> begin;
> update the_test set name = 'still alive' where id = 1;
> The second shell hangs now forever.
Well, of course. It has to wait to see if the previous update of the
row commits or not, so that it knows which version of the row to start
from. (In this trivial case it doesn't really matter, but in more
complex cases, such as when different fields are being updated, it
does.)
This is *not* a deadlock, however, as transaction 1 is free to make
progress. The fact that you've got a client holding an open transaction
and not doing anything is a client-side design error, not a deadlock.
> Shouldn't a time-out error resolve those problems?
Sure, and it works fine:
regression=# set statement_timeout TO 10000;
SET
regression=# begin;
BEGIN
regression=# update the_test set name = 'still alive' where id = 1;
-- about ten seconds elapse, then:
ERROR: canceling query due to user request
regression=#
regards, tom lane
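[Editorial note: for the original problem of transactions updating rows in
different orders, the usual remedy is to make every transaction touch tables
and rows in one agreed-upon order. A minimal sketch against the test table
above (the ascending-id rule is an assumption of this example, not something
PostgreSQL enforces):

  -- assumes: INSERT INTO the_test VALUES (2, 'second row');
  -- every transaction updates rows in ascending id order, so no transaction
  -- can hold a lock another needs while waiting for a lock that other holds
  BEGIN;
  UPDATE the_test SET name = 'pass 1' WHERE id = 1;
  UPDATE the_test SET name = 'pass 1' WHERE id = 2;
  COMMIT;

With a consistent ordering there is no lock cycle to detect, and
statement_timeout is only needed as a safety net against long waits.]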