Обсуждение: BUG #13523: Unexplained deadlocks (possible race condition)
The following bug has been logged on the website: Bug reference: 13523 Logged by: Jack Douglas Email address: jack@douglastechnology.co.uk PostgreSQL version: 9.4.3 Operating system: Debian Jessie Description: Hi I'm getting intermittent (but easily reproducible) deadlocks showing in my error log, for example: 2015-07-28 16:46:19 BST ERROR: deadlock detected 2015-07-28 16:46:19 BST DETAIL: Process 9394 waits for ExclusiveLock on relation 65605 of database 12141; blocked by process 9393. Process 9393 waits for ExclusiveLock on relation 65605 of database 12141; blocked by process 9394. Process 9394: select f_test('blah') Process 9393: select f_test('blah') 2015-07-28 16:46:19 BST HINT: See server log for query details. 2015-07-28 16:46:19 BST CONTEXT: SQL function "f_test" statement 1 2015-07-28 16:46:19 BST STATEMENT: select f_test('blah') The above is from a minimal test case I've attempted to create rather than my production code. The test case is as follows: create table test( id serial primary key, val text ); create function f_test(v text) returns integer language sql security definer set search_path = postgres,pg_temp as $$ lock test in exclusive mode; insert into test(val) select v where not exists(select * from test where val=v); select id from test where val=v; $$; then from two bash sessions simultaneously: for i in {1..1000}; do psql postgres postgres -c "select f_test('blah')"; done This produces a handful of deadlocks on my server, however using a VM on a laptop another postgres user said he got far more. More detail is on the question I posted at DBA.SE (and it's probably easier to read the code there: http://dba.stackexchange.com/q/108290/1396 Kind regards Jack
jack@douglastechnology.co.uk writes: > I'm getting intermittent (but easily reproducible) deadlocks showing in my > error log, for example: > The above is from a minimal test case I've attempted to create rather than > my production code. The test case is as follows: > create table test( id serial primary key, val text ); > create function f_test(v text) returns integer language sql security definer > set search_path = postgres,pg_temp as $$ > lock test in exclusive mode; > insert into test(val) select v where not exists(select * from test where > val=v); > select id from test where val=v; > $$; I believe the issue with this is that a SQL function will do parsing (and maybe planning too; don't feel like checking the code right now) for the entire function body at once. This means that due to the INSERT command you acquire RowExclusiveLock on the "test" table during function body parsing, before the LOCK command actually executes. So the LOCK represents a lock escalation attempt, and deadlocks are to be expected. This coding technique would be safe in plpgsql, but not in a SQL-language function. There have been discussions of reimplementing SQL-language functions so that parsing occurs one statement at a time, but don't hold your breath about something happening in that direction; it doesn't seem to be a high priority concern for anybody. regards, tom lane
> I believe the issue with this is that a SQL function will do parsing (and maybe planning too; don't feel like checking the code right now) for the entire function body at once. This means that due to the INSERT command you acquire RowExclusiveLock on the "test" table during function body parsing, before the LOCK command actually executes. So the LOCK represents a lock escalation attempt, and deadlocks are to be expected. That makes perfect sense, many thanks for the explanation. > This coding technique would be safe in plpgsql, but not in a SQL-language function. That's useful to know - I already worked around the issue with a retry-loop (as I'm basically doing an upsert the 'bad' way with `lock table` - roll on 9.5!) Kind regards Jack
On 2015-07-28 16:28:23 +0000, jack@douglastechnology.co.uk wrote: > 2015-07-28 16:46:19 BST ERROR: deadlock detected > 2015-07-28 16:46:19 BST DETAIL: Process 9394 waits for ExclusiveLock on > relation 65605 of database 12141; blocked by process 9393. > Process 9393 waits for ExclusiveLock on relation 65605 of database > 12141; blocked by process 9394. > Process 9394: select f_test('blah') > Process 9393: select f_test('blah') > 2015-07-28 16:46:19 BST HINT: See server log for query details. > 2015-07-28 16:46:19 BST CONTEXT: SQL function "f_test" statement 1 > 2015-07-28 16:46:19 BST STATEMENT: select f_test('blah') I am wondering if the deadlock reports would be a bit easier to debug if we didn't just say "blocked by process 9394", but also in which lockmode it's held currently. Greetings, Andres Freund
> There have been discussions of reimplementing SQL-language functions so that parsing occurs one statement at a time, but don't hold your breath about something happening in that direction; it doesn't seem to be a high priority concern for anybody. Perhaps it is worth considering a less drastic change if that is not on the cards in the immediate future? If parsing the INSERT aquires the RowExclusiveLock, perhaps parsing the LOCK statement should also aquire the lock? That would mean the following principle in the documentation ("...The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order...", http://www.postgresql.org/docs/9.4/static/explicit-locking.html#LOCKING-DEAD LOCKS) would be possible (or at least more easily understood) when using SQL-language functions.
On 2015-07-30 09:23:10 +0100, Jack Douglas wrote: > If parsing the INSERT aquires the RowExclusiveLock, perhaps parsing the LOCK > statement should also aquire the lock? That would mean the following > principle in the documentation ("...The best defense against deadlocks is > generally to avoid them by being certain that all applications using a > database acquire locks on multiple objects in a consistent order...", > http://www.postgresql.org/docs/9.4/static/explicit-locking.html#LOCKING-DEAD > LOCKS) would be possible (or at least more easily understood) when using > SQL-language functions. I don't think that'd help at all? The problem here is the lock upgrade from RowExclusiveLock to the exclusive lock, and that'll not be fixed by that proposal? Regards, Andres
> I am wondering if the deadlock reports would be a bit easier to debug if we didn't just say "blocked by process 9394", but also in which lockmode it's held currently. That would certainly help, something like this would have been a big clue in this case: ... relation 65605 of database 12141; blocked by RowExclusiveLock held by process 9393. ... is that what you mean?
> I don't think that'd help at all? The problem here is the lock upgrade from RowExclusiveLock to the exclusive lock, and that'll not be fixed by that proposal? The problem is that the RowExclusiveLock is being aquired in one session before the Exclusive lock, even though the LOCK TABLE statement is physically first in the SQL function. Because the locks are being acquired out-of-order, deadlocks become possible as another session tries to escalate the lock and waits, then the first session tries to escalate it's own lock and deadlocks. Normally this is prevented by acquiring the most restrictive lock first, but with a SQL-language function (unlike plpgsql for example) this is not possible. This is how I understand Tom's initial reply, is that not right?