Обсуждение: ALTER TABLE deadlock with concurrent INSERT
I'm working with a client on an application upgrade script which executes a function to conditionally do an: ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz If this is run while the application is concurrently doing inserts into foo, we are occasionally seeing deadlocks. Aside from the fact that they are better off not altering the table amid concurrent inserts, I'm trying to understand why this is even able to happen. I expect one to block the other, not a deadlock. This is 8.4.1 (I know, I know, I have advised strongly that they upgrade to 8.4.latest). We have not been able to repeat this forcibly. Here is what the log shows: ------------------------------ 2011-02-25 14:38:07 PST [31686]: [1-1] ERROR: deadlock detected 2011-02-25 14:38:07 PST [31686]: [2-1] DETAIL: Process 31686 waits for AccessExclusiveLock on relation 16896 of database 16386; blocked by process 31634. Process 31634 waits for RowExclusiveLock on relation 16902 of database 16386; blocked by process 31686. Process 31686: SELECT change_column_type('attribute_summary', 'sequence_number', 'numeric'); Process 31634: insert into attribute_summary (attribute_value, sequence_number, attribute_id) values ($1, $2, $3) 2011-02-25 14:38:07 PST [31686]: [3-1] HINT: See server log for query details. 2011-02-25 14:38:07 PST [31686]: [4-1] CONTEXT: SQL statement "ALTER TABLE attribute_summary ALTER COLUMN sequence_number SET DATA TYPE numeric" PL/pgSQL function "change_column_type"line 18 at EXECUTE statement 2011-02-25 14:38:07 PST [31686]: [5-1] STATEMENT: SELECT change_column_type('attribute_summary', 'sequence_number', 'numeric'); ------------------------------ Reviewing the release notes, I see some marginally related commits, but nothing that jumps out to me as a specific fix. Thoughts? Thanks, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
Joe Conway <mail@joeconway.com> writes: > I'm working with a client on an application upgrade script which > executes a function to conditionally do an: > ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz > If this is run while the application is concurrently doing inserts into > foo, we are occasionally seeing deadlocks. Aside from the fact that they > are better off not altering the table amid concurrent inserts, I'm > trying to understand why this is even able to happen. I expect one to > block the other, not a deadlock. Looks like the process trying to do the ALTER has already got some lower-level lock on the table. It evidently hasn't got AccessExclusiveLock, but nonetheless has something strong enough to block an INSERT, such as ShareLock. regards, tom lane
On 03/02/2011 12:41 PM, Tom Lane wrote: > Looks like the process trying to do the ALTER has already got some > lower-level lock on the table. It evidently hasn't got > AccessExclusiveLock, but nonetheless has something strong enough to > block an INSERT, such as ShareLock. Hmmm, is it possible that the following might do that, whereas a simple ALTER TABLE would not? 8<----------------------------------- BEGIN; CREATE OR REPLACE FUNCTION change_column_type ( tablename text, columnname text, newtype text ) RETURNS text AS $$ DECLARE newtypeid oid; tableoid oid; curtypeid oid; BEGIN SELECT INTO newtypeid oid FROMpg_type WHERE oid = newtype::regtype::oid; SELECT INTO tableoid oid FROM pg_classWHERE relname = tablename; IF NOT FOUND THEN RETURN 'TABLE NOT FOUND'; END IF; SELECT INTO curtypeid atttypid FROM pg_attribute WHERE attrelid = tableoid AND attname::text = columnname; IF NOT FOUND THEN RETURN 'COLUMN NOT FOUND'; END IF; IF curtypeid != newtypeid THEN EXECUTE 'ALTER TABLE ' || tablename || ' ALTER COLUMN ' || columnname || ' SET DATA TYPE ' || newtype; RETURN 'CHANGE SUCCESSFUL'; ELSE RETURN 'CHANGE SKIPPED'; END IF;EXCEPTION WHEN undefined_object THEN RETURN 'INVALID TARGET TYPE'; END; $$ LANGUAGE plpgsql; SELECT change_column_type('attribute_summary', 'sequence_number', 'numeric'); COMMIT; 8<----------------------------------- This text is in a file being run from a shell script with something like: psql dbname < script.sql The concurrent INSERTs are being done by the main application code (running on Tomcat). Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
On Mar 2, 2011, at 2:54 PM, Joe Conway wrote: > On 03/02/2011 12:41 PM, Tom Lane wrote: >> Looks like the process trying to do the ALTER has already got some >> lower-level lock on the table. It evidently hasn't got >> AccessExclusiveLock, but nonetheless has something strong enough to >> block an INSERT, such as ShareLock. > > Hmmm, is it possible that the following might do that, whereas a simple > ALTER TABLE would not? Impossible to tell without seeing what's in the script... ie: if the script was BEGIN; -- Do something to that table that blocks inserts SELECT change_column_type(...); COMMIT; You'd get a deadlock. The script also has several race conditions: - Someone could drop the table after you query pg_class - Someone could alter/drop the column after you query pg_attribute My suggestion would be to try to grab an exclusive lock on the table as the first line in the function (and then don't doanything cute in the declare section, such as use tablename::regprocedure). Speaking of which, I would recommend using the regprocedure and regtype casts instead of querying the catalog directly; thatway you have working schema support and you're immune from future catalog changes. Unfortunately you'll still have todo things the hard way to find the column (unless we added regcolumn post 8.3), but you might want to use information_schema,or at least see what it's doing there. The query *technically* should include WHERE attnum > 0 (maybe>=) AND NOT attisdropped, though it's probably not a big deal that it isn't since ALTER TABLE will save your baconthere (though, I'd include a comment to that effect to protect anyone who decides to blindly cut and paste that querysomewhere else where it does matter...). > 8<----------------------------------- > BEGIN; > > CREATE OR REPLACE FUNCTION change_column_type > ( > tablename text, > columnname text, > newtype text > ) RETURNS text AS $$ > DECLARE > newtypeid oid; > tableoid oid; > curtypeid oid; > BEGIN > SELECT INTO newtypeid oid FROM pg_type WHERE oid = > newtype::regtype::oid; > SELECT INTO tableoid oid FROM pg_class WHERE relname = tablename; > IF NOT FOUND THEN > RETURN 'TABLE NOT FOUND'; > END IF; > > SELECT INTO curtypeid atttypid FROM pg_attribute WHERE > attrelid = tableoid AND attname::text = columnname; > IF NOT FOUND THEN > RETURN 'COLUMN NOT FOUND'; > END IF; > > IF curtypeid != newtypeid THEN > EXECUTE 'ALTER TABLE ' || tablename || ' ALTER COLUMN ' || > columnname || ' SET DATA TYPE ' || newtype; > RETURN 'CHANGE SUCCESSFUL'; > ELSE > RETURN 'CHANGE SKIPPED'; > END IF; > EXCEPTION > WHEN undefined_object THEN > RETURN 'INVALID TARGET TYPE'; > END; > $$ LANGUAGE plpgsql; > > SELECT change_column_type('attribute_summary', > 'sequence_number', > 'numeric'); > > COMMIT; > 8<----------------------------------- > > This text is in a file being run from a shell script with something like: > > psql dbname < script.sql > > The concurrent INSERTs are being done by the main application code > (running on Tomcat). > > Joe > > -- > Joe Conway > credativ LLC: http://www.credativ.us > Linux, PostgreSQL, and general Open Source > Training, Service, Consulting, & 24x7 Support > -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On 03/03/2011 03:49 PM, Jim Nasby wrote: > On Mar 2, 2011, at 2:54 PM, Joe Conway wrote: >> On 03/02/2011 12:41 PM, Tom Lane wrote: >>> Looks like the process trying to do the ALTER has already got some >>> lower-level lock on the table. It evidently hasn't got >>> AccessExclusiveLock, but nonetheless has something strong enough to >>> block an INSERT, such as ShareLock. >> >> Hmmm, is it possible that the following might do that, whereas a simple >> ALTER TABLE would not? > > Impossible to tell without seeing what's in the script... ie: if the script was > > BEGIN; > -- Do something to that table that blocks inserts > SELECT change_column_type(...); > COMMIT; > > You'd get a deadlock. The script was exactly the one posted, i.e. BEGIN; CREATE FUNCTION change_column_type(...); SELECT change_column_type(...); COMMIT; That's all there is to it. And the function itself has no specific reference to the table being altered. That's why I'm left scratching my head ;-) Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support
On Mar 3, 2011, at 6:26 PM, Joe Conway wrote: > On 03/03/2011 03:49 PM, Jim Nasby wrote: >> On Mar 2, 2011, at 2:54 PM, Joe Conway wrote: >>> On 03/02/2011 12:41 PM, Tom Lane wrote: >>>> Looks like the process trying to do the ALTER has already got some >>>> lower-level lock on the table. It evidently hasn't got >>>> AccessExclusiveLock, but nonetheless has something strong enough to >>>> block an INSERT, such as ShareLock. >>> >>> Hmmm, is it possible that the following might do that, whereas a simple >>> ALTER TABLE would not? >> >> Impossible to tell without seeing what's in the script... ie: if the script was >> >> BEGIN; >> -- Do something to that table that blocks inserts >> SELECT change_column_type(...); >> COMMIT; >> >> You'd get a deadlock. > > The script was exactly the one posted, i.e. > BEGIN; > CREATE FUNCTION change_column_type(...); > SELECT change_column_type(...); > COMMIT; > > That's all there is to it. And the function itself has no specific > reference to the table being altered. That's why I'm left scratching my > head ;-) I suggest grabbing a snapshot of pg_locks for the connection that's creating the function, and then do the same for the insertand see what could potentially conflict... -- Jim C. Nasby, Database Architect jim@nasby.net 512.569.9461 (cell) http://jim.nasby.net
On Wed, Mar 02, 2011 at 12:25:16PM -0800, Joe Conway wrote: > I'm working with a client on an application upgrade script which > executes a function to conditionally do an: > > ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz > > If this is run while the application is concurrently doing inserts into > foo, we are occasionally seeing deadlocks. Aside from the fact that they > are better off not altering the table amid concurrent inserts, I'm > trying to understand why this is even able to happen. I expect one to > block the other, not a deadlock. > > This is 8.4.1 (I know, I know, I have advised strongly that they upgrade > to 8.4.latest). > > We have not been able to repeat this forcibly. Here is what the log shows: > ------------------------------ > 2011-02-25 14:38:07 PST [31686]: [1-1] ERROR: deadlock detected > 2011-02-25 14:38:07 PST [31686]: [2-1] DETAIL: Process 31686 waits for > AccessExclusiveLock on relation 16896 of database 16386; blocked by > process 31634. > Process 31634 waits for RowExclusiveLock on relation 16902 of > database 16386; blocked by process 31686. > Process 31686: SELECT change_column_type('attribute_summary', > 'sequence_number', 'numeric'); > Process 31634: insert into attribute_summary (attribute_value, > sequence_number, attribute_id) values ($1, $2, $3) > 2011-02-25 14:38:07 PST [31686]: [3-1] HINT: See server log for query > details. > 2011-02-25 14:38:07 PST [31686]: [4-1] CONTEXT: SQL statement "ALTER > TABLE attribute_summary ALTER COLUMN sequence_number SET DATA TYPE numeric" > PL/pgSQL function "change_column_type" line 18 at EXECUTE statement > 2011-02-25 14:38:07 PST [31686]: [5-1] STATEMENT: SELECT > change_column_type('attribute_summary', 'sequence_number', 'numeric'); > ------------------------------ Does relation 16902 (attribute_summary) have a foreign key constraint over the sequence_number column, in either direction, with relation 16896? That would explain it: session 1: ALTER TABLE attribute_summary ... <sleeps after relation_openrv in transformAlterTableStmt> session 2: SELECT 1 FROM rel16896 LIMIT 0; session 2: SELECT 1 FROM attribute_summary LIMIT 0; <blocks> session 1: <wakes up; continues ALTER TABLE: deadlock upon locking rel16896> Off the cuff, I think you could make sure this never deadlocks with a PL/pgSQL recipe like this: LOOPBEGIN LOCK TABLE rel16896; LOCK TABLE attribute_summary NOWAIT; EXIT;EXCEPTION WHEN lock_not_available THENEND; END LOOP; Granted, the cure may be worse than the disease. nm
On 03/03/2011 11:36 PM, Noah Misch wrote: > Does relation 16902 (attribute_summary) have a foreign key constraint over the > sequence_number column, in either direction, with relation 16896? That would > explain it: > > session 1: ALTER TABLE attribute_summary ... <sleeps after relation_openrv in transformAlterTableStmt> > session 2: SELECT 1 FROM rel16896 LIMIT 0; > session 2: SELECT 1 FROM attribute_summary LIMIT 0; <blocks> > session 1: <wakes up; continues ALTER TABLE: deadlock upon locking rel16896> Ah, OK -- then that would explain it as there are foreign keys on that column. Thanks, hadn't thought about that aspect. > Granted, the cure may be worse than the disease. Right. I've already advised they shut down the application during the alter table, which they can do (and in fact already do -- they were restarting the application just prior to this step, which really makes no sense anyway). Thanks, Joe -- Joe Conway credativ LLC: http://www.credativ.us Linux, PostgreSQL, and general Open Source Training, Service, Consulting, & 24x7 Support