Обсуждение: ALTER TABLE deadlock with concurrent INSERT

Поиск
Список
Период
Сортировка

ALTER TABLE deadlock with concurrent INSERT

От
Joe Conway
Дата:
I'm working with a client on an application upgrade script which
executes a function to conditionally do an:
 ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz

If this is run while the application is concurrently doing inserts into
foo, we are occasionally seeing deadlocks. Aside from the fact that they
are better off not altering the table amid concurrent inserts, I'm
trying to understand why this is even able to happen. I expect one to
block the other, not a deadlock.

This is 8.4.1 (I know, I know, I have advised strongly that they upgrade
to 8.4.latest).

We have not been able to repeat this forcibly. Here is what the log shows:
------------------------------
2011-02-25 14:38:07 PST [31686]: [1-1] ERROR:  deadlock detected
2011-02-25 14:38:07 PST [31686]: [2-1] DETAIL:  Process 31686 waits for
AccessExclusiveLock on relation 16896 of database 16386; blocked by
process 31634.       Process 31634 waits for RowExclusiveLock on relation 16902 of
database 16386; blocked by process 31686.       Process 31686: SELECT change_column_type('attribute_summary',
'sequence_number', 'numeric');       Process 31634: insert into attribute_summary (attribute_value,
sequence_number, attribute_id) values ($1, $2, $3)
2011-02-25 14:38:07 PST [31686]: [3-1] HINT:  See server log for query
details.
2011-02-25 14:38:07 PST [31686]: [4-1] CONTEXT:  SQL statement "ALTER
TABLE attribute_summary ALTER COLUMN sequence_number SET DATA TYPE numeric"       PL/pgSQL function
"change_column_type"line 18 at EXECUTE statement 
2011-02-25 14:38:07 PST [31686]: [5-1] STATEMENT:  SELECT
change_column_type('attribute_summary', 'sequence_number', 'numeric');
------------------------------

Reviewing the release notes, I see some marginally related commits, but
nothing that jumps out to me as a specific fix. Thoughts?

Thanks,

Joe


--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


Re: ALTER TABLE deadlock with concurrent INSERT

От
Tom Lane
Дата:
Joe Conway <mail@joeconway.com> writes:
> I'm working with a client on an application upgrade script which
> executes a function to conditionally do an:

>   ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz

> If this is run while the application is concurrently doing inserts into
> foo, we are occasionally seeing deadlocks. Aside from the fact that they
> are better off not altering the table amid concurrent inserts, I'm
> trying to understand why this is even able to happen. I expect one to
> block the other, not a deadlock.

Looks like the process trying to do the ALTER has already got some
lower-level lock on the table.  It evidently hasn't got
AccessExclusiveLock, but nonetheless has something strong enough to
block an INSERT, such as ShareLock.
        regards, tom lane


Re: ALTER TABLE deadlock with concurrent INSERT

От
Joe Conway
Дата:
On 03/02/2011 12:41 PM, Tom Lane wrote:
> Looks like the process trying to do the ALTER has already got some
> lower-level lock on the table.  It evidently hasn't got
> AccessExclusiveLock, but nonetheless has something strong enough to
> block an INSERT, such as ShareLock.

Hmmm, is it possible that the following might do that, whereas a simple
ALTER TABLE would not?

8<-----------------------------------
BEGIN;

CREATE OR REPLACE FUNCTION change_column_type
( tablename text, columnname text, newtype text
) RETURNS text AS $$ DECLARE   newtypeid   oid;   tableoid    oid;   curtypeid   oid; BEGIN   SELECT INTO newtypeid oid
FROMpg_type WHERE oid =                                     newtype::regtype::oid;   SELECT INTO tableoid oid FROM
pg_classWHERE relname = tablename;   IF NOT FOUND THEN     RETURN 'TABLE NOT FOUND';   END IF; 
   SELECT INTO curtypeid atttypid FROM pg_attribute WHERE                 attrelid = tableoid AND attname::text =
columnname;  IF NOT FOUND THEN     RETURN 'COLUMN NOT FOUND';   END IF; 
   IF curtypeid != newtypeid THEN     EXECUTE 'ALTER TABLE ' || tablename || ' ALTER COLUMN ' ||
columnname || ' SET DATA TYPE ' || newtype;     RETURN 'CHANGE SUCCESSFUL';   ELSE     RETURN 'CHANGE SKIPPED';   END
IF;EXCEPTION   WHEN undefined_object THEN     RETURN 'INVALID TARGET TYPE'; END; 
$$ LANGUAGE plpgsql;

SELECT change_column_type('attribute_summary',                         'sequence_number',
'numeric');

COMMIT;
8<-----------------------------------

This text is in a file being run from a shell script with something like:
 psql dbname < script.sql

The concurrent INSERTs are being done by the main application code
(running on Tomcat).

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


Re: ALTER TABLE deadlock with concurrent INSERT

От
Jim Nasby
Дата:
On Mar 2, 2011, at 2:54 PM, Joe Conway wrote:
> On 03/02/2011 12:41 PM, Tom Lane wrote:
>> Looks like the process trying to do the ALTER has already got some
>> lower-level lock on the table.  It evidently hasn't got
>> AccessExclusiveLock, but nonetheless has something strong enough to
>> block an INSERT, such as ShareLock.
>
> Hmmm, is it possible that the following might do that, whereas a simple
> ALTER TABLE would not?

Impossible to tell without seeing what's in the script... ie: if the script was

BEGIN;
-- Do something to that table that blocks inserts
SELECT change_column_type(...);
COMMIT;

You'd get a deadlock.

The script also has several race conditions:

- Someone could drop the table after you query pg_class
- Someone could alter/drop the column after you query pg_attribute

My suggestion would be to try to grab an exclusive lock on the table as the first line in the function (and then don't
doanything cute in the declare section, such as use tablename::regprocedure). 

Speaking of which, I would recommend using the regprocedure and regtype casts instead of querying the catalog directly;
thatway you have working schema support and you're immune from future catalog changes. Unfortunately you'll still have
todo things the hard way to find the column (unless we added regcolumn post 8.3), but you might want to use
information_schema,or at least see what it's doing there. The query *technically* should include WHERE attnum > 0
(maybe>=) AND NOT attisdropped, though it's probably not a big deal that it isn't since ALTER TABLE will save your
baconthere (though, I'd include a comment to that effect to protect anyone who decides to blindly cut and paste that
querysomewhere else where it does matter...). 

> 8<-----------------------------------
> BEGIN;
>
> CREATE OR REPLACE FUNCTION change_column_type
> (
>  tablename text,
>  columnname text,
>  newtype text
> ) RETURNS text AS $$
>  DECLARE
>    newtypeid   oid;
>    tableoid    oid;
>    curtypeid   oid;
>  BEGIN
>    SELECT INTO newtypeid oid FROM pg_type WHERE oid =
>                                      newtype::regtype::oid;
>    SELECT INTO tableoid oid FROM pg_class WHERE relname = tablename;
>    IF NOT FOUND THEN
>      RETURN 'TABLE NOT FOUND';
>    END IF;
>
>    SELECT INTO curtypeid atttypid FROM pg_attribute WHERE
>                  attrelid = tableoid AND attname::text = columnname;
>    IF NOT FOUND THEN
>      RETURN 'COLUMN NOT FOUND';
>    END IF;
>
>    IF curtypeid != newtypeid THEN
>      EXECUTE 'ALTER TABLE ' || tablename || ' ALTER COLUMN ' ||
>                          columnname || ' SET DATA TYPE ' || newtype;
>      RETURN 'CHANGE SUCCESSFUL';
>    ELSE
>      RETURN 'CHANGE SKIPPED';
>    END IF;
>  EXCEPTION
>    WHEN undefined_object THEN
>      RETURN 'INVALID TARGET TYPE';
>  END;
> $$ LANGUAGE plpgsql;
>
> SELECT change_column_type('attribute_summary',
>                          'sequence_number',
>                          'numeric');
>
> COMMIT;
> 8<-----------------------------------
>
> This text is in a file being run from a shell script with something like:
>
>  psql dbname < script.sql
>
> The concurrent INSERTs are being done by the main application code
> (running on Tomcat).
>
> Joe
>
> --
> Joe Conway
> credativ LLC: http://www.credativ.us
> Linux, PostgreSQL, and general Open Source
> Training, Service, Consulting, & 24x7 Support
>

--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: ALTER TABLE deadlock with concurrent INSERT

От
Joe Conway
Дата:
On 03/03/2011 03:49 PM, Jim Nasby wrote:
> On Mar 2, 2011, at 2:54 PM, Joe Conway wrote:
>> On 03/02/2011 12:41 PM, Tom Lane wrote:
>>> Looks like the process trying to do the ALTER has already got some
>>> lower-level lock on the table.  It evidently hasn't got
>>> AccessExclusiveLock, but nonetheless has something strong enough to
>>> block an INSERT, such as ShareLock.
>>
>> Hmmm, is it possible that the following might do that, whereas a simple
>> ALTER TABLE would not?
>
> Impossible to tell without seeing what's in the script... ie: if the script was
>
> BEGIN;
> -- Do something to that table that blocks inserts
> SELECT change_column_type(...);
> COMMIT;
>
> You'd get a deadlock.

The script was exactly the one posted, i.e.
BEGIN;
CREATE FUNCTION change_column_type(...);
SELECT change_column_type(...);
COMMIT;

That's all there is to it. And the function itself has no specific
reference to the table being altered. That's why I'm left scratching my
head ;-)

Joe


--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support


Re: ALTER TABLE deadlock with concurrent INSERT

От
Jim Nasby
Дата:
On Mar 3, 2011, at 6:26 PM, Joe Conway wrote:
> On 03/03/2011 03:49 PM, Jim Nasby wrote:
>> On Mar 2, 2011, at 2:54 PM, Joe Conway wrote:
>>> On 03/02/2011 12:41 PM, Tom Lane wrote:
>>>> Looks like the process trying to do the ALTER has already got some
>>>> lower-level lock on the table.  It evidently hasn't got
>>>> AccessExclusiveLock, but nonetheless has something strong enough to
>>>> block an INSERT, such as ShareLock.
>>>
>>> Hmmm, is it possible that the following might do that, whereas a simple
>>> ALTER TABLE would not?
>>
>> Impossible to tell without seeing what's in the script... ie: if the script was
>>
>> BEGIN;
>> -- Do something to that table that blocks inserts
>> SELECT change_column_type(...);
>> COMMIT;
>>
>> You'd get a deadlock.
>
> The script was exactly the one posted, i.e.
> BEGIN;
> CREATE FUNCTION change_column_type(...);
> SELECT change_column_type(...);
> COMMIT;
>
> That's all there is to it. And the function itself has no specific
> reference to the table being altered. That's why I'm left scratching my
> head ;-)

I suggest grabbing a snapshot of pg_locks for the connection that's creating the function, and then do the same for the
insertand see what could potentially conflict... 
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: ALTER TABLE deadlock with concurrent INSERT

От
Noah Misch
Дата:
On Wed, Mar 02, 2011 at 12:25:16PM -0800, Joe Conway wrote:
> I'm working with a client on an application upgrade script which
> executes a function to conditionally do an:
> 
>   ALTER TABLE foo ALTER COLUMN bar SET DATA TYPE baz
> 
> If this is run while the application is concurrently doing inserts into
> foo, we are occasionally seeing deadlocks. Aside from the fact that they
> are better off not altering the table amid concurrent inserts, I'm
> trying to understand why this is even able to happen. I expect one to
> block the other, not a deadlock.
> 
> This is 8.4.1 (I know, I know, I have advised strongly that they upgrade
> to 8.4.latest).
> 
> We have not been able to repeat this forcibly. Here is what the log shows:
> ------------------------------
> 2011-02-25 14:38:07 PST [31686]: [1-1] ERROR:  deadlock detected
> 2011-02-25 14:38:07 PST [31686]: [2-1] DETAIL:  Process 31686 waits for
> AccessExclusiveLock on relation 16896 of database 16386; blocked by
> process 31634.
>         Process 31634 waits for RowExclusiveLock on relation 16902 of
> database 16386; blocked by process 31686.
>         Process 31686: SELECT change_column_type('attribute_summary',
> 'sequence_number', 'numeric');
>         Process 31634: insert into attribute_summary (attribute_value,
> sequence_number, attribute_id) values ($1, $2, $3)
> 2011-02-25 14:38:07 PST [31686]: [3-1] HINT:  See server log for query
> details.
> 2011-02-25 14:38:07 PST [31686]: [4-1] CONTEXT:  SQL statement "ALTER
> TABLE attribute_summary ALTER COLUMN sequence_number SET DATA TYPE numeric"
>         PL/pgSQL function "change_column_type" line 18 at EXECUTE statement
> 2011-02-25 14:38:07 PST [31686]: [5-1] STATEMENT:  SELECT
> change_column_type('attribute_summary', 'sequence_number', 'numeric');
> ------------------------------

Does relation 16902 (attribute_summary) have a foreign key constraint over the
sequence_number column, in either direction, with relation 16896?  That would
explain it:

session 1: ALTER TABLE attribute_summary ... <sleeps after relation_openrv in transformAlterTableStmt>
session 2: SELECT 1 FROM rel16896 LIMIT 0;
session 2: SELECT 1 FROM attribute_summary LIMIT 0; <blocks>
session 1: <wakes up; continues ALTER TABLE: deadlock upon locking rel16896>

Off the cuff, I think you could make sure this never deadlocks with a PL/pgSQL
recipe like this:

LOOPBEGIN    LOCK TABLE rel16896;    LOCK TABLE attribute_summary NOWAIT;    EXIT;EXCEPTION WHEN lock_not_available
THENEND;
END LOOP;

Granted, the cure may be worse than the disease.

nm


Re: ALTER TABLE deadlock with concurrent INSERT

От
Joe Conway
Дата:
On 03/03/2011 11:36 PM, Noah Misch wrote:
> Does relation 16902 (attribute_summary) have a foreign key constraint over the
> sequence_number column, in either direction, with relation 16896?  That would
> explain it:
>
> session 1: ALTER TABLE attribute_summary ... <sleeps after relation_openrv in transformAlterTableStmt>
> session 2: SELECT 1 FROM rel16896 LIMIT 0;
> session 2: SELECT 1 FROM attribute_summary LIMIT 0; <blocks>
> session 1: <wakes up; continues ALTER TABLE: deadlock upon locking rel16896>

Ah, OK -- then that would explain it as there are foreign keys on that
column. Thanks, hadn't thought about that aspect.

> Granted, the cure may be worse than the disease.

Right. I've already advised they shut down the application during the
alter table, which they can do (and in fact already do -- they were
restarting the application just prior to this step, which really makes
no sense anyway).

Thanks,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support