Обсуждение: circular wait not triggering deadlock ?

Поиск
Список
Период
Сортировка

circular wait not triggering deadlock ?

От
Justin Pryzby
Дата:
Running pg10.2, I have a handful of maintenance jobs run in the middle of the
night, which appear to have gotten stuck waiting on each other..

 . reindex - loop around pg_repack or psql -c REINDEX $table for those which can't
   be repacked)
 . vacuum - mostly runs ANALYZE, like on empty parent tables

I believe I've come across something similar before, but hadn't previously
stopped to dig into it.

ts=# SELECT * FROM pg_stat_activity WHERE pid=20488;
-[ RECORD 1
]----+------------------------------------------------------------------------------------------------------------------------------------------
datid            | 16402
datname          | ts
pid              | 20488
usesysid         | 10
usename          | postgres
application_name | pg_repack
client_addr      | 
client_hostname  | 
client_port      | -1
backend_start    | 2018-03-07 01:00:15.72433-07
xact_start       | 2018-03-07 01:00:15.747619-07
query_start      | 2018-03-07 01:00:15.747619-07
state_change     | 2018-03-07 01:00:15.74762-07
wait_event_type  | Lock
wait_event       | virtualxid
state            | active
backend_xid      | 
backend_xmin     | 
query            | CREATE INDEX CONCURRENTLY index_1113966210 ON eric_enodeb_cell_20180304 USING btree (site_id) WITH
(fillfactor='100')TABLESPACE oldindex
 
backend_type     | client backend

ts=# SELECT * FROM pg_stat_activity WHERE pid=530;
-[ RECORD 1 ]----+--------------------------------------------------------------------------
datid            | 16402
datname          | ts
pid              | 530
usesysid         | 10
usename          | postgres
application_name | psql
client_addr      | 
client_hostname  | 
client_port      | -1
backend_start    | 2018-03-07 02:11:59.00832-07
xact_start       | 2018-03-07 02:11:59.009855-07
query_start      | 2018-03-07 02:11:59.009855-07
state_change     | 2018-03-07 02:11:59.009861-07
wait_event_type  | Lock
wait_event       | relation
state            | active
backend_xid      | 
backend_xmin     | 67578467
query            | ALTER TABLE eric_enodeb_cell_metrics* ALTER start_time SET STATISTICS 400
backend_type     | client backend

ts=# SELECT relation::regclass, granted, * FROM pg_locks WHERE (pid=530 AND NOT granted) OR
relation='eric_enodeb_cell_20180304'::regclass;--  virtualxid='16/38159763' OR
relation='eric_enodeb_cell_metrics'::regclassORDER BY 1,2,3;
 
         relation          | granted | locktype | database |  relation  | page | tuple | virtualxid | transactionid |
classid| objid | objsubid | virtualtransaction |  pid  |           mode           | granted | fastpath 
 

---------------------------+---------+----------+----------+------------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------+----------
 eric_enodeb_cell_20180304 | t       | relation |    16402 | 1113966172 |      |       |            |               |
     |       |          | 19/28401734        | 20488 | ShareUpdateExclusiveLock | t       | f
 
 eric_enodeb_cell_20180304 | f       | relation |    16402 | 1113966172 |      |       |            |               |
     |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | f       | f
 

I see PID 530 has ShareUpdateExclusiveLock on many child tables.
..but NOT on eric_enodeb_cell_20180304 , which #20488 is trying to reindex.

ts=# SELECT granted, relation::regclass, * FROM pg_locks WHERE pid=530 OR relation='eric_enodeb_cell_metrics'::regclass
ORDERBY 1, 2, 3;
 
 granted |         relation          |  locktype  | database |  relation  | page | tuple | virtualxid  | transactionid
|classid | objid | objsubid | virtualtransaction |  pid  |           mode           | granted | fastpath 
 

---------+---------------------------+------------+----------+------------+------+-------+-------------+---------------+---------+-------+----------+--------------------+-------+--------------------------+---------+----------
 f       | eric_enodeb_cell_metrics  | relation   |    16402 |  948644697 |      |       |             |
|        |       |          | 32/838769          | 18941 | AccessShareLock          | f       | f
 
 f       | eric_enodeb_cell_metrics  | relation   |    16402 |  948644697 |      |       |             |
|        |       |          | 18/38156962        | 16592 | AccessExclusiveLock      | f       | f
 
 f       | eric_enodeb_cell_metrics  | relation   |    16402 |  948644697 |      |       |             |
|        |       |          | 31/1337307         | 22961 | AccessShareLock          | f       | f
 
 f       | eric_enodeb_cell_metrics  | relation   |    16402 |  948644697 |      |       |             |
|        |       |          | 27/1755026         | 17248 | AccessShareLock          | f       | f
 
 f       | eric_enodeb_cell_metrics  | relation   |    16402 |  948644697 |      |       |             |
|        |       |          | 28/1118276         | 28357 | AccessShareLock          | f       | f
 
 f       | eric_enodeb_cell_metrics  | relation   |    16402 |  948644697 |      |       |             |
|        |       |          | 29/830016          | 21846 | AccessExclusiveLock      | f       | f
 
 f       | eric_enodeb_cell_20180304 | relation   |    16402 | 1113966172 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | f       | f
 
 t       | eric_enodeb_cell_201609   | relation   |    16402 |  193318232 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
 t       | eric_enodeb_cell_201610   | relation   |    16402 |  193318241 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
 t       | eric_enodeb_cell_201611   | relation   |    16402 |  206004215 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
 t       | eric_enodeb_cell_201612   | relation   |    16402 |  232980573 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
 t       | eric_enodeb_cell_201701   | relation   |    16402 |  259721731 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
 t       | eric_enodeb_cell_201702   | relation   |    16402 |  291841516 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
 t       | eric_enodeb_cell_20170301 | relation   |    16402 |  384321707 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
 t       | eric_enodeb_cell_20170302 | relation   |    16402 |  385254657 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
 t       | eric_enodeb_cell_20170303 | relation   |    16402 |  386180686 |      |       |             |
|        |       |          | 16/38159763        |   530 | ShareUpdateExclusiveLock | t       | f
 
...

ts=# SELECT granted, relation::regclass, * FROM pg_locks WHERE pid=20488 ORDER BY 1,2,3;
-[ RECORD 1 ]------+--------------------------
granted            | f
locktype           | virtualxid
virtualxid         | 22/4317099
virtualtransaction | 19/28401734
pid                | 20488
mode               | ShareLock
granted            | f
fastpath           | f

ts=# SELECT granted, pid, mode, virtualtransaction, virtualxid FROM pg_locks WHERE virtualxid='22/4317099' ORDER BY
1,2,3;
 granted |  pid  |     mode      | virtualtransaction | virtualxid 
---------+-------+---------------+--------------------+------------
 f       | 20488 | ShareLock     | 19/28401734        | 22/4317099
 t       |  6471 | ExclusiveLock | 22/4317099         | 22/4317099

[pryzbyj@database ~]$ ps -O lstart -C reindex,pg_repack
  PID                  STARTED S TTY          TIME COMMAND
20038 Wed Mar  7 01:00:00 2018 S ?        00:00:00 /bin/sh /usr/local/sbin/reindex
20484 Wed Mar  7 01:00:14 2018 S ?        00:00:00 /bin/sh /usr/local/sbin/reindex
20485 Wed Mar  7 01:00:14 2018 S ?        00:00:00 /usr/pgsql-10/bin/pg_repack -E warning --no-kill-backend -d ts -i
eric_enodeb_cell_20180304_site_idx-Ss oldindex
 

Note, I believe last night our backup job would've run for longer and processed
(many) more tables than usual, looping around pg_dump --snapshot.

Is it a bug that this isn't caught by a deadlock detector and cancelled?

Thanks,
Justin


Re: circular wait not triggering deadlock ?

От
Tom Lane
Дата:
Justin Pryzby <pryzby@telsasoft.com> writes:
> Running pg10.2, I have a handful of maintenance jobs run in the middle of the
> night, which appear to have gotten stuck waiting on each other..

> ts=# SELECT granted, pid, mode, virtualtransaction, virtualxid FROM pg_locks WHERE virtualxid='22/4317099' ORDER BY
1,2,3;
>  granted |  pid  |     mode      | virtualtransaction | virtualxid
> ---------+-------+---------------+--------------------+------------
>  f       | 20488 | ShareLock     | 19/28401734        | 22/4317099
>  t       |  6471 | ExclusiveLock | 22/4317099         | 22/4317099

PID 20488 is evidently waiting for PID 6471 to finish its transaction.
What's that one doing?

> Is it a bug that this isn't caught by a deadlock detector and cancelled?

We did find a bug recently with concurrent CREATE INDEX CONCURRENTLY's
all waiting for each other to commit.

            regards, tom lane


Re: circular wait not triggering deadlock ?

От
Justin Pryzby
Дата:
On Thu, Mar 08, 2018 at 01:57:06PM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > Running pg10.2, I have a handful of maintenance jobs run in the middle of the
> > night, which appear to have gotten stuck waiting on each other..
> 
> > ts=# SELECT granted, pid, mode, virtualtransaction, virtualxid FROM pg_locks WHERE virtualxid='22/4317099' ORDER BY
1,2,3;
> >  granted |  pid  |     mode      | virtualtransaction | virtualxid 
> > ---------+-------+---------------+--------------------+------------
> >  f       | 20488 | ShareLock     | 19/28401734        | 22/4317099
> >  t       |  6471 | ExclusiveLock | 22/4317099         | 22/4317099
> 
> PID 20488 is evidently waiting for PID 6471 to finish its transaction.
> What's that one doing?

Um, I thought I had kept track of all two pids but looks not..

query            | SELECT pg_export_snapshot();

So that explains that.
I already killed one proc, but..

ts=# SELECT granted, relation::regclass, pid, mode, virtualtransaction, virtualxid FROM pg_locks WHERE pid=17248 ORDER
BY1;
 
 granted |                relation                 |  pid  |      mode       | virtualtransaction | virtualxid
---------+-----------------------------------------+-------+-----------------+--------------------+------------
 f       | eric_enodeb_cell_metrics                | 17248 | AccessShareLock | 27/1755026         |

ts=# SELECT granted, relation::regclass, pid, mode, virtualtransaction, virtualxid FROM pg_locks WHERE
relation='eric_enodeb_cell_metrics'::regclassORDER BY 1; --pid=17248  virtualxid='22/4317099' ORDER BY 1,2,3;
 
 granted |         relation         |  pid  |           mode           | virtualtransaction | virtualxid
---------+--------------------------+-------+--------------------------+--------------------+------------
 f       | eric_enodeb_cell_metrics | 22961 | AccessShareLock          | 31/1337307         |
 f       | eric_enodeb_cell_metrics | 17248 | AccessShareLock          | 27/1755026         |
 f       | eric_enodeb_cell_metrics | 28357 | AccessShareLock          | 28/1118276         |
 f       | eric_enodeb_cell_metrics | 21846 | AccessExclusiveLock      | 29/830016          |
 f       | eric_enodeb_cell_metrics | 16592 | AccessExclusiveLock      | 18/38156962        |
 f       | eric_enodeb_cell_metrics | 18941 | AccessShareLock          | 32/838769          |
 t       | eric_enodeb_cell_metrics |   530 | ShareUpdateExclusiveLock | 16/38159763        |
(7 rows)

ts=# SELECT pid, state, left(query,199) FROM pg_stat_activity WHERE pid IN(530,17248);
   530 | active | ALTER TABLE eric_enodeb_cell_metrics* ALTER start_time SET STATISTICS 400
 17248 | active | SELECT c.tableoid, c.oid, c.relname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT
acl,row_n FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(
 
CASE WHEN c.relkind = 'S' 

ts=# SELECT granted, pid, mode, virtualtransaction, virtualxid FROM pg_locks WHERE
relation='eric_enodeb_cell_20180304'::regclass;
 
 granted |  pid  |           mode           | virtualtransaction | virtualxid
---------+-------+--------------------------+--------------------+------------
 t       | 20488 | ShareUpdateExclusiveLock | 19/28401734        |
 f       |   530 | ShareUpdateExclusiveLock | 16/38159763        |

ts=# SELECT pid, state, left(query,199) FROM pg_stat_activity WHERE pid=20488;
  pid  | state  |                                                                   left
                                   
 

-------+--------+-------------------------------------------------------------------------------------------------------------------------------------------
 20488 | active | CREATE INDEX CONCURRENTLY index_1113966210 ON eric_enodeb_cell_20180304 USING btree (site_id) WITH
(fillfactor='100')TABLESPACE oldindex
 

ts=# SELECT granted, pid, relation::regclass, locktype, mode  virtualxid, virtualxid, virtualtransaction FROM pg_locks
WHEREpid=20488 ORDER BY 1;
 
 granted |  pid  |         relation          |  locktype  |        virtualxid        | virtualxid  | virtualtransaction


---------+-------+---------------------------+------------+--------------------------+-------------+--------------------
 f       | 20488 |                           | virtualxid | ShareLock                | 22/4317099  | 19/28401734

ts=# SELECT granted, pid, relation::regclass, locktype, mode  virtualxid, virtualtransaction FROM pg_locks WHERE
virtualxid='22/4317099'ORDER BY 1;
 
 granted |  pid  | relation |  locktype  |  virtualxid   | virtualtransaction 
---------+-------+----------+------------+---------------+--------------------
 f       | 20488 |          | virtualxid | ShareLock     | 19/28401734
 t       |  6471 |          | virtualxid | ExclusiveLock | 22/4317099
(2 rows)

So...I gather ALTER SET STATS is waiting on pg_dump which is waiting on CREATE
INDEX which is waiting on SELECT pg_export_snapshot(), which I take to mean
that I should avoid running pg_repack until the backup is finished..

ts=# SELECT pid, state, application_name app, left(query,99) FROM pg_stat_activity ORDER BY xact_start LIMIT 9;
  pid  |        state        |    app    |                                                left
                      
 

-------+---------------------+-----------+-----------------------------------------------------------------------------------------------------
  6471 | idle in transaction | psql      | SELECT pg_export_snapshot();
 20488 | active              | pg_repack | CREATE INDEX CONCURRENTLY index_1113966210 ON eric_enodeb_cell_20180304
USINGbtree (site_id) WITH 
 
   530 | active              | psql      | ALTER TABLE eric_enodeb_cell_metrics* ALTER start_time SET STATISTICS 400
 17248 | active              | pg_dump   | SELECT c.tableoid, c.oid, c.relname, (SELECT pg_catalog.array_agg(acl ORDER
BYrow_n) FROM (SELECT 
 

Justin


Re: circular wait not triggering deadlock ?

От
Tom Lane
Дата:
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Thu, Mar 08, 2018 at 01:57:06PM -0500, Tom Lane wrote:
>> PID 20488 is evidently waiting for PID 6471 to finish its transaction.
>> What's that one doing?

> Um, I thought I had kept track of all two pids but looks not..
> query            | SELECT pg_export_snapshot();

pg_export_snapshot doesn't block for other transactions, though.
Further down, you have output that confirms that:

>   6471 | idle in transaction | psql      | SELECT pg_export_snapshot();

That SELECT is the most recently completed command, not the current one.
So now the question is what the connected application is waiting for
before committing.

            regards, tom lane


Re: circular wait not triggering deadlock ?

От
Justin Pryzby
Дата:
On Thu, Mar 08, 2018 at 03:05:36PM -0500, Tom Lane wrote:
> Justin Pryzby <pryzby@telsasoft.com> writes:
> > On Thu, Mar 08, 2018 at 01:57:06PM -0500, Tom Lane wrote:
> >> PID 20488 is evidently waiting for PID 6471 to finish its transaction.
> >> What's that one doing?
> 
> > Um, I thought I had kept track of all two pids but looks not..
> > query            | SELECT pg_export_snapshot();
> 
> pg_export_snapshot doesn't block for other transactions, though.
> Further down, you have output that confirms that:
> 
> >   6471 | idle in transaction | psql      | SELECT pg_export_snapshot();
> 
> That SELECT is the most recently completed command, not the current one.
> So now the question is what the connected application is waiting for
> before committing.

I believe it does an idle loop around sleep(1), until all the pg_dump
--snapshot children are done.

Justin