Обсуждение: Query on pg_stat_activity table got stuck

Поиск

Список

Период

Сортировка

Query on pg_stat_activity table got stuck

От

neeraj kumar

Дата:

06 мая 2019 г., 21:24:31

Hello,

We are using PG 10.6. We have one cron job that queries pg_stat_activity table to find out how many queries are running longer than X minutes and generate metrics.

Query look like this :
SELECT * FROM pg_stat_activity WHERE state='active'

After some days, this query get stuck and doesn't finish. We tried to run this query manually and same result.

We looked into pg_locks table and there this query is not blocked on any lock :
https://justpaste.it/48rpe

Now we don't know where to look next. Our expectation was query on pg_stat_activity shouldn't get blocked on anything and pg_locks also showing same result.

Google search also not returning anything useful that is why reaching out to this email list.

We have separated that DB and issue is still produceable on that DB, in case if more information is not needed.

-------------------------------------
Thanks
Neeraj

Re: Query on pg_stat_activity table got stuck

От

Tom Lane

Дата:

07 мая 2019 г., 00:33:06

neeraj kumar <neeru.cse@gmail.com> writes:
> We are using PG 10.6. We have one cron job that queries pg_stat_activity
> table to find out how many queries are running longer than X minutes and
> generate metrics.

> Query look like this :
> SELECT * FROM pg_stat_activity WHERE state='active'

> After some days, this query get stuck and doesn't finish. We tried to run
> this query manually and same result.

> We looked into pg_locks table and there this query is not blocked on any
> lock :
> https://justpaste.it/48rpe

Interesting.  Can you get a stack trace to show where in the code it's
stuck?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

            regards, tom lane

Re: Query on pg_stat_activity table got stuck

От

Tom Lane

Дата:

07 мая 2019 г., 00:33:06

neeraj kumar <neeru.cse@gmail.com> writes:
> We are using PG 10.6. We have one cron job that queries pg_stat_activity
> table to find out how many queries are running longer than X minutes and
> generate metrics.

> Query look like this :
> SELECT * FROM pg_stat_activity WHERE state='active'

> After some days, this query get stuck and doesn't finish. We tried to run
> this query manually and same result.

> We looked into pg_locks table and there this query is not blocked on any
> lock :
> https://justpaste.it/48rpe

Interesting.  Can you get a stack trace to show where in the code it's
stuck?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

            regards, tom lane

Re: Query on pg_stat_activity table got stuck

От

neeraj kumar

Дата:

08 мая 2019 г., 20:42:12

Took some time to get stack trace as we didn't had root permission. Attaching stack trace of two process (out of many) stuck for same query below[1][2]

Seems like call is unable to come out of this loop :
https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3361-L3400

All stack trace's top function points to somewhere in this loop so it means calls are not stuck, but are unable to come out of this loop.

[1]

[12:43:30][root][~]$ pstack 6283

#0 pgstat_read_current_status () at pgstat.c:3495

#1 0x0000000000732381 in pgstat_read_current_status () at pgstat.c:2566

#2 pgstat_fetch_stat_numbackends () at pgstat.c:2567

#3 0x000000000083bfef in pg_stat_get_activity (fcinfo=0x7ffd26955b80) at pgstatfuncs.c:581

#4 0x00000000006832a1 in ExecMakeTableFunctionResult (setexpr=0x14ea907dcc60, econtext=0x14ea907dca50, argContext=<optimized out>, expectedDesc=0x14ea907df048, randomAccess=0 '\000') at execSRF.c:231

#5 0x000000000068e7b3 in FunctionNext (node=node@entry=0x14ea907dc298) at nodeFunctionscan.c:94

#6 0x000000000068275a in ExecScanFetch (recheckMtd=0x68e4e0 <FunctionRecheck>, accessMtd=0x68e500 <FunctionNext>, node=0x14ea907dc298) at execScan.c:97

#7 ExecScan (node=0x14ea907dc298, accessMtd=0x68e500 <FunctionNext>, recheckMtd=0x68e4e0 <FunctionRecheck>) at execScan.c:147

#8 0x0000000000688009 in ExecProcNode (node=0x14ea907dc298) at ../../../src/include/executor/executor.h:250

#9 fetch_input_tuple (aggstate=aggstate@entry=0x14ea907dc4f8) at nodeAgg.c:695

#10 0x000000000068a0af in agg_retrieve_direct (aggstate=0x14ea907dc4f8) at nodeAgg.c:2347

#11 ExecAgg (pstate=0x14ea907dc4f8) at nodeAgg.c:2158

#12 0x000000000067cce2 in ExecProcNode (node=0x14ea907dc4f8) at ../../../src/include/executor/executor.h:250

#13 ExecutePlan (execute_once=<optimized out>, dest=0x14ea907a1190, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x14ea907dc4f8, estate=0x14ea907dc038) at execMain.c:1723

#14 standard_ExecutorRun (queryDesc=0x14ea906cc038, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364

#15 0x000014ed56cee425 in pgss_ExecutorRun (queryDesc=0x14ea906cc038, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891

#16 0x000014ed56cd760e in explain_ExecutorRun (queryDesc=0x14ea906cc038, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at auto_explain.c:267

#17 0x00000000007b328c in PortalRunSelect (portal=portal@entry=0x14ea907da038, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x14ea907a1190) at pquery.c:932

#18 0x00000000007b4630 in PortalRun (portal=portal@entry=0x14ea907da038, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0x14ea907a1190, altdest=altdest@entry=0x14ea907a1190, completionTag=0x7ffd26956530 "") at pquery.c:773

#19 0x00000000007b0223 in exec_simple_query (query_string=0x14ed51d1f038 "select count(*) from pg_stat_activity;") at postgres.c:1145

#20 0x00000000007b2388 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x14ed51dd42a8, dbname=0x14ed51dd4158 "db_name", username=<optimized out>) at postgres.c:4235

#21 0x00000000004cf2ae in BackendRun (port=0x14ed51dfa380) at postmaster.c:4791

#22 BackendStartup (port=0x14ed51dfa380) at postmaster.c:4458

#23 ServerLoop () at postmaster.c:1930

#24 0x0000000000739d58 in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x14ed51c246f0) at postmaster.c:1557

#25 0x00000000004d1594 in main (argc=9, argv=0x14ed51c246f0) at main.c:228

[2]

[14:53:36][root][~]$ pstack 82504

#0 0x000000000072e053 in pgstat_read_current_status () at pgstat.c:3467

#1 0x0000000000732381 in pgstat_read_current_status () at pgstat.c:2566

#2 pgstat_fetch_stat_numbackends () at pgstat.c:2567

#3 0x000000000083bfef in pg_stat_get_activity (fcinfo=0x7ffd26955c30) at pgstatfuncs.c:581

#4 0x00000000006832a1 in ExecMakeTableFunctionResult (setexpr=0x14ea906e95b0, econtext=0x14ea906e8a50, argContext=<optimized out>, expectedDesc=0x14ea906eb958, randomAccess=0 '\000') at execSRF.c:231

#5 0x000000000068e7b3 in FunctionNext (node=node@entry=0x14ea906e8298) at nodeFunctionscan.c:94

#6 0x00000000006826e7 in ExecScanFetch (recheckMtd=0x68e4e0 <FunctionRecheck>, accessMtd=0x68e500 <FunctionNext>, node=0x14ea906e8298) at execScan.c:97

#7 ExecScan (node=0x14ea906e8298, accessMtd=0x68e500 <FunctionNext>, recheckMtd=0x68e4e0 <FunctionRecheck>) at execScan.c:164

#8 0x0000000000688009 in ExecProcNode (node=0x14ea906e8298) at ../../../src/include/executor/executor.h:250

#9 fetch_input_tuple (aggstate=aggstate@entry=0x14ea906e84f8) at nodeAgg.c:695

#10 0x000000000068a0af in agg_retrieve_direct (aggstate=0x14ea906e84f8) at nodeAgg.c:2347

#11 ExecAgg (pstate=0x14ea906e84f8) at nodeAgg.c:2158

#12 0x000000000067cce2 in ExecProcNode (node=0x14ea906e84f8) at ../../../src/include/executor/executor.h:250

#13 ExecutePlan (execute_once=<optimized out>, dest=0x14ed51d1f448, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x14ea906e84f8, estate=0x14ea906e8038) at execMain.c:1723

#14 standard_ExecutorRun (queryDesc=0x14ed51d854b8, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364

#15 0x000014ed56cee425 in pgss_ExecutorRun (queryDesc=0x14ed51d854b8, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891

#16 0x000014ed56cd760e in explain_ExecutorRun (queryDesc=0x14ed51d854b8, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at auto_explain.c:267

#17 0x00000000007b328c in PortalRunSelect (portal=portal@entry=0x14ed51e08038, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x14ed51d1f448) at pquery.c:932

#18 0x00000000007b4630 in PortalRun (portal=portal@entry=0x14ed51e08038, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x14ed51d1f448, altdest=altdest@entry=0x14ed51d1f448, completionTag=0x7ffd26956730 "") at pquery.c:773

#19 0x00000000007b200c in exec_execute_message (max_rows=9223372036854775807, portal_name=0x14ed51d1f038 "") at postgres.c:2030

#20 PostgresMain (argc=<optimized out>, argv=argv@entry=0x14ed51d6d088, dbname=0x14ed51d6d068 "db_name", username=<optimized out>) at postgres.c:4298

#21 0x00000000004cf2ae in BackendRun (port=0x14ed51dfa380) at postmaster.c:4791

#22 BackendStartup (port=0x14ed51dfa380) at postmaster.c:4458

#23 ServerLoop () at postmaster.c:1930

#24 0x0000000000739d58 in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x14ed51c246f0) at postmaster.c:1557

#25 0x00000000004d1594 in main (argc=9, argv=0x14ed51c246f0) at main.c:228

[14:53:43][root][~]$

On Mon, May 6, 2019 at 2:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

neeraj kumar <neeru.cse@gmail.com> writes:
> We are using PG 10.6. We have one cron job that queries pg_stat_activity
> table to find out how many queries are running longer than X minutes and
> generate metrics.

> Query look like this :
> SELECT * FROM pg_stat_activity WHERE state='active'

> After some days, this query get stuck and doesn't finish. We tried to run
> this query manually and same result.

> We looked into pg_locks table and there this query is not blocked on any
> lock :
> https://justpaste.it/48rpe

Interesting. Can you get a stack trace to show where in the code it's
stuck?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane

-------------------------------------
Thanks
Neeraj Kumar,
+1 (206) 427-7267