Обсуждение: Query on pg_stat_activity table got stuck

Поиск
Список
Период
Сортировка

Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Hello,

We are using PG 10.6. We have one cron job that queries pg_stat_activity table to find out how many queries are running longer than X minutes and generate metrics.

Query look like this : 
SELECT * FROM pg_stat_activity WHERE state='active'

After some days, this query get stuck and doesn't finish. We tried to run this query manually and same result.

We looked into pg_locks table and there this query is not blocked on any lock :
https://justpaste.it/48rpe 

Now we don't know where to look next. Our expectation was query on pg_stat_activity shouldn't get blocked on anything and pg_locks also showing same result.
Google search also not returning anything useful that is why reaching out to this email list.
We have separated that DB and issue is still produceable on that DB, in case if more information is not needed.

-------------------------------------
Thanks
Neeraj

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> We are using PG 10.6. We have one cron job that queries pg_stat_activity
> table to find out how many queries are running longer than X minutes and
> generate metrics.

> Query look like this :
> SELECT * FROM pg_stat_activity WHERE state='active'

> After some days, this query get stuck and doesn't finish. We tried to run
> this query manually and same result.

> We looked into pg_locks table and there this query is not blocked on any
> lock :
> https://justpaste.it/48rpe

Interesting.  Can you get a stack trace to show where in the code it's
stuck?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> We are using PG 10.6. We have one cron job that queries pg_stat_activity
> table to find out how many queries are running longer than X minutes and
> generate metrics.

> Query look like this :
> SELECT * FROM pg_stat_activity WHERE state='active'

> After some days, this query get stuck and doesn't finish. We tried to run
> this query manually and same result.

> We looked into pg_locks table and there this query is not blocked on any
> lock :
> https://justpaste.it/48rpe

Interesting.  Can you get a stack trace to show where in the code it's
stuck?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Took some time to get stack trace as we didn't had root permission. Attaching stack trace of two process (out of many) stuck for same query below[1][2]

All stack trace's top function points to somewhere in this loop so it means calls are not stuck, but are unable to come out of this loop. 

[1] 
[12:43:30][root][~]$ pstack 6283 
#0 pgstat_read_current_status () at pgstat.c:3495 
#1 0x0000000000732381 in pgstat_read_current_status () at pgstat.c:2566 
#2 pgstat_fetch_stat_numbackends () at pgstat.c:2567 
#3 0x000000000083bfef in pg_stat_get_activity (fcinfo=0x7ffd26955b80) at pgstatfuncs.c:581 
#4 0x00000000006832a1 in ExecMakeTableFunctionResult (setexpr=0x14ea907dcc60, econtext=0x14ea907dca50, argContext=<optimized out>, expectedDesc=0x14ea907df048, randomAccess=0 '\000') at execSRF.c:231 
#5 0x000000000068e7b3 in FunctionNext (node=node@entry=0x14ea907dc298) at nodeFunctionscan.c:94 
#6 0x000000000068275a in ExecScanFetch (recheckMtd=0x68e4e0 <FunctionRecheck>, accessMtd=0x68e500 <FunctionNext>, node=0x14ea907dc298) at execScan.c:97 
#7 ExecScan (node=0x14ea907dc298, accessMtd=0x68e500 <FunctionNext>, recheckMtd=0x68e4e0 <FunctionRecheck>) at execScan.c:147 
#8 0x0000000000688009 in ExecProcNode (node=0x14ea907dc298) at ../../../src/include/executor/executor.h:250 
#9 fetch_input_tuple (aggstate=aggstate@entry=0x14ea907dc4f8) at nodeAgg.c:695 
#10 0x000000000068a0af in agg_retrieve_direct (aggstate=0x14ea907dc4f8) at nodeAgg.c:2347 
#11 ExecAgg (pstate=0x14ea907dc4f8) at nodeAgg.c:2158 
#12 0x000000000067cce2 in ExecProcNode (node=0x14ea907dc4f8) at ../../../src/include/executor/executor.h:250 
#13 ExecutePlan (execute_once=<optimized out>, dest=0x14ea907a1190, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x14ea907dc4f8, estate=0x14ea907dc038) at execMain.c:1723 
#14 standard_ExecutorRun (queryDesc=0x14ea906cc038, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 
#15 0x000014ed56cee425 in pgss_ExecutorRun (queryDesc=0x14ea906cc038, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891 
#16 0x000014ed56cd760e in explain_ExecutorRun (queryDesc=0x14ea906cc038, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at auto_explain.c:267 
#17 0x00000000007b328c in PortalRunSelect (portal=portal@entry=0x14ea907da038, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x14ea907a1190) at pquery.c:932 
#18 0x00000000007b4630 in PortalRun (portal=portal@entry=0x14ea907da038, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0x14ea907a1190, altdest=altdest@entry=0x14ea907a1190, completionTag=0x7ffd26956530 "") at pquery.c:773 
#19 0x00000000007b0223 in exec_simple_query (query_string=0x14ed51d1f038 "select count(*) from pg_stat_activity;") at postgres.c:1145 
#20 0x00000000007b2388 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x14ed51dd42a8, dbname=0x14ed51dd4158 "db_name", username=<optimized out>) at postgres.c:4235 
#21 0x00000000004cf2ae in BackendRun (port=0x14ed51dfa380) at postmaster.c:4791 
#22 BackendStartup (port=0x14ed51dfa380) at postmaster.c:4458 
#23 ServerLoop () at postmaster.c:1930 
#24 0x0000000000739d58 in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x14ed51c246f0) at postmaster.c:1557 
#25 0x00000000004d1594 in main (argc=9, argv=0x14ed51c246f0) at main.c:228


[2]
[14:53:36][root][~]$ pstack 82504 
#0 0x000000000072e053 in pgstat_read_current_status () at pgstat.c:3467 
#1 0x0000000000732381 in pgstat_read_current_status () at pgstat.c:2566 
#2 pgstat_fetch_stat_numbackends () at pgstat.c:2567 
#3 0x000000000083bfef in pg_stat_get_activity (fcinfo=0x7ffd26955c30) at pgstatfuncs.c:581 
#4 0x00000000006832a1 in ExecMakeTableFunctionResult (setexpr=0x14ea906e95b0, econtext=0x14ea906e8a50, argContext=<optimized out>, expectedDesc=0x14ea906eb958, randomAccess=0 '\000') at execSRF.c:231 
#5 0x000000000068e7b3 in FunctionNext (node=node@entry=0x14ea906e8298) at nodeFunctionscan.c:94 
#6 0x00000000006826e7 in ExecScanFetch (recheckMtd=0x68e4e0 <FunctionRecheck>, accessMtd=0x68e500 <FunctionNext>, node=0x14ea906e8298) at execScan.c:97 
#7 ExecScan (node=0x14ea906e8298, accessMtd=0x68e500 <FunctionNext>, recheckMtd=0x68e4e0 <FunctionRecheck>) at execScan.c:164 
#8 0x0000000000688009 in ExecProcNode (node=0x14ea906e8298) at ../../../src/include/executor/executor.h:250 
#9 fetch_input_tuple (aggstate=aggstate@entry=0x14ea906e84f8) at nodeAgg.c:695 
#10 0x000000000068a0af in agg_retrieve_direct (aggstate=0x14ea906e84f8) at nodeAgg.c:2347 
#11 ExecAgg (pstate=0x14ea906e84f8) at nodeAgg.c:2158 
#12 0x000000000067cce2 in ExecProcNode (node=0x14ea906e84f8) at ../../../src/include/executor/executor.h:250 
#13 ExecutePlan (execute_once=<optimized out>, dest=0x14ed51d1f448, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x14ea906e84f8, estate=0x14ea906e8038) at execMain.c:1723 
#14 standard_ExecutorRun (queryDesc=0x14ed51d854b8, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 
#15 0x000014ed56cee425 in pgss_ExecutorRun (queryDesc=0x14ed51d854b8, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891 
#16 0x000014ed56cd760e in explain_ExecutorRun (queryDesc=0x14ed51d854b8, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at auto_explain.c:267 
#17 0x00000000007b328c in PortalRunSelect (portal=portal@entry=0x14ed51e08038, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x14ed51d1f448) at pquery.c:932 
#18 0x00000000007b4630 in PortalRun (portal=portal@entry=0x14ed51e08038, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x14ed51d1f448, altdest=altdest@entry=0x14ed51d1f448, completionTag=0x7ffd26956730 "") at pquery.c:773 
#19 0x00000000007b200c in exec_execute_message (max_rows=9223372036854775807, portal_name=0x14ed51d1f038 "") at postgres.c:2030 
#20 PostgresMain (argc=<optimized out>, argv=argv@entry=0x14ed51d6d088, dbname=0x14ed51d6d068 "db_name", username=<optimized out>) at postgres.c:4298 
#21 0x00000000004cf2ae in BackendRun (port=0x14ed51dfa380) at postmaster.c:4791 
#22 BackendStartup (port=0x14ed51dfa380) at postmaster.c:4458 
#23 ServerLoop () at postmaster.c:1930 
#24 0x0000000000739d58 in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x14ed51c246f0) at postmaster.c:1557 
#25 0x00000000004d1594 in main (argc=9, argv=0x14ed51c246f0) at main.c:228 
[14:53:43][root][~]$




On Mon, May 6, 2019 at 2:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> We are using PG 10.6. We have one cron job that queries pg_stat_activity
> table to find out how many queries are running longer than X minutes and
> generate metrics.

> Query look like this :
> SELECT * FROM pg_stat_activity WHERE state='active'

> After some days, this query get stuck and doesn't finish. We tried to run
> this query manually and same result.

> We looked into pg_locks table and there this query is not blocked on any
> lock :
> https://justpaste.it/48rpe

Interesting.  Can you get a stack trace to show where in the code it's
stuck?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Took some time to get stack trace as we didn't had root permission. Attaching stack trace of two process (out of many) stuck for same query below[1][2]

All stack trace's top function points to somewhere in this loop so it means calls are not stuck, but are unable to come out of this loop. 

[1] 
[12:43:30][root][~]$ pstack 6283 
#0 pgstat_read_current_status () at pgstat.c:3495 
#1 0x0000000000732381 in pgstat_read_current_status () at pgstat.c:2566 
#2 pgstat_fetch_stat_numbackends () at pgstat.c:2567 
#3 0x000000000083bfef in pg_stat_get_activity (fcinfo=0x7ffd26955b80) at pgstatfuncs.c:581 
#4 0x00000000006832a1 in ExecMakeTableFunctionResult (setexpr=0x14ea907dcc60, econtext=0x14ea907dca50, argContext=<optimized out>, expectedDesc=0x14ea907df048, randomAccess=0 '\000') at execSRF.c:231 
#5 0x000000000068e7b3 in FunctionNext (node=node@entry=0x14ea907dc298) at nodeFunctionscan.c:94 
#6 0x000000000068275a in ExecScanFetch (recheckMtd=0x68e4e0 <FunctionRecheck>, accessMtd=0x68e500 <FunctionNext>, node=0x14ea907dc298) at execScan.c:97 
#7 ExecScan (node=0x14ea907dc298, accessMtd=0x68e500 <FunctionNext>, recheckMtd=0x68e4e0 <FunctionRecheck>) at execScan.c:147 
#8 0x0000000000688009 in ExecProcNode (node=0x14ea907dc298) at ../../../src/include/executor/executor.h:250 
#9 fetch_input_tuple (aggstate=aggstate@entry=0x14ea907dc4f8) at nodeAgg.c:695 
#10 0x000000000068a0af in agg_retrieve_direct (aggstate=0x14ea907dc4f8) at nodeAgg.c:2347 
#11 ExecAgg (pstate=0x14ea907dc4f8) at nodeAgg.c:2158 
#12 0x000000000067cce2 in ExecProcNode (node=0x14ea907dc4f8) at ../../../src/include/executor/executor.h:250 
#13 ExecutePlan (execute_once=<optimized out>, dest=0x14ea907a1190, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x14ea907dc4f8, estate=0x14ea907dc038) at execMain.c:1723 
#14 standard_ExecutorRun (queryDesc=0x14ea906cc038, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 
#15 0x000014ed56cee425 in pgss_ExecutorRun (queryDesc=0x14ea906cc038, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891 
#16 0x000014ed56cd760e in explain_ExecutorRun (queryDesc=0x14ea906cc038, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at auto_explain.c:267 
#17 0x00000000007b328c in PortalRunSelect (portal=portal@entry=0x14ea907da038, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x14ea907a1190) at pquery.c:932 
#18 0x00000000007b4630 in PortalRun (portal=portal@entry=0x14ea907da038, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=run_once@entry=1 '\001', dest=dest@entry=0x14ea907a1190, altdest=altdest@entry=0x14ea907a1190, completionTag=0x7ffd26956530 "") at pquery.c:773 
#19 0x00000000007b0223 in exec_simple_query (query_string=0x14ed51d1f038 "select count(*) from pg_stat_activity;") at postgres.c:1145 
#20 0x00000000007b2388 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x14ed51dd42a8, dbname=0x14ed51dd4158 "db_name", username=<optimized out>) at postgres.c:4235 
#21 0x00000000004cf2ae in BackendRun (port=0x14ed51dfa380) at postmaster.c:4791 
#22 BackendStartup (port=0x14ed51dfa380) at postmaster.c:4458 
#23 ServerLoop () at postmaster.c:1930 
#24 0x0000000000739d58 in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x14ed51c246f0) at postmaster.c:1557 
#25 0x00000000004d1594 in main (argc=9, argv=0x14ed51c246f0) at main.c:228


[2]
[14:53:36][root][~]$ pstack 82504 
#0 0x000000000072e053 in pgstat_read_current_status () at pgstat.c:3467 
#1 0x0000000000732381 in pgstat_read_current_status () at pgstat.c:2566 
#2 pgstat_fetch_stat_numbackends () at pgstat.c:2567 
#3 0x000000000083bfef in pg_stat_get_activity (fcinfo=0x7ffd26955c30) at pgstatfuncs.c:581 
#4 0x00000000006832a1 in ExecMakeTableFunctionResult (setexpr=0x14ea906e95b0, econtext=0x14ea906e8a50, argContext=<optimized out>, expectedDesc=0x14ea906eb958, randomAccess=0 '\000') at execSRF.c:231 
#5 0x000000000068e7b3 in FunctionNext (node=node@entry=0x14ea906e8298) at nodeFunctionscan.c:94 
#6 0x00000000006826e7 in ExecScanFetch (recheckMtd=0x68e4e0 <FunctionRecheck>, accessMtd=0x68e500 <FunctionNext>, node=0x14ea906e8298) at execScan.c:97 
#7 ExecScan (node=0x14ea906e8298, accessMtd=0x68e500 <FunctionNext>, recheckMtd=0x68e4e0 <FunctionRecheck>) at execScan.c:164 
#8 0x0000000000688009 in ExecProcNode (node=0x14ea906e8298) at ../../../src/include/executor/executor.h:250 
#9 fetch_input_tuple (aggstate=aggstate@entry=0x14ea906e84f8) at nodeAgg.c:695 
#10 0x000000000068a0af in agg_retrieve_direct (aggstate=0x14ea906e84f8) at nodeAgg.c:2347 
#11 ExecAgg (pstate=0x14ea906e84f8) at nodeAgg.c:2158 
#12 0x000000000067cce2 in ExecProcNode (node=0x14ea906e84f8) at ../../../src/include/executor/executor.h:250 
#13 ExecutePlan (execute_once=<optimized out>, dest=0x14ed51d1f448, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x14ea906e84f8, estate=0x14ea906e8038) at execMain.c:1723 
#14 standard_ExecutorRun (queryDesc=0x14ed51d854b8, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364 
#15 0x000014ed56cee425 in pgss_ExecutorRun (queryDesc=0x14ed51d854b8, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891 
#16 0x000014ed56cd760e in explain_ExecutorRun (queryDesc=0x14ed51d854b8, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at auto_explain.c:267 
#17 0x00000000007b328c in PortalRunSelect (portal=portal@entry=0x14ed51e08038, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x14ed51d1f448) at pquery.c:932 
#18 0x00000000007b4630 in PortalRun (portal=portal@entry=0x14ed51e08038, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x14ed51d1f448, altdest=altdest@entry=0x14ed51d1f448, completionTag=0x7ffd26956730 "") at pquery.c:773 
#19 0x00000000007b200c in exec_execute_message (max_rows=9223372036854775807, portal_name=0x14ed51d1f038 "") at postgres.c:2030 
#20 PostgresMain (argc=<optimized out>, argv=argv@entry=0x14ed51d6d088, dbname=0x14ed51d6d068 "db_name", username=<optimized out>) at postgres.c:4298 
#21 0x00000000004cf2ae in BackendRun (port=0x14ed51dfa380) at postmaster.c:4791 
#22 BackendStartup (port=0x14ed51dfa380) at postmaster.c:4458 
#23 ServerLoop () at postmaster.c:1930 
#24 0x0000000000739d58 in PostmasterMain (argc=argc@entry=9, argv=argv@entry=0x14ed51c246f0) at postmaster.c:1557 
#25 0x00000000004d1594 in main (argc=9, argv=0x14ed51c246f0) at main.c:228 
[14:53:43][root][~]$




On Mon, May 6, 2019 at 2:33 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> We are using PG 10.6. We have one cron job that queries pg_stat_activity
> table to find out how many queries are running longer than X minutes and
> generate metrics.

> Query look like this :
> SELECT * FROM pg_stat_activity WHERE state='active'

> After some days, this query get stuck and doesn't finish. We tried to run
> this query manually and same result.

> We looked into pg_locks table and there this query is not blocked on any
> lock :
> https://justpaste.it/48rpe

Interesting.  Can you get a stack trace to show where in the code it's
stuck?

https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> Took some time to get stack trace as we didn't had root permission.
> Attaching stack trace of two process (out of many) stuck for same query
> below[1][2]

Hmm, the line numbers in your stack traces don't agree with either v10
or HEAD branches for me.  But assuming that you've correctly identified
where it's stuck:

> Seems like call is unable to come out of this loop :
> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3361-L3400

the only really obvious theory is that some process left its
st_changecount odd, which would more or less have to imply that
something threw an error between pgstat_increment_changecount_before
and pgstat_increment_changecount_after.  There's only one place
where that seems very plausible, namely pgstat_bestart, which is
doing a rather scary amount of stuff in between.  Are you using
either SSL or GSS?

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> Took some time to get stack trace as we didn't had root permission.
> Attaching stack trace of two process (out of many) stuck for same query
> below[1][2]

Hmm, the line numbers in your stack traces don't agree with either v10
or HEAD branches for me.  But assuming that you've correctly identified
where it's stuck:

> Seems like call is unable to come out of this loop :
> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3361-L3400

the only really obvious theory is that some process left its
st_changecount odd, which would more or less have to imply that
something threw an error between pgstat_increment_changecount_before
and pgstat_increment_changecount_after.  There's only one place
where that seems very plausible, namely pgstat_bestart, which is
doing a rather scary amount of stuff in between.  Are you using
either SSL or GSS?

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Yes we use SSL to connect to DB.


From comment seems like each backend should have its own copy of PgBackendStatus, it means st_changecount should be different for each process. If st_changecount was corrupted for 1/2 process, how can it impact newly created process. So could you please help to understand then how come if we run new query via new console (means new process) that also is getting stuck.


On Wed, May 8, 2019 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> Took some time to get stack trace as we didn't had root permission.
> Attaching stack trace of two process (out of many) stuck for same query
> below[1][2]

Hmm, the line numbers in your stack traces don't agree with either v10
or HEAD branches for me.  But assuming that you've correctly identified
where it's stuck:

> Seems like call is unable to come out of this loop :
> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3361-L3400

the only really obvious theory is that some process left its
st_changecount odd, which would more or less have to imply that
something threw an error between pgstat_increment_changecount_before
and pgstat_increment_changecount_after.  There's only one place
where that seems very plausible, namely pgstat_bestart, which is
doing a rather scary amount of stuff in between.  Are you using
either SSL or GSS?

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Yes we use SSL to connect to DB.


From comment seems like each backend should have its own copy of PgBackendStatus, it means st_changecount should be different for each process. If st_changecount was corrupted for 1/2 process, how can it impact newly created process. So could you please help to understand then how come if we run new query via new console (means new process) that also is getting stuck.


On Wed, May 8, 2019 at 11:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> Took some time to get stack trace as we didn't had root permission.
> Attaching stack trace of two process (out of many) stuck for same query
> below[1][2]

Hmm, the line numbers in your stack traces don't agree with either v10
or HEAD branches for me.  But assuming that you've correctly identified
where it's stuck:

> Seems like call is unable to come out of this loop :
> https://github.com/postgres/postgres/blob/master/src/backend/postmaster/pgstat.c#L3361-L3400

the only really obvious theory is that some process left its
st_changecount odd, which would more or less have to imply that
something threw an error between pgstat_increment_changecount_before
and pgstat_increment_changecount_after.  There's only one place
where that seems very plausible, namely pgstat_bestart, which is
doing a rather scary amount of stuff in between.  Are you using
either SSL or GSS?

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> Yes we use SSL to connect to DB.

Hm.  I'm suspicious that one of the functions that fetch data for
an SSL connection threw an error.  In particular, it doesn't look
to be hard at all to make X509_NAME_to_cstring fall over --- an
encoding conversion failure would do it, even without any stretchy
assumptions about OOM this early in backend start.  Have you got
any SSL certificates floating around with non-ASCII subject name
or issuer name?

> From comment seems like each backend should have its own copy
> of PgBackendStatus, it means st_changecount should be different for each
> process. If st_changecount was corrupted for 1/2 process, how can it impact
> newly created process. So could you please help to understand then how come
> if we run new query via new console (means new process) that also is
> getting stuck.

No, that field is in shared memory.  The easiest way to get it reset
would be to restart the postmaster.  It's possible you could do it
without that, by starting enough sessions so that one of them takes
up the broken pg_stat_activity slot.

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> Yes we use SSL to connect to DB.

Hm.  I'm suspicious that one of the functions that fetch data for
an SSL connection threw an error.  In particular, it doesn't look
to be hard at all to make X509_NAME_to_cstring fall over --- an
encoding conversion failure would do it, even without any stretchy
assumptions about OOM this early in backend start.  Have you got
any SSL certificates floating around with non-ASCII subject name
or issuer name?

> From comment seems like each backend should have its own copy
> of PgBackendStatus, it means st_changecount should be different for each
> process. If st_changecount was corrupted for 1/2 process, how can it impact
> newly created process. So could you please help to understand then how come
> if we run new query via new console (means new process) that also is
> getting stuck.

No, that field is in shared memory.  The easiest way to get it reset
would be to restart the postmaster.  It's possible you could do it
without that, by starting enough sessions so that one of them takes
up the broken pg_stat_activity slot.

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Jeremy Schneider
Дата:
On 5/8/19 14:58, Tom Lane wrote:
> neeraj kumar <neeru.cse@gmail.com> writes:
>> Yes we use SSL to connect to DB.
> 
> Hm.  I'm suspicious that one of the functions that fetch data for
> an SSL connection threw an error.  In particular, it doesn't look
> to be hard at all to make X509_NAME_to_cstring fall over --- an
> encoding conversion failure would do it, even without any stretchy
> assumptions about OOM this early in backend start.  Have you got
> any SSL certificates floating around with non-ASCII subject name
> or issuer name?

Crazy timing. We just had a report come in from a database in the RDS
fleet that's hitting this same issue.  It was one of the Aurora systems,
but there wasn't anything Aurora-specific that I could see in the
relevant bits of code.

Seems to me that at a minimum, this loop shouldn't go on forever. Even
having an arbitrary, crazy high, hard-coded number of attempts before
failure (like a million) would be better than spinning on the CPU
forever - which is what we are seeing.

Would be even cooler to detect and correct a broken slot in
PgBackendStatus... if I have a good idea I'll post/try it.  :)

-Jeremy

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services



Re: Query on pg_stat_activity table got stuck

От
Jeremy Schneider
Дата:
On 5/8/19 14:58, Tom Lane wrote:
> neeraj kumar <neeru.cse@gmail.com> writes:
>> Yes we use SSL to connect to DB.
> 
> Hm.  I'm suspicious that one of the functions that fetch data for
> an SSL connection threw an error.  In particular, it doesn't look
> to be hard at all to make X509_NAME_to_cstring fall over --- an
> encoding conversion failure would do it, even without any stretchy
> assumptions about OOM this early in backend start.  Have you got
> any SSL certificates floating around with non-ASCII subject name
> or issuer name?

Crazy timing. We just had a report come in from a database in the RDS
fleet that's hitting this same issue.  It was one of the Aurora systems,
but there wasn't anything Aurora-specific that I could see in the
relevant bits of code.

Seems to me that at a minimum, this loop shouldn't go on forever. Even
having an arbitrary, crazy high, hard-coded number of attempts before
failure (like a million) would be better than spinning on the CPU
forever - which is what we are seeing.

Would be even cooler to detect and correct a broken slot in
PgBackendStatus... if I have a good idea I'll post/try it.  :)

-Jeremy

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
Jeremy Schneider <schnjere@amazon.com> writes:
> Seems to me that at a minimum, this loop shouldn't go on forever. Even
> having an arbitrary, crazy high, hard-coded number of attempts before
> failure (like a million) would be better than spinning on the CPU
> forever - which is what we are seeing.

I don't think it's the readers' fault.  The problem is that the
writer is violating the protocol.  If we put an upper limit on
the number of spin cycles on the reader side, we'll just be creating
a new failure mode when a writer gets swapped out at the wrong moment.

IMO we need to (a) get the failure-prone code out of the critical
section, and then (b) fix the pgstat_increment_changecount macros
so that the critical sections around these shmem changes really are
critical sections (ie bump CritSectionCount).  That way, if somebody
makes the same mistake again, at least there'll be a pretty obvious
failure rather than a lot of stuck readers.

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
Jeremy Schneider <schnjere@amazon.com> writes:
> Seems to me that at a minimum, this loop shouldn't go on forever. Even
> having an arbitrary, crazy high, hard-coded number of attempts before
> failure (like a million) would be better than spinning on the CPU
> forever - which is what we are seeing.

I don't think it's the readers' fault.  The problem is that the
writer is violating the protocol.  If we put an upper limit on
the number of spin cycles on the reader side, we'll just be creating
a new failure mode when a writer gets swapped out at the wrong moment.

IMO we need to (a) get the failure-prone code out of the critical
section, and then (b) fix the pgstat_increment_changecount macros
so that the critical sections around these shmem changes really are
critical sections (ie bump CritSectionCount).  That way, if somebody
makes the same mistake again, at least there'll be a pretty obvious
failure rather than a lot of stuck readers.

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
We got more information about this issue. There is one backend process still present into beentry which has changecount as odd value. However this process is long gone/terminated. It means when this process was killed/terminated its entry was not cleaned from beentry list. There seems to be some shutdown hook which cleans beentry if process is killed/terminated that somehow was not kicked off ?

These are some of the fields of corrupted beentry entry which is still hanging :
st_changecount = 1407, st_procpid = 0, st_backendType = B_BACKEND, st_proc_start_timestamp = 610236763633421, st_xact_start_timestamp = 0, st_clienthostname = 0x9000023d480 "", st_ssl = 1 '\001', st_sslstatus = 0x90000c60f80, st_state = STATE_IDLEINTRANSACTION_ABORTED,




On Thu, May 9, 2019 at 1:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeremy Schneider <schnjere@amazon.com> writes:
> Seems to me that at a minimum, this loop shouldn't go on forever. Even
> having an arbitrary, crazy high, hard-coded number of attempts before
> failure (like a million) would be better than spinning on the CPU
> forever - which is what we are seeing.

I don't think it's the readers' fault.  The problem is that the
writer is violating the protocol.  If we put an upper limit on
the number of spin cycles on the reader side, we'll just be creating
a new failure mode when a writer gets swapped out at the wrong moment.

IMO we need to (a) get the failure-prone code out of the critical
section, and then (b) fix the pgstat_increment_changecount macros
so that the critical sections around these shmem changes really are
critical sections (ie bump CritSectionCount).  That way, if somebody
makes the same mistake again, at least there'll be a pretty obvious
failure rather than a lot of stuck readers.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
We got more information about this issue. There is one backend process still present into beentry which has changecount as odd value. However this process is long gone/terminated. It means when this process was killed/terminated its entry was not cleaned from beentry list. There seems to be some shutdown hook which cleans beentry if process is killed/terminated that somehow was not kicked off ?

These are some of the fields of corrupted beentry entry which is still hanging :
st_changecount = 1407, st_procpid = 0, st_backendType = B_BACKEND, st_proc_start_timestamp = 610236763633421, st_xact_start_timestamp = 0, st_clienthostname = 0x9000023d480 "", st_ssl = 1 '\001', st_sslstatus = 0x90000c60f80, st_state = STATE_IDLEINTRANSACTION_ABORTED,




On Thu, May 9, 2019 at 1:00 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeremy Schneider <schnjere@amazon.com> writes:
> Seems to me that at a minimum, this loop shouldn't go on forever. Even
> having an arbitrary, crazy high, hard-coded number of attempts before
> failure (like a million) would be better than spinning on the CPU
> forever - which is what we are seeing.

I don't think it's the readers' fault.  The problem is that the
writer is violating the protocol.  If we put an upper limit on
the number of spin cycles on the reader side, we'll just be creating
a new failure mode when a writer gets swapped out at the wrong moment.

IMO we need to (a) get the failure-prone code out of the critical
section, and then (b) fix the pgstat_increment_changecount macros
so that the critical sections around these shmem changes really are
critical sections (ie bump CritSectionCount).  That way, if somebody
makes the same mistake again, at least there'll be a pretty obvious
failure rather than a lot of stuck readers.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> We got more information about this issue. There is one backend process
> still present into beentry which has changecount as odd value. However this
> process is long gone/terminated. It means when this process was
> killed/terminated its entry was not cleaned from beentry list. There seems
> to be some shutdown hook which cleans beentry if process is
> killed/terminated that somehow was not kicked off ?

No, that's the point: there isn't any such cleanup code.  The design
idea here was that we'd only have short straight-line code segments
with no possible way to throw an error while changecount is odd.
Then somebody abused the heck out of that ...

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> We got more information about this issue. There is one backend process
> still present into beentry which has changecount as odd value. However this
> process is long gone/terminated. It means when this process was
> killed/terminated its entry was not cleaned from beentry list. There seems
> to be some shutdown hook which cleans beentry if process is
> killed/terminated that somehow was not kicked off ?

No, that's the point: there isn't any such cleanup code.  The design
idea here was that we'd only have short straight-line code segments
with no possible way to throw an error while changecount is odd.
Then somebody abused the heck out of that ...

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Tom, may be I didn't make my point clear.
There are two issues :
1) Why this value was left as odd
2) Why backend entry is still pending in beentry for backend process even after it was killed/terminated.

I am talking about 2nd issue. My understanding is query on pg_stat_activity goes via all backend entries via beentry and it finds this wrong/corrupted entry. When a process terminates, ideally this backend entry into beentery should have also been cleaned. But why this still there? Whose responsibility it is to remove entry from beentry when process terminates ? May be that part is broken?

Ideally this issue should have stayed till this process was active. If entry from beentry would have been removed after process was killed, system should have auto recovered.



On Thu, May 9, 2019 at 8:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> We got more information about this issue. There is one backend process
> still present into beentry which has changecount as odd value. However this
> process is long gone/terminated. It means when this process was
> killed/terminated its entry was not cleaned from beentry list. There seems
> to be some shutdown hook which cleans beentry if process is
> killed/terminated that somehow was not kicked off ?

No, that's the point: there isn't any such cleanup code.  The design
idea here was that we'd only have short straight-line code segments
with no possible way to throw an error while changecount is odd.
Then somebody abused the heck out of that ...

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Tom, may be I didn't make my point clear.
There are two issues :
1) Why this value was left as odd
2) Why backend entry is still pending in beentry for backend process even after it was killed/terminated.

I am talking about 2nd issue. My understanding is query on pg_stat_activity goes via all backend entries via beentry and it finds this wrong/corrupted entry. When a process terminates, ideally this backend entry into beentery should have also been cleaned. But why this still there? Whose responsibility it is to remove entry from beentry when process terminates ? May be that part is broken?

Ideally this issue should have stayed till this process was active. If entry from beentry would have been removed after process was killed, system should have auto recovered.



On Thu, May 9, 2019 at 8:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> We got more information about this issue. There is one backend process
> still present into beentry which has changecount as odd value. However this
> process is long gone/terminated. It means when this process was
> killed/terminated its entry was not cleaned from beentry list. There seems
> to be some shutdown hook which cleans beentry if process is
> killed/terminated that somehow was not kicked off ?

No, that's the point: there isn't any such cleanup code.  The design
idea here was that we'd only have short straight-line code segments
with no possible way to throw an error while changecount is odd.
Then somebody abused the heck out of that ...

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> Tom, may be I didn't make my point clear.
> There are two issues :
> 1) Why this value was left as odd

Because a function called by pgstat_bestart threw an error, is what
I'm guessing.

> 2) Why backend entry is still pending in beentry for backend process even
> after it was killed/terminated.
> I am talking about 2nd issue. My understanding is query on pg_stat_activity
> goes via all backend entries via beentry and it finds this wrong/corrupted
> entry. When a process terminates, ideally this backend entry into beentery
> should have also been cleaned. But why this still there? Whose
> responsibility it is to remove entry from beentry when process terminates ?

If you're imagining that something takes an electronic hacksaw to shared
memory and physically removes that array entry, you're mistaken.  It's
still going to be there.  Now, there *is* cleanup code --- this bit of
pgstat_beshutdown_hook is supposed to mark the process's entry as
not-in-use:

    /*
     * Clear my status entry, following the protocol of bumping st_changecount
     * before and after.  We use a volatile pointer here to ensure the
     * compiler doesn't try to get cute.
     */
    pgstat_increment_changecount_before(beentry);

    beentry->st_procpid = 0;    /* mark invalid */

    pgstat_increment_changecount_after(beentry);

However, if something had left st_changecount odd before we got to this,
it'd still be odd afterwards.  Sure, st_procpid is now zero, but that
doesn't help readers because they're not supposed to believe st_procpid
is valid unless the changecount is even.

You could maybe argue that this cleanup code should be trying to ensure
that the changecount is left even, but I don't think that's an appropriate
response.  If it's not even when we get here, we already screwed up very
badly, because we were breaking the protocol for (potentially) as long as
the process has been in existence.  Moreover, if you're worried about
corner cases where we did mess that up, what of corner cases where process
exit fails before getting here?  I think the right answer is to bring the
hammer down as soon as we mess up, which is what the critical-section
mechanism is for.

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> Tom, may be I didn't make my point clear.
> There are two issues :
> 1) Why this value was left as odd

Because a function called by pgstat_bestart threw an error, is what
I'm guessing.

> 2) Why backend entry is still pending in beentry for backend process even
> after it was killed/terminated.
> I am talking about 2nd issue. My understanding is query on pg_stat_activity
> goes via all backend entries via beentry and it finds this wrong/corrupted
> entry. When a process terminates, ideally this backend entry into beentery
> should have also been cleaned. But why this still there? Whose
> responsibility it is to remove entry from beentry when process terminates ?

If you're imagining that something takes an electronic hacksaw to shared
memory and physically removes that array entry, you're mistaken.  It's
still going to be there.  Now, there *is* cleanup code --- this bit of
pgstat_beshutdown_hook is supposed to mark the process's entry as
not-in-use:

    /*
     * Clear my status entry, following the protocol of bumping st_changecount
     * before and after.  We use a volatile pointer here to ensure the
     * compiler doesn't try to get cute.
     */
    pgstat_increment_changecount_before(beentry);

    beentry->st_procpid = 0;    /* mark invalid */

    pgstat_increment_changecount_after(beentry);

However, if something had left st_changecount odd before we got to this,
it'd still be odd afterwards.  Sure, st_procpid is now zero, but that
doesn't help readers because they're not supposed to believe st_procpid
is valid unless the changecount is even.

You could maybe argue that this cleanup code should be trying to ensure
that the changecount is left even, but I don't think that's an appropriate
response.  If it's not even when we get here, we already screwed up very
badly, because we were breaking the protocol for (potentially) as long as
the process has been in existence.  Moreover, if you're worried about
corner cases where we did mess that up, what of corner cases where process
exit fails before getting here?  I think the right answer is to bring the
hammer down as soon as we mess up, which is what the critical-section
mechanism is for.

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
There are multiple ways see this problem. One way I am seeing is : how system will auto-recover from this particular state.

So ideally if st_procpid is set to zero it means this process is already terminated, however it might be have left some corrupted information in memory. So when other components tries to read beentry, they should also check if st_procpid is already set to zero, if yes it means this process is gone and no need to consider this process any more.

Agree this is solving particular issue about pg_stat_activity however don't see any harm in adding that check.


On Thu, May 9, 2019 at 9:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> Tom, may be I didn't make my point clear.
> There are two issues :
> 1) Why this value was left as odd

Because a function called by pgstat_bestart threw an error, is what
I'm guessing.

> 2) Why backend entry is still pending in beentry for backend process even
> after it was killed/terminated.
> I am talking about 2nd issue. My understanding is query on pg_stat_activity
> goes via all backend entries via beentry and it finds this wrong/corrupted
> entry. When a process terminates, ideally this backend entry into beentery
> should have also been cleaned. But why this still there? Whose
> responsibility it is to remove entry from beentry when process terminates ?

If you're imagining that something takes an electronic hacksaw to shared
memory and physically removes that array entry, you're mistaken.  It's
still going to be there.  Now, there *is* cleanup code --- this bit of
pgstat_beshutdown_hook is supposed to mark the process's entry as
not-in-use:

        /*
         * Clear my status entry, following the protocol of bumping st_changecount
         * before and after.  We use a volatile pointer here to ensure the
         * compiler doesn't try to get cute.
         */
        pgstat_increment_changecount_before(beentry);

        beentry->st_procpid = 0;        /* mark invalid */

        pgstat_increment_changecount_after(beentry);

However, if something had left st_changecount odd before we got to this,
it'd still be odd afterwards.  Sure, st_procpid is now zero, but that
doesn't help readers because they're not supposed to believe st_procpid
is valid unless the changecount is even.

You could maybe argue that this cleanup code should be trying to ensure
that the changecount is left even, but I don't think that's an appropriate
response.  If it's not even when we get here, we already screwed up very
badly, because we were breaking the protocol for (potentially) as long as
the process has been in existence.  Moreover, if you're worried about
corner cases where we did mess that up, what of corner cases where process
exit fails before getting here?  I think the right answer is to bring the
hammer down as soon as we mess up, which is what the critical-section
mechanism is for.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
There are multiple ways see this problem. One way I am seeing is : how system will auto-recover from this particular state.

So ideally if st_procpid is set to zero it means this process is already terminated, however it might be have left some corrupted information in memory. So when other components tries to read beentry, they should also check if st_procpid is already set to zero, if yes it means this process is gone and no need to consider this process any more.

Agree this is solving particular issue about pg_stat_activity however don't see any harm in adding that check.


On Thu, May 9, 2019 at 9:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> Tom, may be I didn't make my point clear.
> There are two issues :
> 1) Why this value was left as odd

Because a function called by pgstat_bestart threw an error, is what
I'm guessing.

> 2) Why backend entry is still pending in beentry for backend process even
> after it was killed/terminated.
> I am talking about 2nd issue. My understanding is query on pg_stat_activity
> goes via all backend entries via beentry and it finds this wrong/corrupted
> entry. When a process terminates, ideally this backend entry into beentery
> should have also been cleaned. But why this still there? Whose
> responsibility it is to remove entry from beentry when process terminates ?

If you're imagining that something takes an electronic hacksaw to shared
memory and physically removes that array entry, you're mistaken.  It's
still going to be there.  Now, there *is* cleanup code --- this bit of
pgstat_beshutdown_hook is supposed to mark the process's entry as
not-in-use:

        /*
         * Clear my status entry, following the protocol of bumping st_changecount
         * before and after.  We use a volatile pointer here to ensure the
         * compiler doesn't try to get cute.
         */
        pgstat_increment_changecount_before(beentry);

        beentry->st_procpid = 0;        /* mark invalid */

        pgstat_increment_changecount_after(beentry);

However, if something had left st_changecount odd before we got to this,
it'd still be odd afterwards.  Sure, st_procpid is now zero, but that
doesn't help readers because they're not supposed to believe st_procpid
is valid unless the changecount is even.

You could maybe argue that this cleanup code should be trying to ensure
that the changecount is left even, but I don't think that's an appropriate
response.  If it's not even when we get here, we already screwed up very
badly, because we were breaking the protocol for (potentially) as long as
the process has been in existence.  Moreover, if you're worried about
corner cases where we did mess that up, what of corner cases where process
exit fails before getting here?  I think the right answer is to bring the
hammer down as soon as we mess up, which is what the critical-section
mechanism is for.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Also curious why query on pg_stat_activity is considering terminated process ? Irrespective of corrupted state or not, ideally query on pg_stat_activity should ignore already terminated process.
My 2 cents.


On Fri, May 10, 2019 at 8:01 AM neeraj kumar <neeru.cse@gmail.com> wrote:
There are multiple ways see this problem. One way I am seeing is : how system will auto-recover from this particular state.

So ideally if st_procpid is set to zero it means this process is already terminated, however it might be have left some corrupted information in memory. So when other components tries to read beentry, they should also check if st_procpid is already set to zero, if yes it means this process is gone and no need to consider this process any more.

Agree this is solving particular issue about pg_stat_activity however don't see any harm in adding that check.


On Thu, May 9, 2019 at 9:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> Tom, may be I didn't make my point clear.
> There are two issues :
> 1) Why this value was left as odd

Because a function called by pgstat_bestart threw an error, is what
I'm guessing.

> 2) Why backend entry is still pending in beentry for backend process even
> after it was killed/terminated.
> I am talking about 2nd issue. My understanding is query on pg_stat_activity
> goes via all backend entries via beentry and it finds this wrong/corrupted
> entry. When a process terminates, ideally this backend entry into beentery
> should have also been cleaned. But why this still there? Whose
> responsibility it is to remove entry from beentry when process terminates ?

If you're imagining that something takes an electronic hacksaw to shared
memory and physically removes that array entry, you're mistaken.  It's
still going to be there.  Now, there *is* cleanup code --- this bit of
pgstat_beshutdown_hook is supposed to mark the process's entry as
not-in-use:

        /*
         * Clear my status entry, following the protocol of bumping st_changecount
         * before and after.  We use a volatile pointer here to ensure the
         * compiler doesn't try to get cute.
         */
        pgstat_increment_changecount_before(beentry);

        beentry->st_procpid = 0;        /* mark invalid */

        pgstat_increment_changecount_after(beentry);

However, if something had left st_changecount odd before we got to this,
it'd still be odd afterwards.  Sure, st_procpid is now zero, but that
doesn't help readers because they're not supposed to believe st_procpid
is valid unless the changecount is even.

You could maybe argue that this cleanup code should be trying to ensure
that the changecount is left even, but I don't think that's an appropriate
response.  If it's not even when we get here, we already screwed up very
badly, because we were breaking the protocol for (potentially) as long as
the process has been in existence.  Moreover, if you're worried about
corner cases where we did mess that up, what of corner cases where process
exit fails before getting here?  I think the right answer is to bring the
hammer down as soon as we mess up, which is what the critical-section
mechanism is for.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
Also curious why query on pg_stat_activity is considering terminated process ? Irrespective of corrupted state or not, ideally query on pg_stat_activity should ignore already terminated process.
My 2 cents.


On Fri, May 10, 2019 at 8:01 AM neeraj kumar <neeru.cse@gmail.com> wrote:
There are multiple ways see this problem. One way I am seeing is : how system will auto-recover from this particular state.

So ideally if st_procpid is set to zero it means this process is already terminated, however it might be have left some corrupted information in memory. So when other components tries to read beentry, they should also check if st_procpid is already set to zero, if yes it means this process is gone and no need to consider this process any more.

Agree this is solving particular issue about pg_stat_activity however don't see any harm in adding that check.


On Thu, May 9, 2019 at 9:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> Tom, may be I didn't make my point clear.
> There are two issues :
> 1) Why this value was left as odd

Because a function called by pgstat_bestart threw an error, is what
I'm guessing.

> 2) Why backend entry is still pending in beentry for backend process even
> after it was killed/terminated.
> I am talking about 2nd issue. My understanding is query on pg_stat_activity
> goes via all backend entries via beentry and it finds this wrong/corrupted
> entry. When a process terminates, ideally this backend entry into beentery
> should have also been cleaned. But why this still there? Whose
> responsibility it is to remove entry from beentry when process terminates ?

If you're imagining that something takes an electronic hacksaw to shared
memory and physically removes that array entry, you're mistaken.  It's
still going to be there.  Now, there *is* cleanup code --- this bit of
pgstat_beshutdown_hook is supposed to mark the process's entry as
not-in-use:

        /*
         * Clear my status entry, following the protocol of bumping st_changecount
         * before and after.  We use a volatile pointer here to ensure the
         * compiler doesn't try to get cute.
         */
        pgstat_increment_changecount_before(beentry);

        beentry->st_procpid = 0;        /* mark invalid */

        pgstat_increment_changecount_after(beentry);

However, if something had left st_changecount odd before we got to this,
it'd still be odd afterwards.  Sure, st_procpid is now zero, but that
doesn't help readers because they're not supposed to believe st_procpid
is valid unless the changecount is even.

You could maybe argue that this cleanup code should be trying to ensure
that the changecount is left even, but I don't think that's an appropriate
response.  If it's not even when we get here, we already screwed up very
badly, because we were breaking the protocol for (potentially) as long as
the process has been in existence.  Moreover, if you're worried about
corner cases where we did mess that up, what of corner cases where process
exit fails before getting here?  I think the right answer is to bring the
hammer down as soon as we mess up, which is what the critical-section
mechanism is for.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> Also curious why query on pg_stat_activity is considering terminated
> process ?

The short answer to that is that this bug leaves shared memory in
a corrupt state.  It's not really useful to worry about whether
readers should react differently to that --- you could spend endless
amounts of time hypothesizing different kinds of corruption and
endless amounts of code/cycles trying to work around it.  But the
right answer is to fix it on the writing side.

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> Also curious why query on pg_stat_activity is considering terminated
> process ?

The short answer to that is that this bug leaves shared memory in
a corrupt state.  It's not really useful to worry about whether
readers should react differently to that --- you could spend endless
amounts of time hypothesizing different kinds of corruption and
endless amounts of code/cycles trying to work around it.  But the
right answer is to fix it on the writing side.

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
--> But the right answer is to fix it on the writing side.
Yes I agree with this part. Even though there is very low probability, a process can still be killed in middle when writing. So what is your suggestion on how to recover from this automatically?

On Fri, May 10, 2019 at 9:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> Also curious why query on pg_stat_activity is considering terminated
> process ?

The short answer to that is that this bug leaves shared memory in
a corrupt state.  It's not really useful to worry about whether
readers should react differently to that --- you could spend endless
amounts of time hypothesizing different kinds of corruption and
endless amounts of code/cycles trying to work around it.  But the
right answer is to fix it on the writing side.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
neeraj kumar
Дата:
--> But the right answer is to fix it on the writing side.
Yes I agree with this part. Even though there is very low probability, a process can still be killed in middle when writing. So what is your suggestion on how to recover from this automatically?

On Fri, May 10, 2019 at 9:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
neeraj kumar <neeru.cse@gmail.com> writes:
> Also curious why query on pg_stat_activity is considering terminated
> process ?

The short answer to that is that this bug leaves shared memory in
a corrupt state.  It's not really useful to worry about whether
readers should react differently to that --- you could spend endless
amounts of time hypothesizing different kinds of corruption and
endless amounts of code/cycles trying to work around it.  But the
right answer is to fix it on the writing side.

                        regards, tom lane


--
-------------------------------------
Thanks
Neeraj Kumar,
+1  (206) 427-7267

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> --> But the right answer is to fix it on the writing side.

> Yes I agree with this part. Even though there is very low probability, a
> process can still be killed in middle when writing. So what is your
> suggestion on how to recover from this automatically?

Here's a draft patch.  This makes more cosmetic changes than are strictly
necessary to fix the bug, but I'm concerned about preventing future
mistakes of the same sort, not just dealing with the immediate bug.

Anyway the idea here is (1) reduce the size of the critical section in
pgstat_bestart where the changecount is odd, so that nothing except
data copying happens there, and (2) adjust the macros for writing
st_changecount so that those sections actually are critical sections.
That means that if something throws an error inside those sections,
it'll turn into a PANIC and database restart.  That shouldn't ever happen,
of course, but if it does then a PANIC is better than a frozen system.
I renamed the macros for writing st_changecount to make it more apparent
that those are now critical-section boundaries.

The out-of-line string fields make the pgstat_bestart code more ticklish
than one could wish, since they have to be treated differently from
in-line fields.  But I'm not sure there's much we can do to avoid that.
It doesn't seem like a good idea to change the layout of the shared-
memory structure, at least not in released branches.

I found two other embarrassing bugs while I was at it.  The stanza
initializing st_backendType was clearly inserted with the aid of a
dartboard, because it was actually before the initial st_changecount bump.
That probably has little if any real-world impact, but it's still an
indication of sloppy patching.  And pgstat_read_current_status had not
been taught to copy the st_gssstatus out-of-line structure to local
storage, so that those values might fail to hold still while a
transaction examines the pg_stat_activity data.  That *is* a live bug.

This patch is against HEAD --- I've not looked at how much adjustment
it'll need for the back branches, but I'm sure there's some.

            regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 64986b1..25a1bd9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2849,66 +2849,79 @@ pgstat_initialize(void)
  *    Apart from auxiliary processes, MyBackendId, MyDatabaseId,
  *    session userid, and application_name must be set for a
  *    backend (hence, this cannot be combined with pgstat_initialize).
+ *    Note also that we must be inside a transaction.
  * ----------
  */
 void
 pgstat_bestart(void)
 {
-    SockAddr    clientaddr;
-    volatile PgBackendStatus *beentry;
+    volatile PgBackendStatus *vbeentry = MyBEEntry;
+    PgBackendStatus lbeentry;
+#ifdef USE_SSL
+    PgBackendSSLStatus lsslstatus;
+#endif
+#ifdef ENABLE_GSS
+    PgBackendGSSStatus lgssstatus;
+#endif

-    /*
-     * To minimize the time spent modifying the PgBackendStatus entry, fetch
-     * all the needed data first.
-     */
+    /* pgstats state must be initialized from pgstat_initialize() */
+    Assert(vbeentry != NULL);

     /*
-     * We may not have a MyProcPort (eg, if this is the autovacuum process).
-     * If so, use all-zeroes client address, which is dealt with specially in
-     * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
+     * To minimize the time spent modifying the PgBackendStatus entry, and
+     * avoid risk of errors inside the critical section, we first copy the
+     * shared-memory struct to a local variable, then modify the data in the
+     * local variable, then copy the local variable back to shared memory.
+     * Only the last step has to be inside the critical section.
+     *
+     * Most of the data we copy here is just going to be overwritten, but the
+     * struct's not so large that it's worth the maintenance hassle to copy
+     * only the needful fields.
      */
-    if (MyProcPort)
-        memcpy(&clientaddr, &MyProcPort->raddr, sizeof(clientaddr));
-    else
-        MemSet(&clientaddr, 0, sizeof(clientaddr));
+    memcpy(&lbeentry,
+           unvolatize(PgBackendStatus *, vbeentry),
+           sizeof(PgBackendStatus));
+
+    /* These structs can just start from zeroes each time, though */
+#ifdef USE_SSL
+    memset(&lsslstatus, 0, sizeof(lsslstatus));
+#endif
+#ifdef ENABLE_GSS
+    memset(&lgssstatus, 0, sizeof(lgssstatus));
+#endif

     /*
-     * Initialize my status entry, following the protocol of bumping
-     * st_changecount before and after; and make sure it's even afterwards. We
-     * use a volatile pointer here to ensure the compiler doesn't try to get
-     * cute.
+     * Now fill in all the fields of lbeentry, except strings that are
+     * out-of-line data.
      */
-    beentry = MyBEEntry;
-
-    /* pgstats state must be initialized from pgstat_initialize() */
-    Assert(beentry != NULL);
+    lbeentry.st_procpid = MyProcPid;

     if (MyBackendId != InvalidBackendId)
     {
         if (IsAutoVacuumLauncherProcess())
         {
             /* Autovacuum Launcher */
-            beentry->st_backendType = B_AUTOVAC_LAUNCHER;
+            lbeentry.st_backendType = B_AUTOVAC_LAUNCHER;
         }
         else if (IsAutoVacuumWorkerProcess())
         {
             /* Autovacuum Worker */
-            beentry->st_backendType = B_AUTOVAC_WORKER;
+            lbeentry.st_backendType = B_AUTOVAC_WORKER;
         }
         else if (am_walsender)
         {
             /* Wal sender */
-            beentry->st_backendType = B_WAL_SENDER;
+            lbeentry.st_backendType = B_WAL_SENDER;
         }
         else if (IsBackgroundWorker)
         {
             /* bgworker */
-            beentry->st_backendType = B_BG_WORKER;
+            lbeentry.st_backendType = B_BG_WORKER;
         }
         else
         {
             /* client-backend */
-            beentry->st_backendType = B_BACKEND;
+            lbeentry.st_backendType = B_BACKEND;
         }
     }
     else
@@ -2918,99 +2931,92 @@ pgstat_bestart(void)
         switch (MyAuxProcType)
         {
             case StartupProcess:
-                beentry->st_backendType = B_STARTUP;
+                lbeentry.st_backendType = B_STARTUP;
                 break;
             case BgWriterProcess:
-                beentry->st_backendType = B_BG_WRITER;
+                lbeentry.st_backendType = B_BG_WRITER;
                 break;
             case CheckpointerProcess:
-                beentry->st_backendType = B_CHECKPOINTER;
+                lbeentry.st_backendType = B_CHECKPOINTER;
                 break;
             case WalWriterProcess:
-                beentry->st_backendType = B_WAL_WRITER;
+                lbeentry.st_backendType = B_WAL_WRITER;
                 break;
             case WalReceiverProcess:
-                beentry->st_backendType = B_WAL_RECEIVER;
+                lbeentry.st_backendType = B_WAL_RECEIVER;
                 break;
             default:
                 elog(FATAL, "unrecognized process type: %d",
                      (int) MyAuxProcType);
-                proc_exit(1);
         }
     }

-    do
-    {
-        pgstat_increment_changecount_before(beentry);
-    } while ((beentry->st_changecount & 1) == 0);
-
-    beentry->st_procpid = MyProcPid;
-    beentry->st_proc_start_timestamp = MyStartTimestamp;
-    beentry->st_activity_start_timestamp = 0;
-    beentry->st_state_start_timestamp = 0;
-    beentry->st_xact_start_timestamp = 0;
-    beentry->st_databaseid = MyDatabaseId;
+    lbeentry.st_proc_start_timestamp = MyStartTimestamp;
+    lbeentry.st_activity_start_timestamp = 0;
+    lbeentry.st_state_start_timestamp = 0;
+    lbeentry.st_xact_start_timestamp = 0;
+    lbeentry.st_databaseid = MyDatabaseId;

     /* We have userid for client-backends, wal-sender and bgworker processes */
-    if (beentry->st_backendType == B_BACKEND
-        || beentry->st_backendType == B_WAL_SENDER
-        || beentry->st_backendType == B_BG_WORKER)
-        beentry->st_userid = GetSessionUserId();
+    if (lbeentry.st_backendType == B_BACKEND
+        || lbeentry.st_backendType == B_WAL_SENDER
+        || lbeentry.st_backendType == B_BG_WORKER)
+        lbeentry.st_userid = GetSessionUserId();
     else
-        beentry->st_userid = InvalidOid;
+        lbeentry.st_userid = InvalidOid;

-    beentry->st_clientaddr = clientaddr;
-    if (MyProcPort && MyProcPort->remote_hostname)
-        strlcpy(beentry->st_clienthostname, MyProcPort->remote_hostname,
-                NAMEDATALEN);
+    /*
+     * We may not have a MyProcPort (eg, if this is the autovacuum process).
+     * If so, use all-zeroes client address, which is dealt with specially in
+     * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
+     */
+    if (MyProcPort)
+        memcpy(&lbeentry.st_clientaddr, &MyProcPort->raddr,
+               sizeof(lbeentry.st_clientaddr));
     else
-        beentry->st_clienthostname[0] = '\0';
+        MemSet(&lbeentry.st_clientaddr, 0, sizeof(lbeentry.st_clientaddr));
+
 #ifdef USE_SSL
     if (MyProcPort && MyProcPort->ssl != NULL)
     {
-        beentry->st_ssl = true;
-        beentry->st_sslstatus->ssl_bits = be_tls_get_cipher_bits(MyProcPort);
-        beentry->st_sslstatus->ssl_compression = be_tls_get_compression(MyProcPort);
-        strlcpy(beentry->st_sslstatus->ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
-        strlcpy(beentry->st_sslstatus->ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
-        be_tls_get_peer_subject_name(MyProcPort, beentry->st_sslstatus->ssl_client_dn, NAMEDATALEN);
-        be_tls_get_peer_serial(MyProcPort, beentry->st_sslstatus->ssl_client_serial, NAMEDATALEN);
-        be_tls_get_peer_issuer_name(MyProcPort, beentry->st_sslstatus->ssl_issuer_dn, NAMEDATALEN);
+        lbeentry.st_ssl = true;
+        lsslstatus.ssl_bits = be_tls_get_cipher_bits(MyProcPort);
+        lsslstatus.ssl_compression = be_tls_get_compression(MyProcPort);
+        strlcpy(lsslstatus.ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
+        strlcpy(lsslstatus.ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
+        be_tls_get_peer_subject_name(MyProcPort, lsslstatus.ssl_client_dn, NAMEDATALEN);
+        be_tls_get_peer_serial(MyProcPort, lsslstatus.ssl_client_serial, NAMEDATALEN);
+        be_tls_get_peer_issuer_name(MyProcPort, lsslstatus.ssl_issuer_dn, NAMEDATALEN);
     }
     else
     {
-        beentry->st_ssl = false;
+        lbeentry.st_ssl = false;
     }
 #else
-    beentry->st_ssl = false;
+    lbeentry.st_ssl = false;
 #endif

 #ifdef ENABLE_GSS
     if (MyProcPort && MyProcPort->gss != NULL)
     {
-        beentry->st_gss = true;
-        beentry->st_gssstatus->gss_auth = be_gssapi_get_auth(MyProcPort);
-        beentry->st_gssstatus->gss_enc = be_gssapi_get_enc(MyProcPort);
+        lbeentry.st_gss = true;
+        lgssstatus.gss_auth = be_gssapi_get_auth(MyProcPort);
+        lgssstatus.gss_enc = be_gssapi_get_enc(MyProcPort);

-        if (beentry->st_gssstatus->gss_auth)
-            strlcpy(beentry->st_gssstatus->gss_princ, be_gssapi_get_princ(MyProcPort), NAMEDATALEN);
+        if (lgssstatus.gss_auth)
+            strlcpy(lgssstatus.gss_princ, be_gssapi_get_princ(MyProcPort), NAMEDATALEN);
     }
     else
     {
-        beentry->st_gss = false;
+        lbeentry.st_gss = false;
     }
 #else
-    beentry->st_gss = false;
+    lbeentry.st_gss = false;
 #endif
-    beentry->st_state = STATE_UNDEFINED;
-    beentry->st_appname[0] = '\0';
-    beentry->st_activity_raw[0] = '\0';
-    /* Also make sure the last byte in each string area is always 0 */
-    beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
-    beentry->st_appname[NAMEDATALEN - 1] = '\0';
-    beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
-    beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
-    beentry->st_progress_command_target = InvalidOid;
+
+    lbeentry.st_state = STATE_UNDEFINED;
+    lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
+    lbeentry.st_progress_command_target = InvalidOid;

     /*
      * we don't zero st_progress_param here to save cycles; nobody should
@@ -3018,7 +3024,45 @@ pgstat_bestart(void)
      * than PROGRESS_COMMAND_INVALID
      */

-    pgstat_increment_changecount_after(beentry);
+    /*
+     * Finally, we can enter the critical section that fills the shared-memory
+     * status entry.  We follow the protocol of bumping st_changecount before
+     * and after; and make sure it's even afterwards.  We use a volatile
+     * pointer here to ensure the compiler doesn't try to get cute.
+     */
+    PGSTAT_BEGIN_WRITE_ACTIVITY(vbeentry);
+
+    /* make sure we'll memcpy the same st_changecount back */
+    lbeentry.st_changecount = vbeentry->st_changecount;
+
+    memcpy(unvolatize(PgBackendStatus *, vbeentry),
+           &lbeentry,
+           sizeof(PgBackendStatus));
+
+    /*
+     * We can write the out-of-line strings and structs using the pointers
+     * that are in lbeentry; this saves some de-volatilizing messiness.
+     */
+    lbeentry.st_appname[0] = '\0';
+    if (MyProcPort && MyProcPort->remote_hostname)
+        strlcpy(lbeentry.st_clienthostname, MyProcPort->remote_hostname,
+                NAMEDATALEN);
+    else
+        lbeentry.st_clienthostname[0] = '\0';
+    lbeentry.st_activity_raw[0] = '\0';
+    /* Also make sure the last byte in each string area is always 0 */
+    lbeentry.st_appname[NAMEDATALEN - 1] = '\0';
+    lbeentry.st_clienthostname[NAMEDATALEN - 1] = '\0';
+    lbeentry.st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
+
+#ifdef USE_SSL
+    memcpy(lbeentry.st_sslstatus, &lsslstatus, sizeof(PgBackendSSLStatus));
+#endif
+#ifdef ENABLE_GSS
+    memcpy(lbeentry.st_gssstatus, &lgssstatus, sizeof(PgBackendGSSStatus));
+#endif
+
+    PGSTAT_END_WRITE_ACTIVITY(vbeentry);

     /* Update app name to current GUC setting */
     if (application_name)
@@ -3053,11 +3097,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
      * before and after.  We use a volatile pointer here to ensure the
      * compiler doesn't try to get cute.
      */
-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);

     beentry->st_procpid = 0;    /* mark invalid */

-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }


@@ -3096,7 +3140,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
              * non-disabled state.  As our final update, change the state and
              * clear fields we will not be updating anymore.
              */
-            pgstat_increment_changecount_before(beentry);
+            PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
             beentry->st_state = STATE_DISABLED;
             beentry->st_state_start_timestamp = 0;
             beentry->st_activity_raw[0] = '\0';
@@ -3104,14 +3148,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
             /* st_xact_start_timestamp and wait_event_info are also disabled */
             beentry->st_xact_start_timestamp = 0;
             proc->wait_event_info = 0;
-            pgstat_increment_changecount_after(beentry);
+            PGSTAT_END_WRITE_ACTIVITY(beentry);
         }
         return;
     }

     /*
-     * To minimize the time spent modifying the entry, fetch all the needed
-     * data first.
+     * To minimize the time spent modifying the entry, and avoid risk of
+     * errors inside the critical section, fetch all the needed data first.
      */
     start_timestamp = GetCurrentStatementStartTimestamp();
     if (cmd_str != NULL)
@@ -3128,7 +3172,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
     /*
      * Now update the status entry
      */
-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);

     beentry->st_state = state;
     beentry->st_state_start_timestamp = current_timestamp;
@@ -3140,7 +3184,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
         beentry->st_activity_start_timestamp = start_timestamp;
     }

-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*-----------
@@ -3158,11 +3202,11 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
     if (!beentry || !pgstat_track_activities)
         return;

-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
     beentry->st_progress_command = cmdtype;
     beentry->st_progress_command_target = relid;
     MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*-----------
@@ -3181,9 +3225,9 @@ pgstat_progress_update_param(int index, int64 val)
     if (!beentry || !pgstat_track_activities)
         return;

-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
     beentry->st_progress_param[index] = val;
-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*-----------
@@ -3203,7 +3247,7 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
     if (!beentry || !pgstat_track_activities || nparam == 0)
         return;

-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);

     for (i = 0; i < nparam; ++i)
     {
@@ -3212,7 +3256,7 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
         beentry->st_progress_param[index[i]] = val[i];
     }

-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*-----------
@@ -3233,10 +3277,10 @@ pgstat_progress_end_command(void)
         && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
         return;

-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
     beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
     beentry->st_progress_command_target = InvalidOid;
-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /* ----------
@@ -3262,12 +3306,12 @@ pgstat_report_appname(const char *appname)
      * st_changecount before and after.  We use a volatile pointer here to
      * ensure the compiler doesn't try to get cute.
      */
-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);

     memcpy((char *) beentry->st_appname, appname, len);
     beentry->st_appname[len] = '\0';

-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*
@@ -3287,9 +3331,11 @@ pgstat_report_xact_timestamp(TimestampTz tstamp)
      * st_changecount before and after.  We use a volatile pointer here to
      * ensure the compiler doesn't try to get cute.
      */
-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+
     beentry->st_xact_start_timestamp = tstamp;
-    pgstat_increment_changecount_after(beentry);
+
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /* ----------
@@ -3311,6 +3357,9 @@ pgstat_read_current_status(void)
 #ifdef USE_SSL
     PgBackendSSLStatus *localsslstatus;
 #endif
+#ifdef ENABLE_GSS
+    PgBackendGSSStatus *localgssstatus;
+#endif
     int            i;

     Assert(!pgStatRunningInCollector);
@@ -3344,6 +3393,11 @@ pgstat_read_current_status(void)
         MemoryContextAlloc(pgStatLocalContext,
                            sizeof(PgBackendSSLStatus) * NumBackendStatSlots);
 #endif
+#ifdef ENABLE_GSS
+    localgssstatus = (PgBackendGSSStatus *)
+        MemoryContextAlloc(pgStatLocalContext,
+                           sizeof(PgBackendGSSStatus) * NumBackendStatSlots);
+#endif

     localNumBackends = 0;

@@ -3363,14 +3417,19 @@ pgstat_read_current_status(void)
             int            before_changecount;
             int            after_changecount;

-            pgstat_save_changecount_before(beentry, before_changecount);
+            pgstat_begin_read_activity(beentry, before_changecount);

             localentry->backendStatus.st_procpid = beentry->st_procpid;
+            /* Skip all the data-copying work if entry is not in use */
             if (localentry->backendStatus.st_procpid > 0)
             {
                 memcpy(&localentry->backendStatus, unvolatize(PgBackendStatus *, beentry), sizeof(PgBackendStatus));

                 /*
+                 * For each PgBackendStatus field that is a pointer, copy the
+                 * pointed-to data, then adjust the local copy of the pointer
+                 * field to point at the local copy of the data.
+                 *
                  * strcpy is safe even if the string is modified concurrently,
                  * because there's always a \0 at the end of the buffer.
                  */
@@ -3380,7 +3439,6 @@ pgstat_read_current_status(void)
                 localentry->backendStatus.st_clienthostname = localclienthostname;
                 strcpy(localactivity, (char *) beentry->st_activity_raw);
                 localentry->backendStatus.st_activity_raw = localactivity;
-                localentry->backendStatus.st_ssl = beentry->st_ssl;
 #ifdef USE_SSL
                 if (beentry->st_ssl)
                 {
@@ -3388,11 +3446,19 @@ pgstat_read_current_status(void)
                     localentry->backendStatus.st_sslstatus = localsslstatus;
                 }
 #endif
+#ifdef ENABLE_GSS
+                if (beentry->st_gss)
+                {
+                    memcpy(localgssstatus, beentry->st_gssstatus, sizeof(PgBackendGSSStatus));
+                    localentry->backendStatus.st_gssstatus = localgssstatus;
+                }
+#endif
             }

-            pgstat_save_changecount_after(beentry, after_changecount);
-            if (before_changecount == after_changecount &&
-                (before_changecount & 1) == 0)
+            pgstat_end_read_activity(beentry, after_changecount);
+
+            if (pgstat_read_activity_complete(before_changecount,
+                                              after_changecount))
                 break;

             /* Make sure we can break out of loop if stuck... */
@@ -3414,6 +3480,9 @@ pgstat_read_current_status(void)
 #ifdef USE_SSL
             localsslstatus++;
 #endif
+#ifdef ENABLE_GSS
+            localgssstatus++;
+#endif
             localNumBackends++;
         }
     }
@@ -4090,14 +4159,14 @@ pgstat_get_backend_current_activity(int pid, bool checkUser)
             int            before_changecount;
             int            after_changecount;

-            pgstat_save_changecount_before(vbeentry, before_changecount);
+            pgstat_begin_read_activity(vbeentry, before_changecount);

             found = (vbeentry->st_procpid == pid);

-            pgstat_save_changecount_after(vbeentry, after_changecount);
+            pgstat_end_read_activity(vbeentry, after_changecount);

-            if (before_changecount == after_changecount &&
-                (before_changecount & 1) == 0)
+            if (pgstat_read_activity_complete(before_changecount,
+                                              after_changecount))
                 break;

             /* Make sure we can break out of loop if stuck... */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9fbc492..9299871 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1035,11 +1035,12 @@ typedef struct PgBackendStatus
      * the copy is valid; otherwise start over.  This makes updates cheap
      * while reads are potentially expensive, but that's the tradeoff we want.
      *
-     * The above protocol needs the memory barriers to ensure that the
-     * apparent order of execution is as it desires. Otherwise, for example,
-     * the CPU might rearrange the code so that st_changecount is incremented
-     * twice before the modification on a machine with weak memory ordering.
-     * This surprising result can lead to bugs.
+     * The above protocol needs memory barriers to ensure that the apparent
+     * order of execution is as it desires.  Otherwise, for example, the CPU
+     * might rearrange the code so that st_changecount is incremented twice
+     * before the modification on a machine with weak memory ordering.  Hence,
+     * use the macros defined below for manipulating st_changecount, rather
+     * than touching it directly.
      */
     int            st_changecount;

@@ -1099,42 +1100,66 @@ typedef struct PgBackendStatus
 } PgBackendStatus;

 /*
- * Macros to load and store st_changecount with the memory barriers.
+ * Macros to load and store st_changecount with appropriate memory barriers.
  *
- * pgstat_increment_changecount_before() and
- * pgstat_increment_changecount_after() need to be called before and after
- * PgBackendStatus entries are modified, respectively. This makes sure that
- * st_changecount is incremented around the modification.
+ * Use PGSTAT_BEGIN_WRITE_ACTIVITY() before, and PGSTAT_END_WRITE_ACTIVITY()
+ * after, modifying the current process's PgBackendStatus data.  Note that,
+ * since there is no mechanism for cleaning up st_changecount after an error,
+ * THESE MACROS FORM A CRITICAL SECTION.  Any error between them will be
+ * promoted to PANIC, causing a database restart to clean up shared memory!
+ * Hence, keep the critical section as short and straight-line as possible.
+ * Aside from being safer, that minimizes the window in which readers will
+ * have to loop.
  *
- * Also pgstat_save_changecount_before() and pgstat_save_changecount_after()
- * need to be called before and after PgBackendStatus entries are copied into
- * private memory, respectively.
+ * Reader logic should follow this sketch:
+ *
+ *        for (;;)
+ *        {
+ *            int before_ct, after_ct;
+ *
+ *            pgstat_begin_read_activity(beentry, before_ct);
+ *            ... copy beentry data to local memory ...
+ *            pgstat_end_read_activity(beentry, after_ct);
+ *            if (pgstat_read_activity_complete(before_ct, after_ct))
+ *                break;
+ *            CHECK_FOR_INTERRUPTS();
+ *        }
+ *
+ * For extra safety, we generally use volatile beentry pointers, although
+ * the memory barriers should theoretically be sufficient.
  */
-#define pgstat_increment_changecount_before(beentry)    \
-    do {    \
-        beentry->st_changecount++;    \
+#define PGSTAT_BEGIN_WRITE_ACTIVITY(beentry) \
+    do { \
+        START_CRIT_SECTION(); \
+        (beentry)->st_changecount++; \
         pg_write_barrier(); \
     } while (0)

-#define pgstat_increment_changecount_after(beentry) \
-    do {    \
+#define PGSTAT_END_WRITE_ACTIVITY(beentry) \
+    do { \
         pg_write_barrier(); \
-        beentry->st_changecount++;    \
-        Assert((beentry->st_changecount & 1) == 0); \
+        (beentry)->st_changecount++; \
+        Assert(((beentry)->st_changecount & 1) == 0); \
+        END_CRIT_SECTION(); \
     } while (0)

-#define pgstat_save_changecount_before(beentry, save_changecount)    \
-    do {    \
-        save_changecount = beentry->st_changecount; \
-        pg_read_barrier();    \
+#define pgstat_begin_read_activity(beentry, before_changecount) \
+    do { \
+        (before_changecount) = (beentry)->st_changecount; \
+        pg_read_barrier(); \
     } while (0)

-#define pgstat_save_changecount_after(beentry, save_changecount)    \
-    do {    \
-        pg_read_barrier();    \
-        save_changecount = beentry->st_changecount; \
+#define pgstat_end_read_activity(beentry, after_changecount) \
+    do { \
+        pg_read_barrier(); \
+        (after_changecount) = (beentry)->st_changecount; \
     } while (0)

+#define pgstat_read_activity_complete(before_changecount, after_changecount) \
+    ((before_changecount) == (after_changecount) && \
+     ((before_changecount) & 1) == 0)
+
+
 /* ----------
  * LocalPgBackendStatus
  *

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
neeraj kumar <neeru.cse@gmail.com> writes:
> --> But the right answer is to fix it on the writing side.

> Yes I agree with this part. Even though there is very low probability, a
> process can still be killed in middle when writing. So what is your
> suggestion on how to recover from this automatically?

Here's a draft patch.  This makes more cosmetic changes than are strictly
necessary to fix the bug, but I'm concerned about preventing future
mistakes of the same sort, not just dealing with the immediate bug.

Anyway the idea here is (1) reduce the size of the critical section in
pgstat_bestart where the changecount is odd, so that nothing except
data copying happens there, and (2) adjust the macros for writing
st_changecount so that those sections actually are critical sections.
That means that if something throws an error inside those sections,
it'll turn into a PANIC and database restart.  That shouldn't ever happen,
of course, but if it does then a PANIC is better than a frozen system.
I renamed the macros for writing st_changecount to make it more apparent
that those are now critical-section boundaries.

The out-of-line string fields make the pgstat_bestart code more ticklish
than one could wish, since they have to be treated differently from
in-line fields.  But I'm not sure there's much we can do to avoid that.
It doesn't seem like a good idea to change the layout of the shared-
memory structure, at least not in released branches.

I found two other embarrassing bugs while I was at it.  The stanza
initializing st_backendType was clearly inserted with the aid of a
dartboard, because it was actually before the initial st_changecount bump.
That probably has little if any real-world impact, but it's still an
indication of sloppy patching.  And pgstat_read_current_status had not
been taught to copy the st_gssstatus out-of-line structure to local
storage, so that those values might fail to hold still while a
transaction examines the pg_stat_activity data.  That *is* a live bug.

This patch is against HEAD --- I've not looked at how much adjustment
it'll need for the back branches, but I'm sure there's some.

            regards, tom lane

diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 64986b1..25a1bd9 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -2849,66 +2849,79 @@ pgstat_initialize(void)
  *    Apart from auxiliary processes, MyBackendId, MyDatabaseId,
  *    session userid, and application_name must be set for a
  *    backend (hence, this cannot be combined with pgstat_initialize).
+ *    Note also that we must be inside a transaction.
  * ----------
  */
 void
 pgstat_bestart(void)
 {
-    SockAddr    clientaddr;
-    volatile PgBackendStatus *beentry;
+    volatile PgBackendStatus *vbeentry = MyBEEntry;
+    PgBackendStatus lbeentry;
+#ifdef USE_SSL
+    PgBackendSSLStatus lsslstatus;
+#endif
+#ifdef ENABLE_GSS
+    PgBackendGSSStatus lgssstatus;
+#endif

-    /*
-     * To minimize the time spent modifying the PgBackendStatus entry, fetch
-     * all the needed data first.
-     */
+    /* pgstats state must be initialized from pgstat_initialize() */
+    Assert(vbeentry != NULL);

     /*
-     * We may not have a MyProcPort (eg, if this is the autovacuum process).
-     * If so, use all-zeroes client address, which is dealt with specially in
-     * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
+     * To minimize the time spent modifying the PgBackendStatus entry, and
+     * avoid risk of errors inside the critical section, we first copy the
+     * shared-memory struct to a local variable, then modify the data in the
+     * local variable, then copy the local variable back to shared memory.
+     * Only the last step has to be inside the critical section.
+     *
+     * Most of the data we copy here is just going to be overwritten, but the
+     * struct's not so large that it's worth the maintenance hassle to copy
+     * only the needful fields.
      */
-    if (MyProcPort)
-        memcpy(&clientaddr, &MyProcPort->raddr, sizeof(clientaddr));
-    else
-        MemSet(&clientaddr, 0, sizeof(clientaddr));
+    memcpy(&lbeentry,
+           unvolatize(PgBackendStatus *, vbeentry),
+           sizeof(PgBackendStatus));
+
+    /* These structs can just start from zeroes each time, though */
+#ifdef USE_SSL
+    memset(&lsslstatus, 0, sizeof(lsslstatus));
+#endif
+#ifdef ENABLE_GSS
+    memset(&lgssstatus, 0, sizeof(lgssstatus));
+#endif

     /*
-     * Initialize my status entry, following the protocol of bumping
-     * st_changecount before and after; and make sure it's even afterwards. We
-     * use a volatile pointer here to ensure the compiler doesn't try to get
-     * cute.
+     * Now fill in all the fields of lbeentry, except strings that are
+     * out-of-line data.
      */
-    beentry = MyBEEntry;
-
-    /* pgstats state must be initialized from pgstat_initialize() */
-    Assert(beentry != NULL);
+    lbeentry.st_procpid = MyProcPid;

     if (MyBackendId != InvalidBackendId)
     {
         if (IsAutoVacuumLauncherProcess())
         {
             /* Autovacuum Launcher */
-            beentry->st_backendType = B_AUTOVAC_LAUNCHER;
+            lbeentry.st_backendType = B_AUTOVAC_LAUNCHER;
         }
         else if (IsAutoVacuumWorkerProcess())
         {
             /* Autovacuum Worker */
-            beentry->st_backendType = B_AUTOVAC_WORKER;
+            lbeentry.st_backendType = B_AUTOVAC_WORKER;
         }
         else if (am_walsender)
         {
             /* Wal sender */
-            beentry->st_backendType = B_WAL_SENDER;
+            lbeentry.st_backendType = B_WAL_SENDER;
         }
         else if (IsBackgroundWorker)
         {
             /* bgworker */
-            beentry->st_backendType = B_BG_WORKER;
+            lbeentry.st_backendType = B_BG_WORKER;
         }
         else
         {
             /* client-backend */
-            beentry->st_backendType = B_BACKEND;
+            lbeentry.st_backendType = B_BACKEND;
         }
     }
     else
@@ -2918,99 +2931,92 @@ pgstat_bestart(void)
         switch (MyAuxProcType)
         {
             case StartupProcess:
-                beentry->st_backendType = B_STARTUP;
+                lbeentry.st_backendType = B_STARTUP;
                 break;
             case BgWriterProcess:
-                beentry->st_backendType = B_BG_WRITER;
+                lbeentry.st_backendType = B_BG_WRITER;
                 break;
             case CheckpointerProcess:
-                beentry->st_backendType = B_CHECKPOINTER;
+                lbeentry.st_backendType = B_CHECKPOINTER;
                 break;
             case WalWriterProcess:
-                beentry->st_backendType = B_WAL_WRITER;
+                lbeentry.st_backendType = B_WAL_WRITER;
                 break;
             case WalReceiverProcess:
-                beentry->st_backendType = B_WAL_RECEIVER;
+                lbeentry.st_backendType = B_WAL_RECEIVER;
                 break;
             default:
                 elog(FATAL, "unrecognized process type: %d",
                      (int) MyAuxProcType);
-                proc_exit(1);
         }
     }

-    do
-    {
-        pgstat_increment_changecount_before(beentry);
-    } while ((beentry->st_changecount & 1) == 0);
-
-    beentry->st_procpid = MyProcPid;
-    beentry->st_proc_start_timestamp = MyStartTimestamp;
-    beentry->st_activity_start_timestamp = 0;
-    beentry->st_state_start_timestamp = 0;
-    beentry->st_xact_start_timestamp = 0;
-    beentry->st_databaseid = MyDatabaseId;
+    lbeentry.st_proc_start_timestamp = MyStartTimestamp;
+    lbeentry.st_activity_start_timestamp = 0;
+    lbeentry.st_state_start_timestamp = 0;
+    lbeentry.st_xact_start_timestamp = 0;
+    lbeentry.st_databaseid = MyDatabaseId;

     /* We have userid for client-backends, wal-sender and bgworker processes */
-    if (beentry->st_backendType == B_BACKEND
-        || beentry->st_backendType == B_WAL_SENDER
-        || beentry->st_backendType == B_BG_WORKER)
-        beentry->st_userid = GetSessionUserId();
+    if (lbeentry.st_backendType == B_BACKEND
+        || lbeentry.st_backendType == B_WAL_SENDER
+        || lbeentry.st_backendType == B_BG_WORKER)
+        lbeentry.st_userid = GetSessionUserId();
     else
-        beentry->st_userid = InvalidOid;
+        lbeentry.st_userid = InvalidOid;

-    beentry->st_clientaddr = clientaddr;
-    if (MyProcPort && MyProcPort->remote_hostname)
-        strlcpy(beentry->st_clienthostname, MyProcPort->remote_hostname,
-                NAMEDATALEN);
+    /*
+     * We may not have a MyProcPort (eg, if this is the autovacuum process).
+     * If so, use all-zeroes client address, which is dealt with specially in
+     * pg_stat_get_backend_client_addr and pg_stat_get_backend_client_port.
+     */
+    if (MyProcPort)
+        memcpy(&lbeentry.st_clientaddr, &MyProcPort->raddr,
+               sizeof(lbeentry.st_clientaddr));
     else
-        beentry->st_clienthostname[0] = '\0';
+        MemSet(&lbeentry.st_clientaddr, 0, sizeof(lbeentry.st_clientaddr));
+
 #ifdef USE_SSL
     if (MyProcPort && MyProcPort->ssl != NULL)
     {
-        beentry->st_ssl = true;
-        beentry->st_sslstatus->ssl_bits = be_tls_get_cipher_bits(MyProcPort);
-        beentry->st_sslstatus->ssl_compression = be_tls_get_compression(MyProcPort);
-        strlcpy(beentry->st_sslstatus->ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
-        strlcpy(beentry->st_sslstatus->ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
-        be_tls_get_peer_subject_name(MyProcPort, beentry->st_sslstatus->ssl_client_dn, NAMEDATALEN);
-        be_tls_get_peer_serial(MyProcPort, beentry->st_sslstatus->ssl_client_serial, NAMEDATALEN);
-        be_tls_get_peer_issuer_name(MyProcPort, beentry->st_sslstatus->ssl_issuer_dn, NAMEDATALEN);
+        lbeentry.st_ssl = true;
+        lsslstatus.ssl_bits = be_tls_get_cipher_bits(MyProcPort);
+        lsslstatus.ssl_compression = be_tls_get_compression(MyProcPort);
+        strlcpy(lsslstatus.ssl_version, be_tls_get_version(MyProcPort), NAMEDATALEN);
+        strlcpy(lsslstatus.ssl_cipher, be_tls_get_cipher(MyProcPort), NAMEDATALEN);
+        be_tls_get_peer_subject_name(MyProcPort, lsslstatus.ssl_client_dn, NAMEDATALEN);
+        be_tls_get_peer_serial(MyProcPort, lsslstatus.ssl_client_serial, NAMEDATALEN);
+        be_tls_get_peer_issuer_name(MyProcPort, lsslstatus.ssl_issuer_dn, NAMEDATALEN);
     }
     else
     {
-        beentry->st_ssl = false;
+        lbeentry.st_ssl = false;
     }
 #else
-    beentry->st_ssl = false;
+    lbeentry.st_ssl = false;
 #endif

 #ifdef ENABLE_GSS
     if (MyProcPort && MyProcPort->gss != NULL)
     {
-        beentry->st_gss = true;
-        beentry->st_gssstatus->gss_auth = be_gssapi_get_auth(MyProcPort);
-        beentry->st_gssstatus->gss_enc = be_gssapi_get_enc(MyProcPort);
+        lbeentry.st_gss = true;
+        lgssstatus.gss_auth = be_gssapi_get_auth(MyProcPort);
+        lgssstatus.gss_enc = be_gssapi_get_enc(MyProcPort);

-        if (beentry->st_gssstatus->gss_auth)
-            strlcpy(beentry->st_gssstatus->gss_princ, be_gssapi_get_princ(MyProcPort), NAMEDATALEN);
+        if (lgssstatus.gss_auth)
+            strlcpy(lgssstatus.gss_princ, be_gssapi_get_princ(MyProcPort), NAMEDATALEN);
     }
     else
     {
-        beentry->st_gss = false;
+        lbeentry.st_gss = false;
     }
 #else
-    beentry->st_gss = false;
+    lbeentry.st_gss = false;
 #endif
-    beentry->st_state = STATE_UNDEFINED;
-    beentry->st_appname[0] = '\0';
-    beentry->st_activity_raw[0] = '\0';
-    /* Also make sure the last byte in each string area is always 0 */
-    beentry->st_clienthostname[NAMEDATALEN - 1] = '\0';
-    beentry->st_appname[NAMEDATALEN - 1] = '\0';
-    beentry->st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
-    beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
-    beentry->st_progress_command_target = InvalidOid;
+
+    lbeentry.st_state = STATE_UNDEFINED;
+    lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
+    lbeentry.st_progress_command_target = InvalidOid;

     /*
      * we don't zero st_progress_param here to save cycles; nobody should
@@ -3018,7 +3024,45 @@ pgstat_bestart(void)
      * than PROGRESS_COMMAND_INVALID
      */

-    pgstat_increment_changecount_after(beentry);
+    /*
+     * Finally, we can enter the critical section that fills the shared-memory
+     * status entry.  We follow the protocol of bumping st_changecount before
+     * and after; and make sure it's even afterwards.  We use a volatile
+     * pointer here to ensure the compiler doesn't try to get cute.
+     */
+    PGSTAT_BEGIN_WRITE_ACTIVITY(vbeentry);
+
+    /* make sure we'll memcpy the same st_changecount back */
+    lbeentry.st_changecount = vbeentry->st_changecount;
+
+    memcpy(unvolatize(PgBackendStatus *, vbeentry),
+           &lbeentry,
+           sizeof(PgBackendStatus));
+
+    /*
+     * We can write the out-of-line strings and structs using the pointers
+     * that are in lbeentry; this saves some de-volatilizing messiness.
+     */
+    lbeentry.st_appname[0] = '\0';
+    if (MyProcPort && MyProcPort->remote_hostname)
+        strlcpy(lbeentry.st_clienthostname, MyProcPort->remote_hostname,
+                NAMEDATALEN);
+    else
+        lbeentry.st_clienthostname[0] = '\0';
+    lbeentry.st_activity_raw[0] = '\0';
+    /* Also make sure the last byte in each string area is always 0 */
+    lbeentry.st_appname[NAMEDATALEN - 1] = '\0';
+    lbeentry.st_clienthostname[NAMEDATALEN - 1] = '\0';
+    lbeentry.st_activity_raw[pgstat_track_activity_query_size - 1] = '\0';
+
+#ifdef USE_SSL
+    memcpy(lbeentry.st_sslstatus, &lsslstatus, sizeof(PgBackendSSLStatus));
+#endif
+#ifdef ENABLE_GSS
+    memcpy(lbeentry.st_gssstatus, &lgssstatus, sizeof(PgBackendGSSStatus));
+#endif
+
+    PGSTAT_END_WRITE_ACTIVITY(vbeentry);

     /* Update app name to current GUC setting */
     if (application_name)
@@ -3053,11 +3097,11 @@ pgstat_beshutdown_hook(int code, Datum arg)
      * before and after.  We use a volatile pointer here to ensure the
      * compiler doesn't try to get cute.
      */
-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);

     beentry->st_procpid = 0;    /* mark invalid */

-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }


@@ -3096,7 +3140,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
              * non-disabled state.  As our final update, change the state and
              * clear fields we will not be updating anymore.
              */
-            pgstat_increment_changecount_before(beentry);
+            PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
             beentry->st_state = STATE_DISABLED;
             beentry->st_state_start_timestamp = 0;
             beentry->st_activity_raw[0] = '\0';
@@ -3104,14 +3148,14 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
             /* st_xact_start_timestamp and wait_event_info are also disabled */
             beentry->st_xact_start_timestamp = 0;
             proc->wait_event_info = 0;
-            pgstat_increment_changecount_after(beentry);
+            PGSTAT_END_WRITE_ACTIVITY(beentry);
         }
         return;
     }

     /*
-     * To minimize the time spent modifying the entry, fetch all the needed
-     * data first.
+     * To minimize the time spent modifying the entry, and avoid risk of
+     * errors inside the critical section, fetch all the needed data first.
      */
     start_timestamp = GetCurrentStatementStartTimestamp();
     if (cmd_str != NULL)
@@ -3128,7 +3172,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
     /*
      * Now update the status entry
      */
-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);

     beentry->st_state = state;
     beentry->st_state_start_timestamp = current_timestamp;
@@ -3140,7 +3184,7 @@ pgstat_report_activity(BackendState state, const char *cmd_str)
         beentry->st_activity_start_timestamp = start_timestamp;
     }

-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*-----------
@@ -3158,11 +3202,11 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
     if (!beentry || !pgstat_track_activities)
         return;

-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
     beentry->st_progress_command = cmdtype;
     beentry->st_progress_command_target = relid;
     MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*-----------
@@ -3181,9 +3225,9 @@ pgstat_progress_update_param(int index, int64 val)
     if (!beentry || !pgstat_track_activities)
         return;

-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
     beentry->st_progress_param[index] = val;
-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*-----------
@@ -3203,7 +3247,7 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
     if (!beentry || !pgstat_track_activities || nparam == 0)
         return;

-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);

     for (i = 0; i < nparam; ++i)
     {
@@ -3212,7 +3256,7 @@ pgstat_progress_update_multi_param(int nparam, const int *index,
         beentry->st_progress_param[index[i]] = val[i];
     }

-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*-----------
@@ -3233,10 +3277,10 @@ pgstat_progress_end_command(void)
         && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
         return;

-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
     beentry->st_progress_command = PROGRESS_COMMAND_INVALID;
     beentry->st_progress_command_target = InvalidOid;
-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /* ----------
@@ -3262,12 +3306,12 @@ pgstat_report_appname(const char *appname)
      * st_changecount before and after.  We use a volatile pointer here to
      * ensure the compiler doesn't try to get cute.
      */
-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);

     memcpy((char *) beentry->st_appname, appname, len);
     beentry->st_appname[len] = '\0';

-    pgstat_increment_changecount_after(beentry);
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /*
@@ -3287,9 +3331,11 @@ pgstat_report_xact_timestamp(TimestampTz tstamp)
      * st_changecount before and after.  We use a volatile pointer here to
      * ensure the compiler doesn't try to get cute.
      */
-    pgstat_increment_changecount_before(beentry);
+    PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+
     beentry->st_xact_start_timestamp = tstamp;
-    pgstat_increment_changecount_after(beentry);
+
+    PGSTAT_END_WRITE_ACTIVITY(beentry);
 }

 /* ----------
@@ -3311,6 +3357,9 @@ pgstat_read_current_status(void)
 #ifdef USE_SSL
     PgBackendSSLStatus *localsslstatus;
 #endif
+#ifdef ENABLE_GSS
+    PgBackendGSSStatus *localgssstatus;
+#endif
     int            i;

     Assert(!pgStatRunningInCollector);
@@ -3344,6 +3393,11 @@ pgstat_read_current_status(void)
         MemoryContextAlloc(pgStatLocalContext,
                            sizeof(PgBackendSSLStatus) * NumBackendStatSlots);
 #endif
+#ifdef ENABLE_GSS
+    localgssstatus = (PgBackendGSSStatus *)
+        MemoryContextAlloc(pgStatLocalContext,
+                           sizeof(PgBackendGSSStatus) * NumBackendStatSlots);
+#endif

     localNumBackends = 0;

@@ -3363,14 +3417,19 @@ pgstat_read_current_status(void)
             int            before_changecount;
             int            after_changecount;

-            pgstat_save_changecount_before(beentry, before_changecount);
+            pgstat_begin_read_activity(beentry, before_changecount);

             localentry->backendStatus.st_procpid = beentry->st_procpid;
+            /* Skip all the data-copying work if entry is not in use */
             if (localentry->backendStatus.st_procpid > 0)
             {
                 memcpy(&localentry->backendStatus, unvolatize(PgBackendStatus *, beentry), sizeof(PgBackendStatus));

                 /*
+                 * For each PgBackendStatus field that is a pointer, copy the
+                 * pointed-to data, then adjust the local copy of the pointer
+                 * field to point at the local copy of the data.
+                 *
                  * strcpy is safe even if the string is modified concurrently,
                  * because there's always a \0 at the end of the buffer.
                  */
@@ -3380,7 +3439,6 @@ pgstat_read_current_status(void)
                 localentry->backendStatus.st_clienthostname = localclienthostname;
                 strcpy(localactivity, (char *) beentry->st_activity_raw);
                 localentry->backendStatus.st_activity_raw = localactivity;
-                localentry->backendStatus.st_ssl = beentry->st_ssl;
 #ifdef USE_SSL
                 if (beentry->st_ssl)
                 {
@@ -3388,11 +3446,19 @@ pgstat_read_current_status(void)
                     localentry->backendStatus.st_sslstatus = localsslstatus;
                 }
 #endif
+#ifdef ENABLE_GSS
+                if (beentry->st_gss)
+                {
+                    memcpy(localgssstatus, beentry->st_gssstatus, sizeof(PgBackendGSSStatus));
+                    localentry->backendStatus.st_gssstatus = localgssstatus;
+                }
+#endif
             }

-            pgstat_save_changecount_after(beentry, after_changecount);
-            if (before_changecount == after_changecount &&
-                (before_changecount & 1) == 0)
+            pgstat_end_read_activity(beentry, after_changecount);
+
+            if (pgstat_read_activity_complete(before_changecount,
+                                              after_changecount))
                 break;

             /* Make sure we can break out of loop if stuck... */
@@ -3414,6 +3480,9 @@ pgstat_read_current_status(void)
 #ifdef USE_SSL
             localsslstatus++;
 #endif
+#ifdef ENABLE_GSS
+            localgssstatus++;
+#endif
             localNumBackends++;
         }
     }
@@ -4090,14 +4159,14 @@ pgstat_get_backend_current_activity(int pid, bool checkUser)
             int            before_changecount;
             int            after_changecount;

-            pgstat_save_changecount_before(vbeentry, before_changecount);
+            pgstat_begin_read_activity(vbeentry, before_changecount);

             found = (vbeentry->st_procpid == pid);

-            pgstat_save_changecount_after(vbeentry, after_changecount);
+            pgstat_end_read_activity(vbeentry, after_changecount);

-            if (before_changecount == after_changecount &&
-                (before_changecount & 1) == 0)
+            if (pgstat_read_activity_complete(before_changecount,
+                                              after_changecount))
                 break;

             /* Make sure we can break out of loop if stuck... */
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9fbc492..9299871 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -1035,11 +1035,12 @@ typedef struct PgBackendStatus
      * the copy is valid; otherwise start over.  This makes updates cheap
      * while reads are potentially expensive, but that's the tradeoff we want.
      *
-     * The above protocol needs the memory barriers to ensure that the
-     * apparent order of execution is as it desires. Otherwise, for example,
-     * the CPU might rearrange the code so that st_changecount is incremented
-     * twice before the modification on a machine with weak memory ordering.
-     * This surprising result can lead to bugs.
+     * The above protocol needs memory barriers to ensure that the apparent
+     * order of execution is as it desires.  Otherwise, for example, the CPU
+     * might rearrange the code so that st_changecount is incremented twice
+     * before the modification on a machine with weak memory ordering.  Hence,
+     * use the macros defined below for manipulating st_changecount, rather
+     * than touching it directly.
      */
     int            st_changecount;

@@ -1099,42 +1100,66 @@ typedef struct PgBackendStatus
 } PgBackendStatus;

 /*
- * Macros to load and store st_changecount with the memory barriers.
+ * Macros to load and store st_changecount with appropriate memory barriers.
  *
- * pgstat_increment_changecount_before() and
- * pgstat_increment_changecount_after() need to be called before and after
- * PgBackendStatus entries are modified, respectively. This makes sure that
- * st_changecount is incremented around the modification.
+ * Use PGSTAT_BEGIN_WRITE_ACTIVITY() before, and PGSTAT_END_WRITE_ACTIVITY()
+ * after, modifying the current process's PgBackendStatus data.  Note that,
+ * since there is no mechanism for cleaning up st_changecount after an error,
+ * THESE MACROS FORM A CRITICAL SECTION.  Any error between them will be
+ * promoted to PANIC, causing a database restart to clean up shared memory!
+ * Hence, keep the critical section as short and straight-line as possible.
+ * Aside from being safer, that minimizes the window in which readers will
+ * have to loop.
  *
- * Also pgstat_save_changecount_before() and pgstat_save_changecount_after()
- * need to be called before and after PgBackendStatus entries are copied into
- * private memory, respectively.
+ * Reader logic should follow this sketch:
+ *
+ *        for (;;)
+ *        {
+ *            int before_ct, after_ct;
+ *
+ *            pgstat_begin_read_activity(beentry, before_ct);
+ *            ... copy beentry data to local memory ...
+ *            pgstat_end_read_activity(beentry, after_ct);
+ *            if (pgstat_read_activity_complete(before_ct, after_ct))
+ *                break;
+ *            CHECK_FOR_INTERRUPTS();
+ *        }
+ *
+ * For extra safety, we generally use volatile beentry pointers, although
+ * the memory barriers should theoretically be sufficient.
  */
-#define pgstat_increment_changecount_before(beentry)    \
-    do {    \
-        beentry->st_changecount++;    \
+#define PGSTAT_BEGIN_WRITE_ACTIVITY(beentry) \
+    do { \
+        START_CRIT_SECTION(); \
+        (beentry)->st_changecount++; \
         pg_write_barrier(); \
     } while (0)

-#define pgstat_increment_changecount_after(beentry) \
-    do {    \
+#define PGSTAT_END_WRITE_ACTIVITY(beentry) \
+    do { \
         pg_write_barrier(); \
-        beentry->st_changecount++;    \
-        Assert((beentry->st_changecount & 1) == 0); \
+        (beentry)->st_changecount++; \
+        Assert(((beentry)->st_changecount & 1) == 0); \
+        END_CRIT_SECTION(); \
     } while (0)

-#define pgstat_save_changecount_before(beentry, save_changecount)    \
-    do {    \
-        save_changecount = beentry->st_changecount; \
-        pg_read_barrier();    \
+#define pgstat_begin_read_activity(beentry, before_changecount) \
+    do { \
+        (before_changecount) = (beentry)->st_changecount; \
+        pg_read_barrier(); \
     } while (0)

-#define pgstat_save_changecount_after(beentry, save_changecount)    \
-    do {    \
-        pg_read_barrier();    \
-        save_changecount = beentry->st_changecount; \
+#define pgstat_end_read_activity(beentry, after_changecount) \
+    do { \
+        pg_read_barrier(); \
+        (after_changecount) = (beentry)->st_changecount; \
     } while (0)

+#define pgstat_read_activity_complete(before_changecount, after_changecount) \
+    ((before_changecount) == (after_changecount) && \
+     ((before_changecount) & 1) == 0)
+
+
 /* ----------
  * LocalPgBackendStatus
  *

Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
I wrote:
> This patch is against HEAD --- I've not looked at how much adjustment
> it'll need for the back branches, but I'm sure there's some.

I've back-patched this now.  If you want to test the patch, the v10-branch
version is at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3d113136bbaa86fbf1edde7aaf70ba06a6166b7

            regards, tom lane



Re: Query on pg_stat_activity table got stuck

От
Tom Lane
Дата:
I wrote:
> This patch is against HEAD --- I've not looked at how much adjustment
> it'll need for the back branches, but I'm sure there's some.

I've back-patched this now.  If you want to test the patch, the v10-branch
version is at

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=c3d113136bbaa86fbf1edde7aaf70ba06a6166b7

            regards, tom lane