Обсуждение: Re: pgsql: Add function to log the memory contexts of specified backend pro
On Tue, Apr 6, 2021 at 12:45 AM Fujii Masao <fujii@postgresql.org> wrote:
> Add function to log the memory contexts of specified backend process.
Hi,
I think this might need a recursion guard. I tried this:
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index dc4c600922d..b219a934034 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -3532,7 +3532,7 @@ ProcessInterrupts(void)
if (ParallelMessagePending)
ProcessParallelMessages();
- if (LogMemoryContextPending)
+ if (true)
ProcessLogMemoryContextInterrupt();
if (PublishMemoryContextPending)
diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
index 72f5655fb34..867fd7b0ad5 100644
--- a/src/include/miscadmin.h
+++ b/src/include/miscadmin.h
@@ -112,7 +112,7 @@ extern void ProcessInterrupts(void);
/* Test whether an interrupt is pending */
#ifndef WIN32
#define INTERRUPTS_PENDING_CONDITION() \
- (unlikely(InterruptPending))
+ (unlikely(InterruptPending) || true)
#else
#define INTERRUPTS_PENDING_CONDITION() \
(unlikely(UNBLOCKED_SIGNAL_QUEUE()) ?
pgwin32_dispatch_queued_signals() : 0, \
That immediately caused infinite recursion, ending in a core dump:
frame #13: 0x0000000104607b00
postgres`errfinish(filename=<unavailable>, lineno=<unavailable>,
funcname=<unavailable>) at elog.c:543:2 [opt]
frame #14: 0x0000000104637078
postgres`ProcessLogMemoryContextInterrupt at mcxt.c:1392:2 [opt]
frame #15: 0x00000001044a901c postgres`ProcessInterrupts at
postgres.c:3536:3 [opt]
frame #16: 0x0000000104607b54
postgres`errfinish(filename=<unavailable>, lineno=<unavailable>,
funcname=<unavailable>) at elog.c:608:2 [opt] [artificial]
frame #17: 0x0000000104637078
postgres`ProcessLogMemoryContextInterrupt at mcxt.c:1392:2 [opt]
frame #18: 0x00000001044a901c postgres`ProcessInterrupts at
postgres.c:3536:3 [opt]
<repeat until we have 174241 frames on the stack, then dump core>
It might be unlikely that a process can be signalled fast enough to
actually fail in this way, but I'm not sure it's impossible, and I
think we should be defending against it. The most trivial recursion
guard would be HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around
ProcessLogMemoryContextInterrupt(), but I think that's probably not
quite good enough because it would make the backend impervious to
pg_terminate_backend() while it's dumping memory contexts, and that
could be a long time if the write blocks.
--
Robert Haas
EDB: http://www.enterprisedb.com
On 2025/05/01 2:15, Robert Haas wrote:
> On Tue, Apr 6, 2021 at 12:45 AM Fujii Masao <fujii@postgresql.org> wrote:
>> Add function to log the memory contexts of specified backend process.
>
> Hi,
>
> I think this might need a recursion guard. I tried this:
>
> diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
> index dc4c600922d..b219a934034 100644
> --- a/src/backend/tcop/postgres.c
> +++ b/src/backend/tcop/postgres.c
> @@ -3532,7 +3532,7 @@ ProcessInterrupts(void)
> if (ParallelMessagePending)
> ProcessParallelMessages();
>
> - if (LogMemoryContextPending)
> + if (true)
> ProcessLogMemoryContextInterrupt();
>
> if (PublishMemoryContextPending)
> diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
> index 72f5655fb34..867fd7b0ad5 100644
> --- a/src/include/miscadmin.h
> +++ b/src/include/miscadmin.h
> @@ -112,7 +112,7 @@ extern void ProcessInterrupts(void);
> /* Test whether an interrupt is pending */
> #ifndef WIN32
> #define INTERRUPTS_PENDING_CONDITION() \
> - (unlikely(InterruptPending))
> + (unlikely(InterruptPending) || true)
> #else
> #define INTERRUPTS_PENDING_CONDITION() \
> (unlikely(UNBLOCKED_SIGNAL_QUEUE()) ?
> pgwin32_dispatch_queued_signals() : 0, \
>
> That immediately caused infinite recursion, ending in a core dump:
>
> frame #13: 0x0000000104607b00
> postgres`errfinish(filename=<unavailable>, lineno=<unavailable>,
> funcname=<unavailable>) at elog.c:543:2 [opt]
> frame #14: 0x0000000104637078
> postgres`ProcessLogMemoryContextInterrupt at mcxt.c:1392:2 [opt]
> frame #15: 0x00000001044a901c postgres`ProcessInterrupts at
> postgres.c:3536:3 [opt]
> frame #16: 0x0000000104607b54
> postgres`errfinish(filename=<unavailable>, lineno=<unavailable>,
> funcname=<unavailable>) at elog.c:608:2 [opt] [artificial]
> frame #17: 0x0000000104637078
> postgres`ProcessLogMemoryContextInterrupt at mcxt.c:1392:2 [opt]
> frame #18: 0x00000001044a901c postgres`ProcessInterrupts at
> postgres.c:3536:3 [opt]
> <repeat until we have 174241 frames on the stack, then dump core>
>
> It might be unlikely that a process can be signalled fast enough to
> actually fail in this way, but I'm not sure it's impossible, and I
> think we should be defending against it. The most trivial recursion
> guard would be HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around
> ProcessLogMemoryContextInterrupt(), but I think that's probably not
> quite good enough because it would make the backend impervious to
> pg_terminate_backend() while it's dumping memory contexts, and that
> could be a long time if the write blocks.
Just idea, what do you think about adding a flag to indicate whether
ProcessLogMemoryContextInterrupt() is currently running? Then,
when a backend receives a signal and ProcessLogMemoryContextInterrupt()
is invoked, it can simply return immediately if the flag is already set
like this:
------------------------------
@ -1383,8 +1383,14 @@ HandleGetMemoryContextInterrupt(void)
void
ProcessLogMemoryContextInterrupt(void)
{
+ static bool loggingMemoryContext = false;
+
LogMemoryContextPending = false;
+ if (loggingMemoryContext)
+ return;
+ loggingMemoryContext = true;
+
/*
* Use LOG_SERVER_ONLY to prevent this message from being sent to the
* connected client.
@@ -1406,6 +1412,8 @@ ProcessLogMemoryContextInterrupt(void)
* details about individual siblings beyond 100 will not be large.
*/
MemoryContextStatsDetail(TopMemoryContext, 100, 100, false);
+
+ loggingMemoryContext = false;
}
------------------------------
This way, we can safely ignore overlapping
pg_log_backend_memory_contexts() requests while the function
is already running. Thoughts?
Regards,
--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
On Thu, May 1, 2025 at 3:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > Just idea, what do you think about adding a flag to indicate whether > ProcessLogMemoryContextInterrupt() is currently running? Then, > when a backend receives a signal and ProcessLogMemoryContextInterrupt() > is invoked, it can simply return immediately if the flag is already set > like this: I think that something like this could work, but you would need more than this. Otherwise, if the function errors out, the flag would remain permanently set. -- Robert Haas EDB: http://www.enterprisedb.com
On 2025/05/01 21:42, Robert Haas wrote: > On Thu, May 1, 2025 at 3:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> Just idea, what do you think about adding a flag to indicate whether >> ProcessLogMemoryContextInterrupt() is currently running? Then, >> when a backend receives a signal and ProcessLogMemoryContextInterrupt() >> is invoked, it can simply return immediately if the flag is already set >> like this: > > I think that something like this could work, but you would need more > than this. Otherwise, if the function errors out, the flag would > remain permanently set. Yes, we need to either use PG_TRY()/PG_FINALLY() or handle the flag as a global variable and reset it in the error handling path. I think using PG_TRY()/PG_FINALLY() is the simpler option. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2025/05/02 2:27, Fujii Masao wrote: > > > On 2025/05/01 21:42, Robert Haas wrote: >> On Thu, May 1, 2025 at 3:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> Just idea, what do you think about adding a flag to indicate whether >>> ProcessLogMemoryContextInterrupt() is currently running? Then, >>> when a backend receives a signal and ProcessLogMemoryContextInterrupt() >>> is invoked, it can simply return immediately if the flag is already set >>> like this: >> >> I think that something like this could work, but you would need more >> than this. Otherwise, if the function errors out, the flag would >> remain permanently set. > > Yes, we need to either use PG_TRY()/PG_FINALLY() or handle the flag as > a global variable and reset it in the error handling path. I think using > PG_TRY()/PG_FINALLY() is the simpler option. I've implemented the patch in that way. Patch attached. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
On 2025-05-02 09:02, Fujii Masao wrote: > On 2025/05/02 2:27, Fujii Masao wrote: >> >> >> On 2025/05/01 21:42, Robert Haas wrote: >>> On Thu, May 1, 2025 at 3:53 AM Fujii Masao >>> <masao.fujii@oss.nttdata.com> wrote: >>>> Just idea, what do you think about adding a flag to indicate whether >>>> ProcessLogMemoryContextInterrupt() is currently running? Then, >>>> when a backend receives a signal and >>>> ProcessLogMemoryContextInterrupt() >>>> is invoked, it can simply return immediately if the flag is already >>>> set >>>> like this: >>> >>> I think that something like this could work, but you would need more >>> than this. Otherwise, if the function errors out, the flag would >>> remain permanently set. >> >> Yes, we need to either use PG_TRY()/PG_FINALLY() or handle the flag as >> a global variable and reset it in the error handling path. I think >> using >> PG_TRY()/PG_FINALLY() is the simpler option. > > I've implemented the patch in that way. Patch attached. Thank you for the patch! I confirmed that with this patch applied, the process no longer crashes even after applying the change Robert used to trigger the crash. a small nitpick: + * requested repeatedly and rapidly, potentially leading to infinite It looks like there are two spaces between "requested" and "repeatedly". -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On 2025/05/02 14:58, torikoshia wrote: > I confirmed that with this patch applied, the process no longer crashes even after applying the change Robert used to triggerthe crash. > > a small nitpick: > > + * requested repeatedly and rapidly, potentially leading to infinite > > It looks like there are two spaces between "requested" and "repeatedly". Thanks for the review and testing! I've fixed the issue you pointed out. Updated patch attached. Since git cherry-pick didn't work cleanly for v16 and earlier, I've also prepared a separate patch for those versions. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
On Fri, May 2, 2025 at 9:54 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > Thanks for the review and testing! I've fixed the issue you pointed out. > Updated patch attached. Thanks for addressing this. However, I believe this commit may need to take note of the following comment from elog.h: * Note: if a local variable of the function containing PG_TRY is modified * in the PG_TRY section and used in the PG_CATCH section, that variable * must be declared "volatile" for POSIX compliance. This is not mere * pedantry; we have seen bugs from compilers improperly optimizing code * away when such a variable was not marked. Beware that gcc's -Wclobbered * warnings are just about entirely useless for catching such oversights. Based on this comment, I believe in_progress must be declared volatile. As a stylistic comment, I think I would prefer making in_progress a file-level global and giving it a less generic name (e.g. LogMemoryContextInProgress). However, perhaps others will disagree. -- Robert Haas EDB: http://www.enterprisedb.com
On 2025/05/05 23:57, Robert Haas wrote: > On Fri, May 2, 2025 at 9:54 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> Thanks for the review and testing! I've fixed the issue you pointed out. >> Updated patch attached. > > Thanks for addressing this. However, I believe this commit may need to > take note of the following comment from elog.h: Thanks for the review! > * Note: if a local variable of the function containing PG_TRY is modified > * in the PG_TRY section and used in the PG_CATCH section, that variable > * must be declared "volatile" for POSIX compliance. This is not mere > * pedantry; we have seen bugs from compilers improperly optimizing code > * away when such a variable was not marked. Beware that gcc's -Wclobbered > * warnings are just about entirely useless for catching such oversights. > > Based on this comment, I believe in_progress must be declared volatile. You're right. OTOH, setting the flag inside the PG_TRY() block isn't necessary, so I've moved it outside instead of leaving it inside and marking the flag volatile. > As a stylistic comment, I think I would prefer making in_progress a > file-level global and giving it a less generic name (e.g. > LogMemoryContextInProgress). However, perhaps others will disagree. I'm fine with this. I've renamed the flag and made it a file-level global variable as suggested. Updated patch is attached. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Вложения
On 2025/05/07 18:06, Fujii Masao wrote: > > > On 2025/05/05 23:57, Robert Haas wrote: >> On Fri, May 2, 2025 at 9:54 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> Thanks for the review and testing! I've fixed the issue you pointed out. >>> Updated patch attached. >> >> Thanks for addressing this. However, I believe this commit may need to >> take note of the following comment from elog.h: > > Thanks for the review! > > >> * Note: if a local variable of the function containing PG_TRY is modified >> * in the PG_TRY section and used in the PG_CATCH section, that variable >> * must be declared "volatile" for POSIX compliance. This is not mere >> * pedantry; we have seen bugs from compilers improperly optimizing code >> * away when such a variable was not marked. Beware that gcc's -Wclobbered >> * warnings are just about entirely useless for catching such oversights. >> >> Based on this comment, I believe in_progress must be declared volatile. > > You're right. OTOH, setting the flag inside the PG_TRY() block isn't necessary, > so I've moved it outside instead of leaving it inside and marking the flag volatile. > > >> As a stylistic comment, I think I would prefer making in_progress a >> file-level global and giving it a less generic name (e.g. >> LogMemoryContextInProgress). However, perhaps others will disagree. > > I'm fine with this. I've renamed the flag and made it a file-level global > variable as suggested. Updated patch is attached. I've attached the rebased versions of the patches. The patch for v14–v16 is labeled with a .txt extension to prevent cfbot from treating it as a patch for master, which would cause it to fail when applying. Regards, -- Fujii Masao NTT DATA Japan Corporation
Вложения
Hi Fujii,
I did your patch review. It successfully applies to all targeted branches: REL_14_STABLE (48969555447), REL_15_STABLE (b9a02b9780b), REL_16_STABLE (4d689a17693), REL_17_STABLE (cad40cec24f), REL_18_STABLE (5278222853c) and master (d0d0ba6cf66). All tests successfully pass on MacOS 15.7 for every revision. Everything seems fine with the patch, I think it's ready for committer.
--