Discussion: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV
pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.

Previously pgstat_initialize() was called in InitPostgres() and
AuxiliaryProcessMain(). As it turns out there was at least one case where we
reported stats before pgstat_initialize() was called, see
AutoVacWorkerMain()'s intentionally early call to pgstat_report_autovac().

This turns out to not be a problem with the current pgstat implementation as
pgstat_initialize() only registers a shutdown callback. But in the shared
memory based stats implementation we are working towards pgstat_initialize()
has to do more work.

After b406478b87e BaseInit() is a central place where initialization shared
by normal backends and auxiliary backends can be put. Obviously BaseInit() is
called before InitPostgres() registers ShutdownPostgres. Previously
ShutdownPostgres was the first before_shmem_exit callback, now that's
commonly pgstats. That should be fine.

Previously pgstat_initialize() was not called in bootstrap mode, but there
does not appear to be a need for that. It's now done unconditionally.

To detect future issues like this, assertions are added to a few places
verifying that the pgstat subsystem is initialized and not yet shut down.

Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20210405092914.mmxqe7j56lsjfsej@alap3.anarazel.de
Discussion: https://postgr.es/m/20210802164124.ufo5buo4apl6yuvs@alap3.anarazel.de

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/ee3f8d3d3aec0d7c961d6b398d31504bb272a450

Modified Files
--------------
src/backend/postmaster/auxprocess.c |  2 --
src/backend/postmaster/pgstat.c     | 53 +++++++++++++++++++++++++++++++++++--
src/backend/utils/init/postinit.c   | 23 +++++++++-------
3 files changed, 65 insertions(+), 13 deletions(-)
Andres Freund <andres@anarazel.de> writes:
> pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.

sifaka took exception to this, or at least I guess it was this one out
of the four you pushed at once:

TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4810, PID: 74447)
0   postgres        0x0000000100e5a520 ExceptionalCondition + 124
1   postgres        0x0000000100ca1dec pgstat_reset_counters + 0
2   postgres        0x0000000100ca2548 pgstat_report_tempfile + 80
3   postgres        0x0000000100d07ca4 FileClose + 536
4   postgres        0x0000000100d09764 CleanupTempFiles + 160
5   postgres        0x0000000100d0f8c0 proc_exit_prepare + 228
6   postgres        0x0000000100d0f79c proc_exit + 24
7   postgres        0x0000000100e5ae04 errfinish + 856
8   postgres        0x0000000100f1a804 ProcessInterrupts.cold.9 + 88
9   postgres        0x0000000100d36a98 ProcessInterrupts + 604
10  postgres        0x0000000100cd93c4 sendDir + 1516
11  postgres        0x0000000100cd83a0 perform_base_backup + 3592
12  postgres        0x0000000100cd72c0 SendBaseBackup + 256
13  postgres        0x0000000100ce6a48 exec_replication_command + 1852
14  postgres        0x0000000100d39190 PostgresMain + 3260
15  postgres        0x0000000100ca9c78 process_startup_packet_die + 0
16  postgres        0x0000000100ca94ec ClosePostmasterPorts + 0
17  postgres        0x0000000100ca6a0c PostmasterMain + 4584
18  postgres        0x0000000100c0b798 help + 0
19  libdyld.dylib   0x0000000184391430 start + 4

regards, tom lane
Hi,

On 2021-08-06 22:44:07 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > pgstat: Bring up pgstat in BaseInit() to fix uninitialized use of pgstat by AV.
>
> sifaka took exception to this, or at least I guess it was this one out
> of the four you pushed at once:

Longfin as well. It's the assertions from ee3f8d3d3ae, but possibly only
exposed after fb2c5028e63.

Not sure why it's your two animals that report this issue, but not others? Why
would a backend doing SendBaseBackup() previously have allocated temp files?

Don't get me wrong - it's good that they surfaced the issue, and it's an issue
independent of the specific trigger. Glad I added the assertions...

See also https://www.postgresql.org/message-id/20210803023612.iziacxk5syn2r4ut%40alap3.anarazel.de

> TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4810, PID: 74447)
> 0   postgres        0x0000000100e5a520 ExceptionalCondition + 124
> 1   postgres        0x0000000100ca1dec pgstat_reset_counters + 0
> 2   postgres        0x0000000100ca2548 pgstat_report_tempfile + 80
> 3   postgres        0x0000000100d07ca4 FileClose + 536
> 4   postgres        0x0000000100d09764 CleanupTempFiles + 160
> 5   postgres        0x0000000100d0f8c0 proc_exit_prepare + 228
> 6   postgres        0x0000000100d0f79c proc_exit + 24
> 7   postgres        0x0000000100e5ae04 errfinish + 856
> 8   postgres        0x0000000100f1a804 ProcessInterrupts.cold.9 + 88
> 9   postgres        0x0000000100d36a98 ProcessInterrupts + 604
> 10  postgres        0x0000000100cd93c4 sendDir + 1516
> 11  postgres        0x0000000100cd83a0 perform_base_backup + 3592
> 12  postgres        0x0000000100cd72c0 SendBaseBackup + 256
> 13  postgres        0x0000000100ce6a48 exec_replication_command + 1852
> 14  postgres        0x0000000100d39190 PostgresMain + 3260
> 15  postgres        0x0000000100ca9c78 process_startup_packet_die + 0
> 16  postgres        0x0000000100ca94ec ClosePostmasterPorts + 0
> 17  postgres        0x0000000100ca6a0c PostmasterMain + 4584
> 18  postgres        0x0000000100c0b798 help + 0
> 19  libdyld.dylib   0x0000000184391430 start + 4
Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> Not sure why it's your two animals that report this issue, but not others? Why
> would a backend doing SendBaseBackup() previously have allocated temp files?

Guessing the common factor is "macOS", but that's just a guess.
I can poke into it tomorrow.

regards, tom lane
Hi,

On 2021-08-06 23:13:28 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > Not sure why it's your two animals that report this issue, but not others? Why
> > would a backend doing SendBaseBackup() previously have allocated temp files?
>
> Guessing the common factor is "macOS", but that's just a guess.

Probably not a bad one...

> I can poke into it tomorrow.

Thanks! Might be interesting to run the pg_basebackup tests with
log_temp_files=0...

Greetings,

Andres Freund
I wrote:
> Guessing the common factor is "macOS", but that's just a guess.
> I can poke into it tomorrow.

I did try it real quick on my Mac laptop, and that fails too.
Here's a more accurate backtrace, in case that helps.

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff2033e92e libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fff2036d5bd libsystem_pthread.dylib`pthread_kill + 263
    frame #2: 0x00007fff202c2406 libsystem_c.dylib`abort + 125
    frame #3: 0x0000000108cdf87f postgres`ExceptionalCondition(conditionName=<unavailable>, errorType=<unavailable>, fileName=<unavailable>, lineNumber=<unavailable>) at assert.c:69:2 [opt]
    frame #4: 0x0000000108b0f551 postgres`pgstat_send [inlined] pgstat_assert_is_up at pgstat.c:4810:2 [opt]
    frame #5: 0x0000000108b0f532 postgres`pgstat_send(msg=<unavailable>, len=<unavailable>) at pgstat.c:3032 [opt]
    frame #6: 0x0000000108b0fbef postgres`pgstat_report_tempfile(filesize=<unavailable>) at pgstat.c:1812:2 [opt]
    frame #7: 0x0000000108b7abfe postgres`FileClose [inlined] ReportTemporaryFileUsage(path="base/pgsql_tmp/pgsql_tmp35840.0", size=0) at fd.c:1483:2 [opt]
    frame #8: 0x0000000108b7abf6 postgres`FileClose(file=1) at fd.c:1987 [opt]
    frame #9: 0x0000000108b7c3b8 postgres`CleanupTempFiles(isCommit=false, isProcExit=true) at fd.c:0 [opt]
    frame #10: 0x0000000108b82661 postgres`proc_exit_prepare(code=1) at ipc.c:209:3 [opt]
    frame #11: 0x0000000108b8253d postgres`proc_exit(code=1) at ipc.c:107:2 [opt]
    frame #12: 0x0000000108ce0201 postgres`errfinish(filename=<unavailable>, lineno=<unavailable>, funcname=<unavailable>) at elog.c:666:3 [opt]
    frame #13: 0x0000000108db362b postgres`ProcessInterrupts.cold.9 at postgres.c:3222:3 [opt]
    frame #14: 0x0000000108baa2a4 postgres`ProcessInterrupts at postgres.c:3218:22 [opt]
    frame #15: 0x0000000108b49f4a postgres`sendDir(path=".", basepathlen=1, sizeonly=false, tablespaces=0x00007fb47f0719c8, sendtblspclinks=true, manifest=<unavailable>, spcoid=0x0000000000000000) at basebackup.c:1277:3 [opt]
    frame #16: 0x0000000108b48dc8 postgres`perform_base_backup(opt=0x00007ffee73ae0a0) at basebackup.c:432:5 [opt]
    frame #17: 0x0000000108b47bf0 postgres`SendBaseBackup(cmd=<unavailable>) at basebackup.c:949:2 [opt]
    frame #18: 0x0000000108b58055 postgres`exec_replication_command(cmd_string="BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes' ") at walsender.c:1625:4 [opt]
    frame #19: 0x0000000108bac8aa postgres`PostgresMain(argc=<unavailable>, argv=<unavailable>, dbname=<unavailable>, username=<unavailable>) at postgres.c:4484:12 [opt]
    frame #20: 0x0000000108b178db postgres`BackendRun(port=<unavailable>) at postmaster.c:4519:2 [opt]
    frame #21: 0x0000000108b17178 postgres`ServerLoop [inlined] BackendStartup(port=<unavailable>) at postmaster.c:4241:3 [opt]
    frame #22: 0x0000000108b17154 postgres`ServerLoop at postmaster.c:1758 [opt]
    frame #23: 0x0000000108b1421b postgres`PostmasterMain(argc=4, argv=0x00007fb47ec06640) at postmaster.c:1430:11 [opt]
    frame #24: 0x0000000108a6ac13 postgres`main(argc=<unavailable>, argv=<unavailable>) at main.c:199:3 [opt]
    frame #25: 0x00007fff20388f3d libdyld.dylib`start + 1

I see two core dumps with what seem to be this same trace after running
pg_basebackup's tests.

regards, tom lane
Hi,

On 2021-08-06 23:22:36 -0400, Tom Lane wrote:
> I wrote:
> > Guessing the common factor is "macOS", but that's just a guess.
> > I can poke into it tomorrow.
>
> I did try it real quick on my Mac laptop, and that fails too.
> Here's a more accurate backtrace, in case that helps.

Thanks!

I now managed to reproduce it on a CI system (after some initial failed
attempts due to pg_upgrade tests failing due to path length issues):
https://api.cirrus-ci.com/v1/artifact/task/4863568911794176/log/src/bin/pg_basebackup/tmp_check/log/010_pg_basebackup_main.log

2021-08-06 21:18:40.096 PDT [22031] 010_pg_basebackup.pl LOG:  received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 21:18:40.096 PDT [22031] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl LOG:  could not send data to client: Broken pipe
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl FATAL:  connection to client lost
2021-08-06 21:18:42.210 PDT [22031] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4810, PID: 22031)

vs what I locally get:

2021-08-06 20:52:06.163 PDT [1397252] 010_pg_basebackup.pl LOG:  received replication command: BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 20:52:06.163 PDT [1397252] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl LOG:  could not send data to client: Broken pipe
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl ERROR:  base backup could not send data, aborting backup
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl STATEMENT:  BASE_BACKUP LABEL 'pg_basebackup base backup' PROGRESS NOWAIT MANIFEST 'yes'
2021-08-06 20:52:08.189 PDT [1397252] 010_pg_basebackup.pl FATAL:  connection to client lost

Note that the OSX version doesn't have the "base backup could not send data,
aborting backup" message. I guess that is what leads to the difference in
behaviour:

The temp file is created by InitializeBackupManifest(). In the !OSX case, we
first abort via an ERROR, which triggers the cleanup via
WalSndResourceCleanup(). On OSX however, we immediately error out with FATAL
for some reason (timing? network buffering differences?), which will never
reach WalSndErrorCleanup(). Therefore the temp file only gets deleted during
proc_exit(), which triggers the issue...

Not yet really sure what the best way to deal with this is. Presumably this
issue would be fixed if AtProcExit_Files()/CleanupTempFiles() were scheduled
via before_shmem_exit(). And perhaps it's not too off to schedule
CleanupTempFiles() there - but it doesn't quite seem entirely right either.

I'd kinda like to avoid having to overhaul the process exit infrastructure as
a prerequisite to getting the shared memory stats patch in :(.

Greetings,

Andres Freund
Hi,

On 2021-08-06 21:49:52 -0700, Andres Freund wrote:
> The temp file is created by InitializeBackupManifest(). In the !OSX case, we
> first abort via an ERROR, which triggers the cleanup via
> WalSndResourceCleanup(). On OSX however, we immediately error out with FATAL
> for some reason (timing? network buffering differences?), which will never
> reach WalSndErrorCleanup(). Therefore the temp file only gets deleted during
> proc_exit(), which triggers the issue...
>
> Not yet really sure what the best way to deal with this is. Presumably this
> issue would be fixed if AtProcExit_Files()/CleanupTempFiles() were scheduled
> via before_shmem_exit(). And perhaps it's not too off to schedule
> CleanupTempFiles() there - but it doesn't quite seem entirely right either.

Huh. I just noticed that AtProcExit_Files() is not even scheduled via
on_shmem_exit() but on_proc_exit(). That means that even before fb2c5028e63
we sent pgstat messages *well* after pgstat_shutdown_hook() already
ran. Crufty.

Just hacking in an earlier CleanupTempFiles() does "fix" the OSX issue:
https://cirrus-ci.com/task/5941265494704128?logs=macos_basebackup#L4

I'm inclined to leave things as-is until tomorrow to see if other things are
shaken loose and then either commit a bandaid along those lines or revert the
patch. Or something proper if we can figure it out till then.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2021-08-06 21:49:52 -0700, Andres Freund wrote:
>> Not yet really sure what the best way to deal with this is. Presumably this
>> issue would be fixed if AtProcExit_Files()/CleanupTempFiles() were scheduled
>> via before_shmem_exit(). And perhaps it's not too off to schedule
>> CleanupTempFiles() there - but it doesn't quite seem entirely right either.

> Huh. I just noticed that AtProcExit_Files() is not even scheduled via
> on_shmem_exit() but on_proc_exit(). That means that even before fb2c5028e63
> we sent pgstat messages *well* after pgstat_shutdown_hook() already
> ran. Crufty.

> Just hacking in an earlier CleanupTempFiles() does "fix" the OSX issue:
> https://cirrus-ci.com/task/5941265494704128?logs=macos_basebackup#L4

So if I have the lay of the land correctly:

1. Somebody decided it'd be a great idea for temp file cleanup to send
stats collector messages.

2. Temp file cleanup happens after shmem disconnection.

3. This accidentally worked, up to now, because stats transmission happens
via a separate socket not shared memory.

4. We can't keep both #1 and #2 if we'd like to switch to shmem-based
stats collection.

Intuitively it seems like temp file management should be a low-level,
backend-local function and therefore should be okay to run after
shmem disconnection. I do not have a warm feeling about reversing that
module layering --- what's to stop someone from breaking things by
trying to use a temp file in their on_proc_exit or on_shmem_exit hook?
Maybe what needs to go overboard is point 1.

More generally, this points up the fact that we don't have a well-defined
module hierarchy that would help us understand what code can safely do
which. I'm not volunteering to design that, but maybe it needs to happen
soon.

regards, tom lane
Hi,

On 2021-08-07 11:43:07 -0400, Tom Lane wrote:
> So if I have the lay of the land correctly:
>
> 1. Somebody decided it'd be a great idea for temp file cleanup to send
> stats collector messages.
>
> 2. Temp file cleanup happens after shmem disconnection.
>
> 3. This accidentally worked, up to now, because stats transmission happens
> via a separate socket not shared memory.
>
> 4. We can't keep both #1 and #2 if we'd like to switch to shmem-based
> stats collection.

Sounds accurate to me.

> Intuitively it seems like temp file management should be a low-level,
> backend-local function and therefore should be okay to run after
> shmem disconnection. I do not have a warm feeling about reversing that
> module layering --- what's to stop someone from breaking things by
> trying to use a temp file in their on_proc_exit or on_shmem_exit hook?

We could just add an assert preventing that from happening. It's hard to
believe that there could be a good reason to use temp files in those hook
points.

I'm somewhat inclined to split InitFileAccess() into two by separating out
InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
early and register a proc exit hook that errors out when there's a temp file
(as a backstop for non cassert builds). The new InitTemporaryFileAccess()
would happen a bit later and schedule CleanupTempFiles() to happen via
before_shmem_access(). And we add a Assert(!proc_exit_inprogress) to the
routines for opening a temp file.

> Maybe what needs to go overboard is point 1.

Keeping stats of temp files seems useful enough that I'm a bit hesitant to go
there. I guess we could just prevent pgstats_report_tempfile() from being
called when CleanupTempFiles() is called during proc exit, but that doesn't
seem great.

> More generally, this points up the fact that we don't have a well-defined
> module hierarchy that would help us understand what code can safely do which.
> I'm not volunteering to design that, but maybe it needs to happen soon.

I agree. Part of the reason for whacking around process startup (in both
pushed and still pending commits) was that previously it wasn't just poorly
defined, it differed significantly between platforms. And I'm quite unhappy
with the vagueness in which we defined the meaning of the various shutdown
callbacks ([1]).

I suspect to even get to the point of doing a useful redesign of the module
hierarchy, we'd need to unify more of the process initialization between
EXEC_BACKEND and normal builds.

I've bitten by all this often enough to be motivated to propose
something. However I want to get the basics of the shared memory stats stuff
in first - it's a pain to keep it upated, and we'll need to find and solve all
of the issues it has anyway, even if we go for a redesign of module / startup
/ shutdown layering.

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/20210803023612.iziacxk5syn2r4ut%40alap3.anarazel.de
Andres Freund <andres@anarazel.de> writes:
> On 2021-08-07 11:43:07 -0400, Tom Lane wrote:
>> Intuitively it seems like temp file management should be a low-level,
>> backend-local function and therefore should be okay to run after
>> shmem disconnection. I do not have a warm feeling about reversing that
>> module layering --- what's to stop someone from breaking things by
>> trying to use a temp file in their on_proc_exit or on_shmem_exit hook?

> We could just add an assert preventing that from happening. It's hard to
> believe that there could be a good reason to use temp files in those hook
> points.

The bigger picture here is that anyplace anybody ever wants to add stats
collection in suddenly becomes a "must run before shmem disconnection"
module. I think that way madness lies --- we can't have the entire
backend shut down before shmem disconnection. So I feel like there's
a serious design problem here, and it's not confined to temp files,
or at least it won't stay confined to temp files. (There may indeed
be other problems already, that we just haven't had the good luck to
have buildfarm timing vagaries expose to us.)

Maybe the solution is to acknowledge that we might lose some events
during backend shutdown, and redefine the behavior as "we ignore
event reports after pgstat shutdown", not "we assert that there never
can be any such reports".

> I'm somewhat inclined to split InitFileAccess() into two by separating out
> InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
> early and register a proc exit hook that errors out when there's a temp file
> (as a backstop for non cassert builds). The new InitTemporaryFileAccess()
> would happen a bit later and schedule CleanupTempFiles() to happen via
> before_shmem_access(). And we add a Assert(!proc_exit_inprogress) to the
> routines for opening a temp file.

Maybe that would work, but after you multiply it by a bunch of different
scenarios in different modules, it's going to get less and less attractive.

> I've bitten by all this often enough to be motivated to propose
> something. However I want to get the basics of the shared memory stats stuff
> in first - it's a pain to keep it upated, and we'll need to find and solve all
> of the issues it has anyway, even if we go for a redesign of module / startup
> / shutdown layering.

Fair. But I suggest that the first cut should look more like what
I suggest above, ie just be willing to lose events during shutdown.
The downsides of that are not so enormous that we should be willing
to undertake major klugery to avoid it before we've even got a
semi-working system.

regards, tom lane
Hi,

On 2021-08-07 13:06:47 -0400, Tom Lane wrote:
> The bigger picture here is that anyplace anybody ever wants to add stats
> collection in suddenly becomes a "must run before shmem disconnection"
> module. I think that way madness lies --- we can't have the entire
> backend shut down before shmem disconnection. So I feel like there's
> a serious design problem here, and it's not confined to temp files,
> or at least it won't stay confined to temp files. (There may indeed
> be other problems already, that we just haven't had the good luck to
> have buildfarm timing vagaries expose to us.)

I think more often it should not end up as "must run before shmem
disconnection" as a whole, but should be split into a portion running at that
point.

> Maybe the solution is to acknowledge that we might lose some events
> during backend shutdown, and redefine the behavior as "we ignore
> event reports after pgstat shutdown", not "we assert that there never
> can be any such reports".

I think it's fine to make such calls, but that it ought to reside in the
stats emitting modules. Only it can decide whether needing to emit stats
during shutdown is a rare edge case or a commonly expected path.

E.g. the case of parallel worker shutdown sending stats too late is something
common enough to be problematic, so I don't want to make it too hard to
detect such cases.

> > I'm somewhat inclined to split InitFileAccess() into two by separating out
> > InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
> > early and register a proc exit hook that errors out when there's a temp file
> > (as a backstop for non cassert builds). The new InitTemporaryFileAccess()
> > would happen a bit later and schedule CleanupTempFiles() to happen via
> > before_shmem_access(). And we add a Assert(!proc_exit_inprogress) to the
> > routines for opening a temp file.

> Maybe that would work, but after you multiply it by a bunch of different
> scenarios in different modules, it's going to get less and less attractive.

I'm not quite convinced. Even if we had a nicer ordering / layering of
subsystems, we'd still have to deal with subsystems that don't fit nicely
because they have conflicting needs. And we'd still need detection of use of
subsystems that already have not been initialized / are already shut down.

One example of that is imo fd.c being a very low level module that might be
needed during shutdown of other modules but which also currently depends on
the stats subsystem for temp file management. Which doesn't really make
sense, because stats very well could depend on fd.c routines. I think needing
to split the current fd.c subsystem into a lower-level (file access) and a
higher level (temporary file management) module is precisely what a better
designed module layering system will *force* us to do.

> > I've bitten by all this often enough to be motivated to propose
> > something. However I want to get the basics of the shared memory stats stuff
> > in first - it's a pain to keep it upated, and we'll need to find and solve all
> > of the issues it has anyway, even if we go for a redesign of module / startup
> > / shutdown layering.
>
> Fair. But I suggest that the first cut should look more like what
> I suggest above, ie just be willing to lose events during shutdown.
> The downsides of that are not so enormous that we should be willing
> to undertake major klugery to avoid it before we've even got a
> semi-working system.

I think that's more likely to hide bugs unfortunately. Consider fa91d4c91f2 -
I might not have found that if we had just ignored "too late" pgstats
activity in pgstats.c or fd.c, and that's not an edge case.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2021-08-07 13:06:47 -0400, Tom Lane wrote:
>> Fair. But I suggest that the first cut should look more like what
>> I suggest above, ie just be willing to lose events during shutdown.
>> The downsides of that are not so enormous that we should be willing
>> to undertake major klugery to avoid it before we've even got a
>> semi-working system.

> I think that's more likely to hide bugs unfortunately. Consider fa91d4c91f2 -
> I might not have found that if we had just ignored "too late" pgstats activity
> in pgstats.c or fd.c, and that's not an edge case.

Depends what you want to define as a bug. What I am not happy about
is the prospect of random assertion failures for the next six months
while you finish redesigning half of the system. The rest of us
have work we want to get done, too. I don't object to the idea of
making no-lost-events an end goal, but we are clearly not ready
for that today.

regards, tom lane
Hi,

On 2021-08-07 13:37:16 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-08-07 13:06:47 -0400, Tom Lane wrote:
> >> Fair. But I suggest that the first cut should look more like what
> >> I suggest above, ie just be willing to lose events during shutdown.
> >> The downsides of that are not so enormous that we should be willing
> >> to undertake major klugery to avoid it before we've even got a
> >> semi-working system.
>
> > I think that's more likely to hide bugs unfortunately. Consider fa91d4c91f2 -
> > I might not have found that if we had just ignored "too late" pgstats activity
> > in pgstats.c or fd.c, and that's not an edge case.
>
> Depends what you want to define as a bug. What I am not happy about
> is the prospect of random assertion failures for the next six months
> while you finish redesigning half of the system. The rest of us
> have work we want to get done, too. I don't object to the idea of
> making no-lost-events an end goal, but we are clearly not ready
> for that today.

I don't know what to do about that. How would we even find these cases if
they aren't hit during regression tests on my machine (nor on a lot of
others)? Obviously I had run the regression tests many times before pushing
the earlier changes.

The check for pgstat being up is in one central place and thus easily can be
turned into a warning if problems around the shutdown sequence become a
frequent issue on HEAD. If you think it's better to turn that into a WARNING
now and then re-arm it into an assert later, I can live with that as well,
but I don't think it'll lead to a better outcome.

The shared memory stats stuff isn't my personal project - it's
Horiguchi-san's. I picked it up because it seems like an important thing to
address and because it'd been in maybe a dozen CFs without a lot of
progress. It just turns out that there's a lot of prerequisite changes :(.

Greetings,

Andres Freund
Hi,

On 2021-08-07 09:48:50 -0700, Andres Freund wrote:
> I'm somewhat inclined to split InitFileAccess() into two by separating out
> InitTemporaryFileAccess() or such. InitFileAccess() would continue to happen
> early and register a proc exit hook that errors out when there's a temp file
> (as a backstop for non cassert builds). The new InitTemporaryFileAccess()
> would happen a bit later and schedule CleanupTempFiles() to happen via
> before_shmem_access(). And we add a Assert(!proc_exit_inprogress) to the
> routines for opening a temp file.

Attached is a patch showing how this could look like. Note that the PANIC
should likely not be that but a WARNING, but the PANIC is more useful for
running some initial tests...

I'm not sure whether we'd want to continue having the proc exit hook? It
seems to me that asserts would provide a decent enough protection against
introducing new temp files during shutdown.

Alternatively we could make the asserts in OpenTemporaryFile et al
elog(ERROR)s, and be pretty certain that no temp files would be open too
late?

Greetings,

Andres Freund
Attachments
Andres Freund <andres@anarazel.de> writes:
> On 2021-08-07 13:37:16 -0400, Tom Lane wrote:
>> Depends what you want to define as a bug. What I am not happy about
>> is the prospect of random assertion failures for the next six months
>> while you finish redesigning half of the system. The rest of us
>> have work we want to get done, too. I don't object to the idea of
>> making no-lost-events an end goal, but we are clearly not ready
>> for that today.

> I don't know what to do about that. How would we even find these cases if they
> aren't hit during regression tests on my machine (nor on a lot of others)?

The regression tests really aren't that helpful for testing the problem
scenario here, which basically is SIGTERM'ing a query-in-progress.
I'm rather surprised that the buildfarm managed to exercise that at all.

You might try setting up a test scaffold that runs the core regression
tests and SIGINT's the postmaster, or alternatively SIGTERM's some
individual session, at random times partway through. Obviously this
will make the regression tests report failure, but what to look for
is if anything dumps core on the way out.

regards, tom lane
Hi,

On 2021-08-07 15:12:38 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2021-08-07 13:37:16 -0400, Tom Lane wrote:
> >> Depends what you want to define as a bug. What I am not happy about
> >> is the prospect of random assertion failures for the next six months
> >> while you finish redesigning half of the system. The rest of us
> >> have work we want to get done, too. I don't object to the idea of
> >> making no-lost-events an end goal, but we are clearly not ready
> >> for that today.
>
> > I don't know what to do about that. How would we even find these cases if they
> > aren't hit during regression tests on my machine (nor on a lot of others)?
>
> The regression tests really aren't that helpful for testing the problem
> scenario here, which basically is SIGTERM'ing a query-in-progress.
> I'm rather surprised that the buildfarm managed to exercise that at all.

They're also not that helpful because this problem likely is unreachable for
any tempfiles other than the one in InitializeBackupManifest(). Pretty much
all, or even all, the other tempfiles are cleaned up either via transaction
and/or resowner cleanup.

I wonder if we should do something about WalSndResourceCleanup() not being
reached for FATALs? I think at least a note in WalSndResourceCleanup()
commenting on that fact seems like it might be a good idea?

It seems like it could eventually be a problem that the resowners added in
0d8c9c1210c4 aren't ever cleaned up in case of a FATAL error. Most resowner
cleanup actions are also backstopped with some form of on-exit hook, but I
don't think it's all - e.g. buffer pins aren't. I guess I should start a
thread about this on -hackers...

> You might try setting up a test scaffold that runs the core regression
> tests and SIGINT's the postmaster, or alternatively SIGTERM's some
> individual session, at random times partway through. Obviously this
> will make the regression tests report failure, but what to look for
> is if anything dumps core on the way out.

Worth trying.

Greetings,

Andres Freund
Hi,

On 2021-08-07 12:01:31 -0700, Andres Freund wrote:
> Attached is a patch showing what this could look like. Note that the PANIC
> should likely not be that but a WARNING; the PANIC is just more useful for
> running some initial tests...

I pushed a slightly evolved version of this. As the commit message noted, this may not be the best approach, but we can revise after further discussion.

> I'm not sure whether we'd want to continue having the proc exit hook? It seems
> to me that asserts would provide decent enough protection against
> introducing new temp files during shutdown.
> Alternatively we could make the asserts in OpenTemporaryFile et al
> elog(ERROR)s, and be pretty certain that no temp files would be open too late?

I ended up removing the proc exit hook and not converting the asserts to elog()s. Happy to change either.

Greetings,

Andres Freund
On Sat, Aug 7, 2021 at 11:43 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Intuitively it seems like temp file management should be a low-level,
> backend-local function and therefore should be okay to run after
> shmem disconnection. I do not have a warm feeling about reversing that
> module layering --- what's to stop someone from breaking things by
> trying to use a temp file in their on_proc_exit or on_shmem_exit hook?
> Maybe what needs to go overboard is point 1.
>
> More generally, this points up the fact that we don't have a well-defined
> module hierarchy that would help us understand what code can safely do which.
> I'm not volunteering to design that, but maybe it needs to happen soon.

Yeah, I was quite surprised when I saw this commit, because my first reaction was - why in the world would temporary file shutdown properly precede DSM shutdown, given that temporary files are a low-level mechanism? The explanation that we're trying to send statistics at that point makes sense as far as it goes, but it seems to me that we cannot be far from having a circular dependency. All we need is to have DSM require the use of temporary files, and we'll end up needing DSM shutdown to happen both before and after temporary file cleanup.

/me wonders idly about dynamic_shared_memory_type=file

I think that subsystems like "memory" and "files" really ought to be the lowest-level things we have, and should be shut down last. Stuff like "send a message to the stats collector" seems like a higher-level thing that may require those lower-level facilities in order to operate, and must therefore be shut down first. Maybe some subsystems need to be divided into upper and lower levels to make this work, or, well, I don't know, something else. But I'm deeply suspicious that lifting stuff like this to the front of the shutdown sequence is just papering over the problem, and not actually solving it.

--
Robert Haas
EDB: http://www.enterprisedb.com
Hi,

On 2021-08-09 11:46:30 -0400, Robert Haas wrote:
> I think that subsystems like "memory" and "files" really ought to be
> the lowest-level things we have, and should be shut down last.

I don't disagree with that - but there's a difference between having that as an abstract goal and having it be a dependency of committing somebody's patch.

> Stuff like "send a message to the stats collector" seems like a higher level
> thing that may require those lower-level facilities in order to operate, and
> must therefore be shut down first.

Yep.

> Maybe some subsystems need to be divided into upper and lower levels to make
> this work, or, well, I don't know, something else.

That's what I ended up doing, right? There's now InitFileAccess() and InitTemporaryFileAccess().

> But I'm deeply suspicious that lifting stuff like this to the front of the
> shutdown sequence is just papering over the problem, and not actually
> solving it.

If you have a concrete proposal that you think makes sense to tie shared memory stats to, I'm happy to entertain it. One main motivator for b406478b87e etc. was to allow rejiggering things like this more easily.

Greetings,

Andres Freund
Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o
From: Masahiko Sawada
Hi,

On Sun, Aug 8, 2021 at 11:24 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2021-08-07 12:01:31 -0700, Andres Freund wrote:
> > Attached is a patch showing what this could look like. Note that the PANIC
> > should likely not be that but a WARNING; the PANIC is just more useful for
> > running some initial tests...
>
> I pushed a slightly evolved version of this. As the commit message noted, this
> may not be the best approach, but we can revise after further discussion.

While testing streaming logical replication, I got another assertion failure with the current HEAD (e2ce88b58f) when the apply worker raised an error while applying spooled changes:

TRAP: FailedAssertion("pgstat_is_initialized && !pgstat_is_shutdown", File: "pgstat.c", Line: 4810, PID: 11084)
0   postgres                  0x000000010704fd9a ExceptionalCondition + 234
1   postgres                  0x0000000106d41e62 pgstat_assert_is_up + 66
2   postgres                  0x0000000106d43854 pgstat_send + 20
3   postgres                  0x0000000106d4433e pgstat_report_tempfile + 94
4   postgres                  0x0000000106df9519 ReportTemporaryFileUsage + 25
5   postgres                  0x0000000106df945a PathNameDeleteTemporaryFile + 282
6   postgres                  0x0000000106df8c7e unlink_if_exists_fname + 174
7   postgres                  0x0000000106df8b3a walkdir + 426
8   postgres                  0x0000000106df8982 PathNameDeleteTemporaryDir + 82
9   postgres                  0x0000000106dfe591 SharedFileSetDeleteAll + 113
10  postgres                  0x0000000106dfdf62 SharedFileSetDeleteOnProcExit + 66
11  postgres                  0x0000000106e05275 proc_exit_prepare + 325
12  postgres                  0x0000000106e050a3 proc_exit + 19
13  postgres                  0x0000000106d3ba99 StartBackgroundWorker + 649
14  postgres                  0x0000000106d54e85 do_start_bgworker + 613
15  postgres                  0x0000000106d4ef26 maybe_start_bgworkers + 486
16  postgres                  0x0000000106d4d767 sigusr1_handler + 631
17  libsystem_platform.dylib  0x00007fff736705fd _sigtramp + 29
18  ???                       0x0000000000000000 0x0 + 0
19  postgres                  0x0000000106d4c990 PostmasterMain + 6640
20  postgres                  0x0000000106c24fa3 main + 819
21  libdyld.dylib             0x00007fff73477cc9 start + 1

The apply worker registers SharedFileSetDeleteOnProcExit() when creating a file set to serialize the changes. When it raises an error due to a conflict while applying a change, the callback eventually reports the temp file statistics, but pgstat has already shut down, resulting in this assertion failure.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> The apply worker registers SharedFileSetDeleteOnProcExit() when
> creating a file set to serialize the changes. When it raises an error
> due to a conflict while applying a change, the callback eventually
> reports the temp file statistics, but pgstat has already shut down,
> resulting in this assertion failure.
>

I think we can try to fix this by registering the cleanup of these files via before_shmem_exit(), as done by Andres in commit 675c945394. Similar to that commit, we can change the function name SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and register it via before_shmem_exit() instead of on_proc_exit(). Can you try that and see if it fixes the issue for you, unless you have better ideas to try out?

--
With Regards,
Amit Kapila.
On Wed, Aug 11, 2021 at 10:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > The apply worker registers SharedFileSetDeleteOnProcExit() when
> > creating a file set to serialize the changes. When it raises an error
> > due to conflict during applying the change, the callback eventually
> > reports the temp file statistics but pgstat already shut down,
> > resulting in this assertion failure.
> >
>
> I think we can try to fix this by registering to clean up these files
> via before_shmem_exit() as done by Andres in commit 675c945394.
> Similar to that commit, we can change the function name
> SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
> register it via before_shmem_exit() instead of on_proc_exit(). Can you
> try that and see if it fixes the issue for you unless you have better
> ideas to try out?

It seems to me that moving the shared fileset cleanup to before_shmem_exit() is the right approach to fix this problem. The issue is fixed by the attached patch.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Attachments
On Thu, Aug 12, 2021 at 7:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Aug 11, 2021 at 10:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Aug 10, 2021 at 4:37 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > The apply worker registers SharedFileSetDeleteOnProcExit() when
> > > creating a file set to serialize the changes. When it raises an error
> > > due to conflict during applying the change, the callback eventually
> > > reports the temp file statistics but pgstat already shut down,
> > > resulting in this assertion failure.
> > >
> >
> > I think we can try to fix this by registering to clean up these files
> > via before_shmem_exit() as done by Andres in commit 675c945394.
> > Similar to that commit, we can change the function name
> > SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
> > register it via before_shmem_exit() instead of on_proc_exit(). Can you
> > try that and see if it fixes the issue for you unless you have better
> > ideas to try out?
>
> It seems to me that moving the shared fileset cleanup to
> before_shmem_exit() is the right approach to fix this problem. The
> issue is fixed by the attached patch.

+1, the fix makes sense to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 12, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Aug 12, 2021 at 7:39 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Aug 11, 2021 at 10:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I think we can try to fix this by registering to clean up these files
> > > via before_shmem_exit() as done by Andres in commit 675c945394.
> > > Similar to that commit, we can change the function name
> > > SharedFileSetDeleteOnProcExit to SharedFileSetDeleteOnShmExit and
> > > register it via before_shmem_exit() instead of on_proc_exit(). Can you
> > > try that and see if it fixes the issue for you unless you have better
> > > ideas to try out?
> >
> > It seems to me that moving the shared fileset cleanup to
> > before_shmem_exit() is the right approach to fix this problem. The
> > issue is fixed by the attached patch.
>
> +1, the fix makes sense to me.
>

I have also tested it and the fix works for me. The fix works because pgstat_initialize() is called before we register the cleanup in SharedFileSetInit(). I am not sure if we need an Assert to ensure that, and if so how we can do that. Any suggestions?

--
With Regards,
Amit Kapila.
On Thu, Aug 12, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Aug 12, 2021 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > It seems to me that moving the shared fileset cleanup to
> > before_shmem_exit() is the right approach to fix this problem. The
> > issue is fixed by the attached patch.
> >
> > +1, the fix makes sense to me.
> >
>
> I have also tested it and the fix works for me. The fix works because
> pgstat_initialize() is called before we register the cleanup in
> SharedFileSetInit(). I am not sure if we need an Assert to ensure that,
> and if so how we can do that. Any suggestions?

I think that the assertion added by ee3f8d3d3ae ensures that pgstat_initialize() is called before the callback for fileset cleanup is registered, no?

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Thu, Aug 12, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Aug 12, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I have also tested it and the fix works for me. The fix works because
> > pgstat_initialize() is called before we register the cleanup in
> > SharedFileSetInit(). I am not sure if we need an Assert to ensure that,
> > and if so how we can do that. Any suggestions?
>
> I think that the assertion added by ee3f8d3d3ae ensures that
> pgstat_initialize() is called before the callback for fileset cleanup
> is registered, no?

Right, it ensures that the callback for the fileset is called after pgstat_initialize() and before pgstat_shutdown().

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
On Thu, Aug 12, 2021 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Aug 12, 2021 at 3:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I have also tested and fix works for me. The fix works because
> > pgstat_initialize() is called before we register clean up in
> > SharedFileSetInit(). I am not sure if we need an Assert to ensure that
> > and if so how we can do that? Any suggestions?
>
> I think that the assertion added by ee3f8d3d3ae ensures that
> pgstat_initialize() is called before the callback for fileset cleanup
> is registered, no?

I think I am missing something here, can you please explain?

--
With Regards,
Amit Kapila.