Discussion: lockup in parallel hash join on dikkop (freebsd 14.0-current)
Hi,

I received an alert dikkop (my rpi4 buildfarm animal running freebsd 14)
did not report any results for a couple days, and it seems it got into
an infinite loop in REL_11_STABLE when building hash table in a parallel
hashjoin, or something like that.

It seems to be progressing now, probably because I attached gdb to the
workers to get backtraces, which does signals etc.

Anyway, in 'ps ax' I saw this:

  94545  -  Ss   0:03.39 postgres: buildfarm regression [local] SELECT
  94627  -  Is   0:00.03 postgres: parallel worker for PID 94545
  94628  -  Is   0:00.02 postgres: parallel worker for PID 94545

and the backend was stuck waiting on this query:

  select final > 1 as multibatch
    from hash_join_batches(
  $$
    select count(*) from join_foo
      left join (select b1.id, b1.t from join_bar b1 join join_bar b2 using (id)) ss
      on join_foo.id < ss.id + 1 and join_foo.id > ss.id - 1;
  $$);

This started on 2023-01-20 23:23:18.125, and the next log (after I did
the gdb stuff) is from 2023-01-26 20:05:16.751. Quite a bit of time.

It seems all three processes are doing WaitEventSetWait, either through
a ConditionVariable, or WaitLatch. But I don't have any good idea of
what might have broken - and as it got "unstuck" I can't investigate
more. But I see there's nodeHash and parallelism, and I recall there's
a lot of gotchas due to how the backends cooperate when building the
hash table, etc.

Thomas, any idea what might be wrong?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > I received an alert dikkop (my rpi4 buildfarm animal running freebsd 14) > did not report any results for a couple days, and it seems it got into > an infinite loop in REL_11_STABLE when building hash table in a parallel > hashjoin, or something like that. > It seems to be progressing now, probably because I attached gdb to the > workers to get backtraces, which does signals etc. That reminds me of cases that I saw several times on my now-deceased animal florican: https://www.postgresql.org/message-id/flat/2245838.1645902425%40sss.pgh.pa.us There's clearly something rotten somewhere in there, but whether it's our bug or FreeBSD's isn't clear. regards, tom lane
On Fri, Jan 27, 2023 at 9:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > > I received an alert dikkop (my rpi4 buildfarm animal running freebsd 14) > > did not report any results for a couple days, and it seems it got into > > an infinite loop in REL_11_STABLE when building hash table in a parallel > > hashjoin, or something like that. > > > It seems to be progressing now, probably because I attached gdb to the > > workers to get backtraces, which does signals etc. > > That reminds me of cases that I saw several times on my now-deceased > animal florican: > > https://www.postgresql.org/message-id/flat/2245838.1645902425%40sss.pgh.pa.us > > There's clearly something rotten somewhere in there, but whether > it's our bug or FreeBSD's isn't clear. And if it's ours, it's possibly in latch code and not anything higher (I mean, not in condition variables, barriers, or parallel hash join) because I saw a similar hang in the shm_mq stuff which uses the latch API directly. Note that 13 switched to kqueue but still used the self-pipe, and 14 switched to a signal event, and this hasn't been reported in those releases or later, which makes the poll() code path a key suspect.
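For readers unfamiliar with the mechanism under suspicion: the pre-13 poll() path (and 13's kqueue path) wakes up via the classic "self-pipe trick" — the SIGUSR1 handler writes a byte to a pipe, and the wait primitive watches the pipe's read end, so a signal that arrives just before the syscall still makes poll() return. A minimal standalone sketch of the idea (not the real latch.c code; the selfpipe_* names are made up here):

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <unistd.h>

static int selfpipe[2];

static void
wakeup_handler(int signo)
{
    char c = 0;

    (void) signo;
    /* Async-signal-safe: just write one byte; if the pipe is already
     * full the byte would be redundant anyway, so ignore the result. */
    (void) write(selfpipe[1], &c, 1);
}

static int
selfpipe_init(void)
{
    if (pipe(selfpipe) < 0)
        return -1;
    /* Non-blocking on both ends so neither handler nor drain can hang. */
    fcntl(selfpipe[0], F_SETFL, O_NONBLOCK);
    fcntl(selfpipe[1], F_SETFL, O_NONBLOCK);
    signal(SIGUSR1, wakeup_handler);
    return 0;
}

/* Wait up to timeout_ms; drain the pipe and report whether a wakeup
 * arrived (1) or the wait timed out (0). */
static int
selfpipe_wait(int timeout_ms)
{
    struct pollfd pfd;
    char buf[16];

    pfd.fd = selfpipe[0];
    pfd.events = POLLIN;
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return 0;
    while (read(selfpipe[0], buf, sizeof(buf)) > 0)
        ;
    return 1;
}
```

The point of the trick is that a byte written before the poll() call stays readable, so the hang reported here implies either the byte was never written (the handler never ran, or a race upstream of it) or poll() slept through a readable fd.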
On Fri, Jan 27, 2023 at 9:57 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Fri, Jan 27, 2023 at 9:49 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > > > I received an alert dikkop (my rpi4 buildfarm animal running freebsd 14) > > > did not report any results for a couple days, and it seems it got into > > > an infinite loop in REL_11_STABLE when building hash table in a parallel > > > hashjoin, or something like that. > > > > > It seems to be progressing now, probably because I attached gdb to the > > > workers to get backtraces, which does signals etc. > > > > That reminds me of cases that I saw several times on my now-deceased > > animal florican: > > > > https://www.postgresql.org/message-id/flat/2245838.1645902425%40sss.pgh.pa.us > > > > There's clearly something rotten somewhere in there, but whether > > it's our bug or FreeBSD's isn't clear. > > And if it's ours, it's possibly in latch code and not anything higher > (I mean, not in condition variables, barriers, or parallel hash join) > because I saw a similar hang in the shm_mq stuff which uses the latch > API directly. Note that 13 switched to kqueue but still used the > self-pipe, and 14 switched to a signal event, and this hasn't been > reported in those releases or later, which makes the poll() code path > a key suspect. Also, 14 changed the flag/memory barrier dance (maybe_sleeping), but 13 did it the same way as 11 + 12. So between 12 and 13 we have just the poll -> kqueue change.
After 1000 make check loops, and 1000 make -C src/test/modules/test_shm_mq
check loops, on the same FBSD 13.1 machine as elver which has failed like
this once before, I haven't been able to reproduce this on REL_12_STABLE.

Not really sure how to chase this, but if you see this situation again,
I'd be interested to see the output of fstat -p PID (shows bytes in
pipes) and procstat -j PID (shows pending signals) for all PIDs involved
(before connecting a debugger or doing anything else that might make it
return with EINTR, after which we know it continues happily because it
then sees latch->is_set next time around the loop).

If poll() is not returning when there are bytes ready to read from the
self-pipe, which fstat can show, I think that'd indicate a kernel bug.
If procstat -j shows signals pending but somehow it's still blocked in
the syscall, that would also point to a kernel problem. Otherwise, it
might indicate a compiler or postgres bug, but I don't have any
particular theories.
Hi, On 2023-01-27 22:23:58 +1300, Thomas Munro wrote: > After 1000 make check loops, and 1000 make -C src/test/modules/test_shm_mq > check loops, on the same FBSD 13.1 machine as elver which has failed > like this once before, I haven't been able to reproduce this on > REL_12_STABLE. Did you use the same compiler / compilation flags as when elver hit it? Clearly Tomas' case was with at least some optimizations enabled. Except that you're saying that you hit this on elver (amd64), I think it'd be interesting that we see the failure on an arm host, which has a less strict memory order model than x86. IIUC elver previously hit this on 12? Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > Except that you're saying that you hit this on elver (amd64), I think it'd be > interesting that we see the failure on an arm host, which has a less strict > memory order model than x86. I also saw it on florican, which is/was an i386 machine using clang and pretty standard build options other than 'CFLAGS' => '-msse2 -O2', so I think this isn't too much about machine architecture or compiler flags. Machine speed might matter though. elver is a good deal faster than florican was, and dikkop is slower yet. I gather Thomas has seen this only once on elver, but I saw it maybe a dozen times over a couple of years on florican, and now dikkop has hit it after not so many runs. regards, tom lane
Hi, On 2023-01-27 23:18:39 -0500, Tom Lane wrote: > I also saw it on florican, which is/was an i386 machine using clang and > pretty standard build options other than > 'CFLAGS' => '-msse2 -O2', > so I think this isn't too much about machine architecture or compiler > flags. Ah. Florican dropped of the BF status page and I was too lazy to look deeper. You have a penchant for odd architectures, so it didn't seem too crazy :) > Machine speed might matter though. elver is a good deal faster than > florican was, and dikkop is slower yet. I gather Thomas has seen this > only once on elver, but I saw it maybe a dozen times over a couple of > years on florican, and now dikkop has hit it after not so many runs. Re-reading the old thread, it is interesting that you tried hard to reproduce it outside of the BF, without success: https://postgr.es/m/2398828.1646000688%40sss.pgh.pa.us Such problems are quite annoying. Last time I hit such a case was https://postgr.es/m/20220325052654.3xpbmntatyofau2w%40alap3.anarazel.de but I can't see anything like that being the issue here. Greetings, Andres Freund
On Sat, Jan 28, 2023 at 4:42 PM Andres Freund <andres@anarazel.de> wrote: > Did you use the same compiler / compilation flags as when elver hit it? > Clearly Tomas' case was with at least some optimizations enabled. I did use the same compiler version and optimisation level, clang llvmorg-13.0.0-0-gd7b669b3a303 at -O2.
On 1/28/23 05:53, Andres Freund wrote: > Hi, > > On 2023-01-27 23:18:39 -0500, Tom Lane wrote: >> I also saw it on florican, which is/was an i386 machine using clang and >> pretty standard build options other than >> 'CFLAGS' => '-msse2 -O2', >> so I think this isn't too much about machine architecture or compiler >> flags. > > Ah. Florican dropped of the BF status page and I was too lazy to look > deeper. You have a penchant for odd architectures, so it didn't seem too crazy > :) > > >> Machine speed might matter though. elver is a good deal faster than >> florican was, and dikkop is slower yet. I gather Thomas has seen this >> only once on elver, but I saw it maybe a dozen times over a couple of >> years on florican, and now dikkop has hit it after not so many runs. > > Re-reading the old thread, it is interesting that you tried hard to reproduce > it outside of the BF, without success: > https://postgr.es/m/2398828.1646000688%40sss.pgh.pa.us > > Such problems are quite annoying. Last time I hit such a case was > https://postgr.es/m/20220325052654.3xpbmntatyofau2w%40alap3.anarazel.de > but I can't see anything like that being the issue here. > FWIW I'll wait for dikkop to finish the current buildfarm run (it's currently chewing on HEAD) and then will try to do runs of the 'joins' test in a loop. That's where dikkop got stuck before. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 1/28/23 13:05, Tomas Vondra wrote:
>
> FWIW I'll wait for dikkop to finish the current buildfarm run (it's
> currently chewing on HEAD) and then will try to do runs of the 'joins'
> test in a loop. That's where dikkop got stuck before.
>

So I did that - same configure options as the buildfarm client, and a
'make check' (with only tests up to the 'join' suite, because that's
where it got stuck before). And it took only ~15 runs (~1h) to hit this
again on dikkop.

As before, there are three processes - leader + 2 workers, but the query
is different - this time it's this one:

-- A couple of other hash join tests unrelated to work_mem management.
-- Check that EXPLAIN ANALYZE has data even if the leader doesn't participate
savepoint settings;
set local max_parallel_workers_per_gather = 2;
set local work_mem = '4MB';
set local parallel_leader_participation = off;
select * from hash_join_batches(
$$
  select count(*) from simple r join simple s using (id);
$$);

I managed to collect the fstat/procstat stuff Thomas asked for, and the
backtraces - attached. I still have the core files, in case we want to
look at something. As before, running gcore on the second worker (29081)
gets this unstuck - it sends some signal that apparently wakes it up.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
On Mon, Jan 30, 2023 at 1:53 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
> So I did that - same configure options as the buildfarm client, and a
> 'make check' (with only tests up to the 'join' suite, because that's
> where it got stuck before). And it took only ~15 runs (~1h) to hit this
> again on dikkop.

That's good news.

> I managed to collect the fstat/procstat stuff Thomas asked for, and the
> backtraces - attached. I still have the core files, in case we look at
> something. As before, running gcore on the second worker (29081) gets
> this unstuck - it sends some signal that apparently wakes it up.

Thanks! As expected, no bytes in the pipe for any of those processes.
Unfortunately I gave the wrong procstat command, it should be -i, not
-j. Does "procstat -i /path/to/core | grep USR1" show P (pending) for
that stuck process? Silly question really, I don't really expect
poll() to be misbehaving in such a basic way.

I was talking to Andres on IM about this yesterday and he pointed out
a potential out-of-order hazard: WaitEventSetWait() sets "waiting" (to
tell the signal handler to write to the self-pipe) and then reads
latch->is_set with neither compiler nor memory barrier, which doesn't
seem right because we might see a value of latch->is_set from before
"waiting" was true, and yet the signal handler might also have run
while "waiting" was false so the self-pipe doesn't save us, despite
the length of the comment about that. Can you reproduce it with this
change?

--- a/src/backend/storage/ipc/latch.c
+++ b/src/backend/storage/ipc/latch.c
@@ -1011,6 +1011,7 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
 		 * ordering, so that we cannot miss seeing is_set if a notification
 		 * has already been queued.
 		 */
+		pg_memory_barrier();
 		if (set->latch && set->latch->is_set)
 		{
 			occurred_events->fd = PGINVALID_SOCKET;
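The hazard is easier to see in a stripped-down model of the two flags. This is a hypothetical sketch using C11 atomics, not the actual WaitEventSetWait()/SetLatch() code; it only shows where the two full fences sit in the flag/check protocol, with a relaxed-memory-order model making the "no barrier" ordering explicit:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int is_set;       /* models latch->is_set */
static atomic_int waiting;      /* models the "waiting" flag */
static atomic_int wakeups_sent; /* counts stand-ins for kill(owner, SIGUSR1) */

/* Waiter side: announce intent to sleep, then re-check the latch.
 * Returns 1 if the latch was already set (no sleep needed), 0 if we
 * would proceed to block in poll(). */
static int
waiter_prepares_to_sleep(void)
{
    atomic_store_explicit(&waiting, 1, memory_order_relaxed);

    /* Models the proposed pg_memory_barrier(): without it, the is_set
     * load below may be satisfied before the waiting store becomes
     * visible, and both sides can conclude there is nothing to do. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&is_set, memory_order_relaxed))
    {
        atomic_store_explicit(&waiting, 0, memory_order_relaxed);
        return 1;
    }
    return 0;
}

/* Setter side: publish is_set, then check whether anyone is waiting.
 * The fence after the store models the SetLatch()-side barrier. */
static void
setter_sets_latch(void)
{
    atomic_store_explicit(&is_set, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&waiting, memory_order_relaxed))
        atomic_fetch_add(&wakeups_sent, 1); /* stands in for the signal */
}
```

With both fences, one of the two sides must observe the other: either the waiter sees is_set and skips the sleep, or the setter sees waiting and sends the wakeup. Drop either fence and the classic lost-wakeup interleaving becomes possible.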
On 1/29/23 18:26, Thomas Munro wrote: > On Mon, Jan 30, 2023 at 1:53 AM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> So I did that - same configure options as the buildfarm client, and a >> 'make check' (with only tests up to the 'join' suite, because that's >> where it got stuck before). And it took only ~15 runs (~1h) to hit this >> again on dikkop. > > That's good news. > >> I managed to collect the fstat/procstat stuff Thomas asked for, and the >> backtraces - attached. I still have the core files, in case we look at >> something. As before, running gcore on the second worker (29081) gets >> this unstuck - it sends some signal that apparently wakes it up. > > Thanks! As expected, no bytes in the pipe for any those processes. > Unfortunately I gave the wrong procstat command, it should be -i, not > -j. Does "procstat -i /path/to/core | grep USR1" show P (pending) for > that stuck process? Silly question really, I don't really expect > poll() to be misbehaving in such a basic way. > It shows "--C" for all three processes, which should mean "will be caught". > I was talking to Andres on IM about this yesterday and he pointed out > a potential out-of-order hazard: WaitEventSetWait() sets "waiting" (to > tell the signal handler to write to the self-pipe) and then reads > latch->is_set with neither compiler nor memory barrier, which doesn't > seem right because we might see a value of latch->is_set from before > "waiting" was true, and yet the signal handler might also have run > while "waiting" was false so the self-pipe doesn't save us, despite > the length of the comment about that. Can you reproduce it with this > change? > Will do, but I'll wait for another lockup to see how frequent it actually is. I'm now at ~90 runs total, and it didn't happen again yet. So hitting it after 15 runs might have been a bit of a luck. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi,

On 2023-01-30 06:26:02 +1300, Thomas Munro wrote:
> On Mon, Jan 30, 2023 at 1:53 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> > So I did that - same configure options as the buildfarm client, and a
> > 'make check' (with only tests up to the 'join' suite, because that's
> > where it got stuck before). And it took only ~15 runs (~1h) to hit this
> > again on dikkop.
>
> That's good news.

Indeed. As annoying as it is, it might be worth reproing it once or
twice more, just to have a feeling for how long we need to run to have
confidence in a fix.

> I was talking to Andres on IM about this yesterday and he pointed out
> a potential out-of-order hazard: WaitEventSetWait() sets "waiting" (to
> tell the signal handler to write to the self-pipe) and then reads
> latch->is_set with neither compiler nor memory barrier, which doesn't
> seem right because we might see a value of latch->is_set from before
> "waiting" was true, and yet the signal handler might also have run
> while "waiting" was false so the self-pipe doesn't save us, despite
> the length of the comment about that. Can you reproduce it with this
> change?
>
> --- a/src/backend/storage/ipc/latch.c
> +++ b/src/backend/storage/ipc/latch.c
> @@ -1011,6 +1011,7 @@ WaitEventSetWait(WaitEventSet *set, long timeout,
>  		 * ordering, so that we cannot miss seeing is_set if a notification
>  		 * has already been queued.
>  		 */
> +		pg_memory_barrier();
>  		if (set->latch && set->latch->is_set)
>  		{
>  			occurred_events->fd = PGINVALID_SOCKET;

I think we need a barrier in SetLatch(), after is_set = true. We have
that in some of the newer branches (due to the maybe_sleeping logic),
but not in the older branches.

Greetings,

Andres Freund
Hi, On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote: > Will do, but I'll wait for another lockup to see how frequent it > actually is. I'm now at ~90 runs total, and it didn't happen again yet. > So hitting it after 15 runs might have been a bit of a luck. Was there a difference in how much load there was on the machine between "reproduced in 15 runs" and "not reproed in 90"? If indeed lack of barriers is related to the issue, an increase in context switches could substantially change the behaviour (in both directions). More intra-process context switches can amount to "probabilistic barriers" because that'll be a barrier. At the same time it can make it more likely that the relatively narrow window in WaitEventSetWait() is hit, or lead to larger delays processing signals. Greetings, Andres Freund
On 1/29/23 18:53, Andres Freund wrote: > Hi, > > On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote: >> Will do, but I'll wait for another lockup to see how frequent it >> actually is. I'm now at ~90 runs total, and it didn't happen again yet. >> So hitting it after 15 runs might have been a bit of a luck. > > Was there a difference in how much load there was on the machine between > "reproduced in 15 runs" and "not reproed in 90"? If indeed lack of barriers > is related to the issue, an increase in context switches could substantially > change the behaviour (in both directions). More intra-process context > switches can amount to "probabilistic barriers" because that'll be a > barrier. At the same time it can make it more likely that the relatively > narrow window in WaitEventSetWait() is hit, or lead to larger delays > processing signals. > No. The only thing the machine is doing is while /usr/bin/true; do make check done I can't reduce the workload further, because the "join" test is in a separate parallel group (I cut down parallel_schedule). I could make the machine busier, of course. However, the other lockup I saw was when using serial_schedule, so I guess lower concurrency makes it more likely. But who knows ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Jan 30, 2023 at 7:08 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > However, the other lockup I saw was when using serial_schedule, so I > guess lower concurrency makes it more likely. FWIW "psql db -f src/test/regress/sql/join_hash.sql | cat" also works (I mean, it's self-contained and doesn't need anything else from make check; pipe to cat just disables the pager); that's how I've been trying (and failing) to reproduce this on various computers. I also did a lot of "make -C src/test/module/test_shm_mq installcheck" loops, at the same time, because that's where my animal hung.
On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> out-of-order hazard

I've been trying to understand how that could happen, but my CPU-fu is
weak. Let me try to write an argument for why it can't happen, so that
later I can look back at how stupid and naive I was. We have A B, and
if the CPU sees no dependency and decides to execute B A (pipelined),
shouldn't an interrupt either wait for the whole schemozzle to commit
first (if not in a hurry), or nuke it, handle the IPI and restart, or
something?

After an hour of reviewing random slides from classes on out-of-order
execution and reorder buffers and the like, I think the term for making
sure that interrupts run with the illusion of in-order execution
maintained is called "precise interrupts", and it is expected in all
modern architectures, after the early OoO pioneers lost their minds
trying to program without it. I guess generally you want that because
it would otherwise run your interrupt handler in a completely uncertain
environment, and specifically in this case it would reach our signal
handler which reads A's output (waiting) and writes to B's input
(is_set), so B IPI A surely shouldn't be allowed?

As for compiler barriers, I see that elver's compiler isn't reordering
the code.

Maybe it's a much dumber sort of a concurrency problem: stale cache
line due to missing barrier, but... commit db0f6cad488 made us also set
our own latch (a second time) when someone sets our latch in releases
9.something to 13. Which should mean that we're guaranteed to see
is_set = true in the scenario described, because we'll clobber it
ourselves if we have to, for good measure. If our secondary SetLatch()
sees it's already set and decides not to set it, then it's possible
that the code we interrupted was about to run ResetLatch(), but any
code doing that must next check its expected exit condition (or it has
a common-or-garden latch protocol bug, as has been discovered from time
to time in the tree...).

/me wanders away with a renewed fear of computers and the vast
complexities they hide
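The "must re-check after ResetLatch()" invariant mentioned above can be shown with a deliberately single-process toy model (hypothetical model_* names; the real API is WaitLatch()/ResetLatch() in src/backend/storage/ipc/latch.c, and the real producer/consumer run concurrently):

```c
#include <assert.h>

/* Toy state: a fake latch plus a work-queue depth. */
static int latch_is_set;
static int pending_work;

static void model_set_latch(void)   { latch_is_set = 1; }
static void model_reset_latch(void) { latch_is_set = 0; }

/* Producer: publish the work first, then set the latch. */
static void
model_enqueue(void)
{
    pending_work++;
    model_set_latch();
}

/* One round of the canonical consumer loop: wake if the latch is set,
 * reset it, and only then re-check the wait condition.  Checking the
 * condition before the reset (or not re-checking at all) is the
 * classic latch protocol bug, because work enqueued just before the
 * reset would be silently dropped.  Returns the items processed. */
static int
model_consumer_round(void)
{
    int done = 0;

    if (!latch_is_set)
        return 0;           /* would block in WaitLatch() here */
    model_reset_latch();
    while (pending_work > 0)
    {
        pending_work--;
        done++;
    }
    return done;
}
```

In this ordering, any work published before the latch was set is necessarily visible after the reset, which is why the redundant self-SetLatch() of db0f6cad488 is harmless for correct callers: at worst it causes one spurious wakeup followed by a condition check that finds nothing to do.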
Hi,

On 2023-01-30 15:22:34 +1300, Thomas Munro wrote:
> On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > out-of-order hazard
>
> I've been trying to understand how that could happen, but my CPU-fu is
> weak. Let me try to write an argument for why it can't happen, so that
> later I can look back at how stupid and naive I was. We have A B, and
> if the CPU sees no dependency and decides to execute B A (pipelined),
> shouldn't an interrupt either wait for the whole schemozzle to commit
> first (if not in a hurry), or nuke it, handle the IPI and restart, or
> something?

In a core local view, yes, I think so. But I don't think that's how it
can work on multi-core, and even more so, multi-socket machines. Imagine
how it'd influence latency if every interrupt on any CPU would prevent
all out-of-order execution on any CPU.

> After an hour of reviewing random slides from classes on out-of-order
> execution and reorder buffers and the like, I think the term for making
> sure that interrupts run with the illusion of in-order execution
> maintained is called "precise interrupts", and it is expected in all
> modern architectures, after the early OoO pioneers lost their minds
> trying to program without it. I guess generally you want that because
> it would otherwise run your interrupt handler in a completely uncertain
> environment, and specifically in this case it would reach our signal
> handler which reads A's output (waiting) and writes to B's input
> (is_set), so B IPI A surely shouldn't be allowed?

Userspace signals aren't delivered synchronously during hardware
interrupts afaik - and I don't think they even possibly could be (after
all the process possibly isn't scheduled). I think what you're talking
about with precise interrupts above is purely about the single-core
view, and mostly about hardware interrupts for faults etc. The CPU will
unwind state from speculatively executed code etc on interrupt, sure -
but I think that's separate from guaranteeing that you can't have stale
cache contents *due to work by another CPU*.

I'm not even sure that userspace signals are generally delivered via an
immediate hardware interrupt, or whether they're processed at the next
scheduler tick. After all, we know that multiple signals are coalesced,
which certainly isn't compatible with synchronous execution. But it
could be that that just happens when the target of a signal is not
currently scheduled.

> Maybe it's a much dumber sort of a concurrency problem: stale cache
> line due to missing barrier, but... commit db0f6cad488 made us also
> set our own latch (a second time) when someone sets our latch in
> releases 9.something to 13.

But this part does indeed put a crimp on some potential theories.

TBH, I'd be in favor of just adding the barriers for good measure, even
if we don't know if it's a live bug today - it seems incredibly fragile.

Greetings,

Andres Freund
On Mon, Jan 30, 2023 at 6:36 PM Andres Freund <andres@anarazel.de> wrote: > On 2023-01-30 15:22:34 +1300, Thomas Munro wrote: > > On Mon, Jan 30, 2023 at 6:26 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > > out-of-order hazard > > > > I've been trying to understand how that could happen, but my CPU-fu is > > weak. Let me try to write an argument for why it can't happen, so > > that later I can look back at how stupid and naive I was. We have A > > B, and if the CPU sees no dependency and decides to execute B A > > (pipelined), shouldn't an interrupt either wait for the whole > > schemozzle to commit first (if not in a hurry), or nuke it, handle the > > IPI and restart, or something? > > In a core local view, yes, I think so. But I don't think that's how it can > work on multi-core, and even more so, multi-socket machines. Imagine how it'd > influence latency if every interrupt on any CPU would prevent all out-of-order > execution on any CPU. Good. Yeah, I was talking only about a single thread/core. > > After an hour of reviewing randoma > > slides from classes on out-of-order execution and reorder buffers and > > the like, I think the term for making sure that interrupts run with > > the illusion of in-order execution maintained is called "precise > > interrupts", and it is expected in all modern architectures, after the > > early OoO pioneers lost their minds trying to program without it. I > > guess generally you want that because it would otherwise run your > > interrupt handler in a completely uncertain environment, and > > specifically in this case it would reach our signal handler which > > reads A's output (waiting) and writes to B's input (is_set), so B IPI > > A surely shouldn't be allowed? > > Userspace signals aren't delivered synchronously during hardware interrupts > afaik - and I don't think they even possibly could be (after all the process > possibly isn't scheduled). Yeah, they're not synchronous and the target might not even be running. 
BUT if a suitable thread is running then AFAICT an IPI is delivered to that sucker to get it running the handler ASAP, at least on the three OSes I looked at. (See breadcrumbs below). > I think what you're talking about with precise interrupts above is purely > about the single-core view, and mostly about hardware interrupts for faults > etc. The CPU will unwind state from speculatively executed code etc on > interrupt, sure - but I think that's separate from guaranteeing that you can't > have stale cache contents *due to work by another CPU*. Yeah. I get the cache problem, a separate issue that does indeed look pretty dodgy. I guess I wrote my email out-of-order: at the end I speculated that cache coherency probably can't explain this failure at least in THAT bit of the source, because of that funky extra self-SetLatch(). I just got spooked by the mention of out-of-order execution and I wanted to chase it down and straighten out my understanding. > I'm not even sure that userspace signals are generally delivered via an > immediate hardware interrupt, or whether they're processed at the next > scheduler tick. After all, we know that multiple signals are coalesced, which > certainly isn't compatible with synchronous execution. But it could be that > that just happens when the target of a signal is not currently scheduled. FreeBSD: By default, they are when possible, eg if the process is currently running a suitable thread. You can set sysctl kern.smp.forward_signal_enabled=0 to turn that off, and then it works more like the way you imagined (checking for pending signals at various arbitrary times, not sure). See tdsigwakeup() -> forward_signal() -> ipi_cpu(). Linux: Well it certainly smells approximately similar. See signal_wake_up_state() -> kick_process() -> smp_send_reschedule() -> smp_cross_call() -> __ipi_send_mask(). The comment for kick_process() explains that it's using the scheduler IPI to get signals handled ASAP. Darwin: ... 
-> cpu_signal() -> something that talks about IPIs

Coalescing is happening not only at the pending signal level (an
invention of the OS): for the inter-processor wakeups there is also
interrupt coalescing. It's latches all the way down.
On 1/29/23 19:08, Tomas Vondra wrote: > > > On 1/29/23 18:53, Andres Freund wrote: >> Hi, >> >> On 2023-01-29 18:39:05 +0100, Tomas Vondra wrote: >>> Will do, but I'll wait for another lockup to see how frequent it >>> actually is. I'm now at ~90 runs total, and it didn't happen again yet. >>> So hitting it after 15 runs might have been a bit of a luck. >> >> Was there a difference in how much load there was on the machine between >> "reproduced in 15 runs" and "not reproed in 90"? If indeed lack of barriers >> is related to the issue, an increase in context switches could substantially >> change the behaviour (in both directions). More intra-process context >> switches can amount to "probabilistic barriers" because that'll be a >> barrier. At the same time it can make it more likely that the relatively >> narrow window in WaitEventSetWait() is hit, or lead to larger delays >> processing signals. >> > > No. The only thing the machine is doing is > > while /usr/bin/true; do > make check > done > > I can't reduce the workload further, because the "join" test is in a > separate parallel group (I cut down parallel_schedule). I could make the > machine busier, of course. > > However, the other lockup I saw was when using serial_schedule, so I > guess lower concurrency makes it more likely. > FWIW the machine is now on run ~2700 without any further lockups :-/ Seems it was quite lucky we hit it twice in a handful of attempts. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi, On 2023-02-06 19:51:19 +0100, Tomas Vondra wrote: > > No. The only thing the machine is doing is > > > > while /usr/bin/true; do > > make check > > done > > > > I can't reduce the workload further, because the "join" test is in a > > separate parallel group (I cut down parallel_schedule). I could make the > > machine busier, of course. > > > > However, the other lockup I saw was when using serial_schedule, so I > > guess lower concurrency makes it more likely. > > > > FWIW the machine is now on run ~2700 without any further lockups :-/ > > Seems it was quite lucky we hit it twice in a handful of attempts. Did you cut down the workload before you reproduced it the first time, or after? It's quite possible that it's not reproducible in isolation. Greetings, Andres Freund
On 2/6/23 20:20, Andres Freund wrote: > Hi, > > On 2023-02-06 19:51:19 +0100, Tomas Vondra wrote: >>> No. The only thing the machine is doing is >>> >>> while /usr/bin/true; do >>> make check >>> done >>> >>> I can't reduce the workload further, because the "join" test is in a >>> separate parallel group (I cut down parallel_schedule). I could make the >>> machine busier, of course. >>> >>> However, the other lockup I saw was when using serial_schedule, so I >>> guess lower concurrency makes it more likely. >>> >> >> FWIW the machine is now on run ~2700 without any further lockups :-/ >> >> Seems it was quite lucky we hit it twice in a handful of attempts. > > Did you cut down the workload before you reproduced it the first time, or > after? It's quite possible that it's not reproducible in isolation. > No, I left the workload as it was for the first lockup, so `make check` runs everything as is up until the "join" test suite. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > No, I left the workload as it was for the first lockup, so `make check` > runs everything as is up until the "join" test suite. Wait, shouldn't that be join_hash?
On 2/7/23 00:48, Thomas Munro wrote: > On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra > <tomas.vondra@enterprisedb.com> wrote: >> No, I left the workload as it was for the first lockup, so `make check` >> runs everything as is up until the "join" test suite. > > Wait, shouldn't that be join_hash? No, because join_hash does not exist on 11 (it was added in 12). Also, it actually locked up like this - that's the lockup I reported on 28/1. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Feb 7, 2023 at 1:06 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > On 2/7/23 00:48, Thomas Munro wrote: > > On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra > > <tomas.vondra@enterprisedb.com> wrote: > >> No, I left the workload as it was for the first lockup, so `make check` > >> runs everything as is up until the "join" test suite. > > > > Wait, shouldn't that be join_hash? > > No, because join_hash does not exist on 11 (it was added in 12). Also, > it actually locked up like this - that's the lockup I reported on 28/1. Oh, good. I had been trying to repro with 12 here and forgot that you were looking at 11...
On 2/7/23 01:09, Thomas Munro wrote:
> On Tue, Feb 7, 2023 at 1:06 PM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
>> On 2/7/23 00:48, Thomas Munro wrote:
>>> On Tue, Feb 7, 2023 at 12:46 PM Tomas Vondra
>>> <tomas.vondra@enterprisedb.com> wrote:
>>>> No, I left the workload as it was for the first lockup, so `make check`
>>>> runs everything as is up until the "join" test suite.
>>>
>>> Wait, shouldn't that be join_hash?
>>
>> No, because join_hash does not exist on 11 (it was added in 12). Also,
>> it actually locked up like this - that's the lockup I reported on 28/1.
>
> Oh, good. I had been trying to repro with 12 here and forgot that you
> were looking at 11...

FYI it happened again, on a regular run of regression tests (I gave up on trying to reproduce this - after some initial hits I didn't hit it in a couple thousand tries, so I just added the machine back to the buildfarm).

Anyway, same symptoms - lockup in join_hash on PG11, leader waiting on WaitLatch and both workers waiting on BarrierArriveAndWait.

I forgot that running gdb on the second worker will get it unstuck, so I haven't been able to collect more info. What else do you think would be useful to collect next time?

It's hard to draw conclusions due to the low probability of the issue, but it's pretty weird this only ever happened on 11 so far.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
Hi, I have another case of this on dikkop (on 11 again). Is there anything else we'd want to try? Or maybe someone would want access to the machine and do some investigation directly? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Aug 31, 2023 at 12:16 AM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > I have another case of this on dikkop (on 11 again). Is there anything > else we'd want to try? Or maybe someone would want access to the machine > and do some investigation directly? Sounds interesting -- I'll ping you off-list.
On Thu, Aug 31, 2023 at 2:32 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Thu, Aug 31, 2023 at 12:16 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> > I have another case of this on dikkop (on 11 again). Is there anything
> > else we'd want to try? Or maybe someone would want access to the machine
> > and do some investigation directly?

Hmm. No conclusion tonight but I think it's weird. We have:

bsd@generic:/mnt/data/buildfarm $ ps x -O wchan | grep 52663
52663 select -  Is      0:07.40 postgres: bsd regression [local] SELECT (postgres)
52731 select -  Is      0:00.09 postgres: parallel worker for PID 52663 (postgres)
52732 select -  Is      0:00.06 postgres: parallel worker for PID 52663 (postgres)
81525 piperd 0  S+      0:00.01 grep 52663

wchan=select means sleeping in poll()/select().

bsd@generic:/mnt/data/buildfarm $ procstat -i 52732 | grep USR1
52732 postgres         USR1     P-C
bsd@generic:/mnt/data/buildfarm $ procstat -j 52732 | grep USR1
52732 100121 postgres         USR1     --

We have a signal that is pending and not blocked, so I don't immediately know why poll() hasn't returned control.
Hello Thomas,

31.08.2023 14:15, Thomas Munro wrote:
> We have a signal that is pending and not blocked, so I don't
> immediately know why poll() hasn't returned control.

When I worked at the Postgres Pro company, we observed a similar lockup under rather specific conditions (we used an Elbrus CPU and the specific Elbrus compiler (lcc) based on edg). I managed to reproduce that lockup and Anton Voloshin investigated it. The issue was caused by a compiler optimization in WaitEventSetWait():

    waiting = true;
    ...
    while (returned_events == 0)
    {
        ...
        if (set->latch && set->latch->is_set)
        {
            ...
            break;
        }

In that case, the compiler decided that it may place the read of "set->latch->is_set" before the write "waiting = true". (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the issue for us.)

I can't provide more details for now, but maybe you could look at the binary code generated on the target platform to confirm or reject my guess.

Best regards,
Alexander
On 9/1/23 10:00, Alexander Lakhin wrote: > Hello Thomas, > > 31.08.2023 14:15, Thomas Munro wrote: > >> We have a signal that is pending and not blocked, so I don't >> immediately know why poll() hasn't returned control. > > When I worked at the Postgres Pro company, we observed a similar lockup > under rather specific conditions (we used Elbrus CPU and the specific > Elbrus > compiler (lcc) based on edg). > I managed to reproduce that lockup and Anton Voloshin investigated it. > The issue was caused by the compiler optimization in WaitEventSetWait(): > waiting = true; > ... > while (returned_events == 0) > { > ... > if (set->latch && set->latch->is_set) > { > ... > break; > } > > In that case, compiler decided that it may place the read > "set->latch->is_set" before the write "waiting = true". > (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the > issue for us.) > I can't provide more details for now, but maybe you could look at the > binary > code generated on the target platform to confirm or reject my guess. > Hmmm, I'm not very good at reading the binary code, but here's what objdump produced for WaitEventSetWait. Maybe someone will see what the issue is. I thought about maybe just adding the barrier in the code, but then how would we know it's the issue and this fixed it? It happens so rarely we can't make any conclusions from a couple runs of tests. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachments
Hello Tomas,

01.09.2023 16:00, Tomas Vondra wrote:
> Hmmm, I'm not very good at reading the binary code, but here's what
> objdump produced for WaitEventSetWait. Maybe someone will see what the
> issue is.

At first glance, I can't see anything suspicious in the disassembly. IIUC, "waiting = true" is presented there as:

805c38: b902ad18 str w24, [x8, #684]  // pgstat_report_wait_start(): proc->wait_event_info = wait_event_info;
                                      // end of pgstat_report_wait_start(wait_event_info);
805c3c: b0ffdb09 adrp x9, 0x366000 <dsm_segment_address+0x24>
805c40: b0ffdb0a adrp x10, 0x366000 <dsm_segment_address+0x28>
805c44: f0000eeb adrp x11, 0x9e4000 <PMSignalShmemInit+0x4>
805c48: 52800028 mov w8, #1           // true
805c4c: 52800319 mov w25, #24
805c50: 5280073a mov w26, #57
805c54: fd446128 ldr d8, [x9, #2240]
805c58: 90000d7b adrp x27, 0x9b1000 <ModifyWaitEvent+0xb0>
805c5c: fd415949 ldr d9, [x10, #688]
805c60: f9071d68 str x8, [x11, #3640] // waiting = true (x8 = w8)

So there are two simple mov's and two load operations performed in parallel, but I don't think it's similar to what we had in that case.

> I thought about maybe just adding the barrier in the code, but then how
> would we know it's the issue and this fixed it? It happens so rarely we
> can't make any conclusions from a couple runs of tests.

Probably I could construct a reproducer for the lockup if I had access to such a machine for a day or two.

Best regards,
Alexander
On Fri, Sep 1, 2023 at 6:13 AM Alexander Lakhin <exclusion@gmail.com> wrote: > (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the > issue for us.) Maybe it'd be worth trying something stronger, like pg_memory_barrier(). A compiler barrier doesn't prevent the CPU from reordering loads and stores as it goes, and ARM64 has weak memory ordering. -- Robert Haas EDB: http://www.enterprisedb.com
Hello Robert,

01.09.2023 23:21, Robert Haas wrote:
> On Fri, Sep 1, 2023 at 6:13 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> (Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
>> issue for us.)
> Maybe it'd be worth trying something stronger, like
> pg_memory_barrier(). A compiler barrier doesn't prevent the CPU from
> reordering loads and stores as it goes, and ARM64 has weak memory
> ordering.

Indeed, thank you for the tip! So maybe here we are dealing not with a compiler optimization but with a CPU one. The wider code fragment is:

805c48: 52800028 mov w8, #1           // true
805c4c: 52800319 mov w25, #24
805c50: 5280073a mov w26, #57
805c54: fd446128 ldr d8, [x9, #2240]
805c58: 90000d7b adrp x27, 0x9b1000 <ModifyWaitEvent+0xb0>
805c5c: fd415949 ldr d9, [x10, #688]
805c60: f9071d68 str x8, [x11, #3640] // waiting = true (x8 = w8)
805c64: f90003f3 str x19, [sp]
805c68: 14000010 b 0x805ca8 <WaitEventSetWait+0x108>
805ca8: f9400a88 ldr x8, [x20, #16]   // if (set->latch && set->latch->is_set)
805cac: b4000068 cbz x8, 0x805cb8 <WaitEventSetWait+0x118>
805cb0: f9400108 ldr x8, [x8]
805cb4: b5001248 cbnz x8, 0x805efc <WaitEventSetWait+0x35c>
805cb8: f9401280 ldr x0, [x20, #32]

If the CPU can delay the write to the variable waiting (str x8, [x11, #3640]), in its internal form like "store 1 to [address]", until 805cb0 or a later instruction, then we can get the behavior discussed. Something like that is shown in the ARM documentation:
https://developer.arm.com/documentation/102336/0100/Memory-ordering?lang=en

I'll try to test this guess on the target machine...

Best regards,
Alexander
I agree that the code lacks barriers. I haven't been able to figure out how any reordering could cause this hang, though, because in these old branches procsignal_sigusr1_handler is used for latch wakeups, and it also calls SetLatch(MyLatch) itself, right at the end. That is, SetLatch() gets called twice, first in the waker process and then again in the awoken process, so it should be impossible for the latter not to see MyLatch->is_set == true after procsignal_sigusr1_handler completes. That made me think the handler didn't run, which is consistent with procstat -i showing it as pending ('P'). Which made me start to suspect a kernel bug, unless we can explain what we did to block it... But... perhaps I am confused about that and did something wrong when looking into it. It's hard to investigate when you aren't allowed to take core files or connect a debugger (both will reliably trigger EINTR).
Hello,

03.09.2023 00:00, Alexander Lakhin wrote:
> I'll try to test this guess on the target machine...

I got access to dikkop thanks to Tomas Vondra, and started reproducing the issue. It was rather difficult to catch the lockup, as Tomas and Tom noticed before. I tried to use stress-ng to affect reproduction somehow and gradually managed to reduce the join_hash test to just a single query (if I were venturous enough, I could have reduced the test to it in just one step, but I chose to move little by little). In the end I constructed a reproduction script that could catch the lockup on that ARM machine within an hour on average.

Today I booted FreeBSD-14.0-ALPHA4 on my secondary AMD (Ryzen 7 3700X) machine and managed to reproduce the issue with that script (see attachment). (I run the script just after `gmake && gmake check`.) To reproduce it faster on my machine, I run it on tmpfs and use parallel:

for ((i=1;i<=4;i++)); do echo "ITERATION $i"; parallel --halt now,fail=1 -j8 --linebuffer --tag time .../repro+.sh {} ::: 01 02 03 04 || break; done

It produces the lockup as follows:

ITERATION 1
...
02  waiting for server to shut down.... done
02  server stopped
02
02  real 3m2.420s
02  user 0m2.896s
02  sys 0m1.685s
04  TIMEOUT on iteration 448
04
04  real 3m16.212s
04  user 0m2.322s
04  sys 0m1.904s
...
psql -p 15435 regression -c "SELECT * FROM pg_stat_activity;"
 16384 | regression | 53696 |       | 10 | user | psql | | | -1 | 2023-09-08 18:44:27.572426+00 | 2023-09-08 18:44:27.573633+00 | 2023-09-08 18:44:27.795518+00 | 2023-09-08 18:44:27.795519+00 | IPC | HashBuildHashOuter | active | | 731 | | explain (analyze) select count(*) from simple r join simple s using (id); | client backend
 16384 | regression | 53894 | 53696 | 10 | user | psql | | |    | 2023-09-08 18:44:27.796749+00 | 2023-09-08 18:44:27.573633+00 | 2023-09-08 18:44:27.795518+00 | 2023-09-08 18:44:27.799261+00 | IPC | HashBuildHashOuter | active | | 731 | | explain (analyze) select count(*) from simple r join simple s using (id); | parallel worker
 16384 | regression | 53896 | 53696 | 10 | user | psql | | |    | 2023-09-08 18:44:27.797436+00 | 2023-09-08 18:44:27.573633+00 | 2023-09-08 18:44:27.795518+00 | 2023-09-08 18:44:27.799291+00 | IPC | HashBuildHashInner | active | | 731 | | explain (analyze) select count(*) from simple r join simple s using (id); | parallel worker

procstat -i 53896
53896 postgres         URG      P-C

tail server04.log
2023-09-08 18:44:27.777 UTC|user|regression|53696|64fb6b8b.d1c0|LOG: statement: explain (analyze) select count(*) from simple r join simple s using (id);
2023-09-08 18:44:27.786 UTC|user|regression|53696|64fb6b8b.d1c0|LOG: statement: explain (analyze) select count(*) from simple r join simple s using (id);
2023-09-08 18:44:27.795 UTC|user|regression|53696|64fb6b8b.d1c0|LOG: statement: explain (analyze) select count(*) from simple r join simple s using (id);
2023-09-08 18:45:38.685 UTC|[unknown]|[unknown]|66915|64fb6bd2.10563|LOG: connection received: host=[local]
2023-09-08 18:45:38.685 UTC|user|regression|66915|64fb6bd2.10563|LOG: connection authorized: user=user database=regression application_name=psql
2023-09-08 18:45:38.686 UTC|user|regression|66915|64fb6bd2.10563|LOG: statement: SELECT * FROM pg_stat_activity;

It takes less than 10 minutes on average for me.
I checked REL_12_STABLE, REL_13_STABLE, and REL_14_STABLE (with HAVE_KQUEUE undefined forcefully) — they all are affected. I could not reproduce the lockup on my Ubuntu box (with HAVE_SYS_EPOLL_H undefined manually). And, surprisingly for me, I could not reproduce it on master and REL_16_STABLE. `git bisect` for this behavior change pointed at 7389aad63 (though maybe it just greatly decreased the probability of the failure; I'm going to double-check this). In particular, that commit changed this:

-       /*
-        * Ignore SIGURG for now.  Child processes may change this (see
-        * InitializeLatchSupport), but they will not receive any such signals
-        * until they wait on a latch.
-        */
-       pqsignal_pm(SIGURG, SIG_IGN);   /* ignored */
-#endif
+       /* This may configure SIGURG, depending on platform. */
+       InitializeLatchSupport();
+       InitProcessLocalLatch();

With debug logging added, I see (on 7389aad63~1) that one process really sends SIGURG to another, and the latter reaches poll(), but it just gets no signal: its signal handler is not called and poll() just waits...

So it looks like the ARM weak memory model is not the root cause of the issue. But as far as I can see, it's still specific to FreeBSD (but not specific to a compiler — I used gcc and clang with the same success).

Best regards,
Alexander
Attachments
On Sat, Sep 9, 2023 at 7:00 AM Alexander Lakhin <exclusion@gmail.com> wrote: > It takes less than 10 minutes on average for me. I checked > REL_12_STABLE, REL_13_STABLE, and REL_14_STABLE (with HAVE_KQUEUE undefined > forcefully) — they all are affected. > I could not reproduce the lockup on my Ubuntu box (with HAVE_SYS_EPOLL_H > undefined manually). And surprisingly for me, I could not reproduce it on > master and REL_16_STABLE. > `git bisect` for this behavior change pointed at 7389aad63 (though maybe it > just greatly decreased probability of the failure; I'm going to double-check > this). > In particular, that commit changed this: > - /* > - * Ignore SIGURG for now. Child processes may change this (see > - * InitializeLatchSupport), but they will not receive any such signals > - * until they wait on a latch. > - */ > - pqsignal_pm(SIGURG, SIG_IGN); /* ignored */ > -#endif > + /* This may configure SIGURG, depending on platform. */ > + InitializeLatchSupport(); > + InitProcessLocalLatch(); > > With debugging logging added I see (on 7389aad63~1) that one process > really sends SIGURG to another, and the latter reaches poll(), but it > just got no signal, it's signal handler not called and poll() just waits... Thanks for working so hard on this Alexander. That is a surprising discovery! So changes to the signal handler arrangements in the *postmaster* before the child was forked affected this? > So it looks like the ARM weak memory model is not the root cause of the > issue. But as far as I can see, it's still specific to FreeBSD (but not > specific to a compiler — I used gcc and clang with the same success). Idea: FreeBSD 13 introduced a new mechanism called sigfastblock[1], which lets system libraries control signal blocking with atomic memory tricks in a word of user space memory. 
I have no particular theory for why it would be going wrong here (I don't expect us to be using any of the stuff that would use it, though I don't understand it in detail so that doesn't say much), but it occurred to me that all reports so far have been on 13.x or 14. I wonder... If you have a good fast recipe for reproducing this, could you also try it on FreeBSD 12.4? [1] https://man.freebsd.org/cgi/man.cgi?query=sigfastblock&sektion=2&manpath=FreeBSD+13.0-current
Hi Thomas,

08.09.2023 22:39, Thomas Munro wrote:
>> With debugging logging added I see (on 7389aad63~1) that one process
>> really sends SIGURG to another, and the latter reaches poll(), but it
>> just got no signal, it's signal handler not called and poll() just waits...
> Thanks for working so hard on this Alexander. That is a surprising
> discovery! So changes to the signal handler arrangements in the
> *postmaster* before the child was forked affected this?

Yes, I think we are dealing with something like that. I can try to deduce a minimal change that affects reproducing the issue, but maybe it's not that important. Perhaps we should now think of escalating the problem to FreeBSD developers? I wonder what kind of reproducer they would find acceptable. A standalone C program only, or will a script that compiles/installs Postgres and runs our test do too?

>> So it looks like the ARM weak memory model is not the root cause of the
>> issue. But as far as I can see, it's still specific to FreeBSD (but not
>> specific to a compiler — I used gcc and clang with the same success).
> Idea: FreeBSD 13 introduced a new mechanism called sigfastblock[1],
> which lets system libraries control signal blocking with atomic memory
> tricks in a word of user space memory. I have no particular theory
> for why it would be going wrong here (I don't expect us to be using
> any of the stuff that would use it, though I don't understand it in
> detail so that doesn't say much), but it occurred to me that all
> reports so far have been on 13.x or 14. I wonder... If you have a
> good fast recipe for reproducing this, could you also try it on
> FreeBSD 12.4?

It was a happy guess!
I checked the reproduction on FreeBSD 13.1-RELEASE releng/13.1-n250148-fc952ac2212 and got the same results as on FreeBSD 14:
REL_12_STABLE - failed on iteration 3
REL_15_STABLE - failed on iteration 1
REL_16_STABLE - 10 iterations with no failure

But on FreeBSD 12.4-RELEASE r372781:
REL_12_STABLE - 20 iterations with no failure
REL_15_STABLE - 20 iterations with no failure

BTW, I also retested 7389aad63 on FreeBSD 14 and got no failure for 100 iterations.

Best regards,
Alexander
On Sat, Sep 9, 2023 at 9:00 PM Alexander Lakhin <exclusion@gmail.com> wrote: > Yes, I think we deal with something like that. I can try to deduce a minimum > change that affects reproducing the issue, but may be it's not that important. > Perhaps we now should think of escalating the problem to FreeBSD developers? > I wonder, what kind of reproducer they find acceptable. A standalone C > program only or maybe a script that compiles/installs postgres and runs > our test will do too? We discussed this a bit off-list and I am following up on that. My guess is that this will turn out to be a bad interaction between that optimisation and our (former) habit of forking background workers from inside a signal handler, but let's see... FTR If someone is annoyed by this and just wants their build farm animal not to hang on REL_12_STABLE, via Alexander's later experiments we learned that sysctl kern.sigfastblock_fetch_always=1 fixes the problem.