Обсуждение: [PATCH] Avoid pallocs in async.c's SignalBackends critical section
Hi hackers,
This patch addresses this comment in async.c's SignalBackends:
* XXX in principle these pallocs could fail, which would be bad.
* Maybe preallocate the arrays? They're not that large, though.
This is unsafe, since AtCommit_Notify effectively runs in a critical
section, so an OOM there would PANIC ("AbortTransaction while in COMMIT
state"), as we can no longer abort safely.
This patch fixes this by adding two static arrays, notifySignalPids and
notifySignalProcs, allocated lazily in TopMemoryContext by
initSignalArrays. PreCommit_Notify now calls initSignalArrays while it's
still safe to ERROR, ensuring the arrays exist before entering the
commit path.
SignalBackends is updated to use these preallocated arrays instead of
allocating temporary ones.
This removes the risk of palloc in a critical section and eliminates
repeated palloc/pfree cycles.
/Joel
Вложения
On 23/11/2025 16:45, Joel Jacobson wrote:
> Hi hackers,
>
> This patch addresses this comment in async.c's SignalBackends:
>
> * XXX in principle these pallocs could fail, which would be bad.
> * Maybe preallocate the arrays? They're not that large, though.
>
> This is unsafe, since AtCommit_Notify effectively runs in a critical
> section, so an OOM there would PANIC ("AbortTransaction while in COMMIT
> state"), as we can no longer abort safely.
Ugh. I wonder if we should put an actual critical section around those
post-commit cleanup actions? As you said, it's effectively a critical
section already, except that we don't get the benefit of the
AssertNotInCriticalSection assertions.
Or even better, could we move things out of that effective critical
section? It's painful to write code that cannot palloc.
> This patch fixes this by adding two static arrays, notifySignalPids and
> notifySignalProcs, allocated lazily in TopMemoryContext by
> initSignalArrays. PreCommit_Notify now calls initSignalArrays while it's
> still safe to ERROR, ensuring the arrays exist before entering the
> commit path.
>
> SignalBackends is updated to use these preallocated arrays instead of
> allocating temporary ones.
>
> This removes the risk of palloc in a critical section and eliminates
> repeated palloc/pfree cycles.
Makes sense as far as it goes. But there's more: Exec_ListenCommit()
also palloc's, And AtCommit_Notify also calls asyncQueueAdvanceTail(),
which calls SimpleLruTruncate(). I didn't analyze SimpleLruTruncate() in
detail, but I bet it also palloc's, and it's not nice to panic e.g.
because of a failed unlink() call either.
- Heikki
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 23/11/2025 16:45, Joel Jacobson wrote:
>> This patch addresses this comment in async.c's SignalBackends:
>> * XXX in principle these pallocs could fail, which would be bad.
>> * Maybe preallocate the arrays? They're not that large, though.
> Ugh. I wonder if we should put an actual critical section around those
> post-commit cleanup actions? As you said, it's effectively a critical
> section already, except that we don't get the benefit of the
> AssertNotInCriticalSection assertions.
> Or even better, could we move things out of that effective critical
> section? It's painful to write code that cannot palloc.
I don't think Joel did anybody any favors by separating this patch
fragment from its larger context [1]. Given the infrequency of
complaints about failures in this area, I'm not sure that the
notational pain of an actual critical section is justified.
But I complained that the changes contemplated in [1] were raising
the probability of failure, and while working on tamping that back
down we decided to do something about this old gripe too.
There's a relevant comment in CommitTransaction():
* This is all post-commit cleanup. Note that if an error is raised here,
* it's too late to abort the transaction. This should be just
* noncritical resource releasing.
Unfortunately, releasing locks, sending notifies, etc is not all
that "noncritical" if you want the DB to keep functioning well.
But there's a good deal of code in there and making it all obey
the critical-section rules looks painful.
regards, tom lane
[1] https://www.postgresql.org/message-id/flat/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com
On Mon, Nov 24, 2025, at 17:06, Tom Lane wrote: > I don't think Joel did anybody any favors by separating this patch > fragment from its larger context [1]. I'm a bit surprised by this. My intention in splitting it out was based on earlier advice in [1] that I think made a lot of sense: >> [...my idea of a bgworker to kick lagging backends...] > If you feel that that's not robust enough, > you should split it out as a separate patch that's advertised as a > robustness improvement not a performance improvement, and see if you > can get anyone to bite. In general, I think it's nice when a bigger change can be split into smaller meaningful committable changes, which seemed possible in this case. Heikki also raised a point that seems worth exploring: AtCommit_Notify currently calls asyncQueueAdvanceTail. After PreCommit_Notify, we already know whether tryAdvanceTail is needed, so it looks feasible to move the asyncQueueAdvanceTail call to the end of PreCommit_Notify. > Given the infrequency of > complaints about failures in this area, I'm not sure that the > notational pain of an actual critical section is justified. > > But I complained that the changes contemplated in [1] were raising > the probability of failure, and while working on tamping that back > down we decided to do something about this old gripe too. > > There's a relevant comment in CommitTransaction(): > > * This is all post-commit cleanup. Note that if an error is raised here, > * it's too late to abort the transaction. This should be just > * noncritical resource releasing. > > Unfortunately, releasing locks, sending notifies, etc is not all > that "noncritical" if you want the DB to keep functioning well. > But there's a good deal of code in there and making it all obey > the critical-section rules looks painful. I see why a critical-section is probably too painful. But since the direction in [1] is to avoid adding new possibly risky operations to AtCommit_Notify, I don't think it's completely unreasonable to consider moving some existing ones into PreCommit_Notify, when feasible. If it's preferable, I'm fine dropping this standalone patch and folding any such adjustments into v29 in [1], or I can just leave the existing code untouched? /Joel
On Mon, Nov 24, 2025, at 22:53, Joel Jacobson wrote:
> On Mon, Nov 24, 2025, at 17:06, Tom Lane wrote:
>> Unfortunately, releasing locks, sending notifies, etc is not all
>> that "noncritical" if you want the DB to keep functioning well.
>> But there's a good deal of code in there and making it all obey
>> the critical-section rules looks painful.
>
> I see why a critical-section is probably too painful. But since the
> direction in [1] is to avoid adding new possibly risky operations to
> AtCommit_Notify, I don't think it's completely unreasonable to consider
> moving some existing ones into PreCommit_Notify, when feasible.
>
> If it's preferable, I'm fine dropping this standalone patch and folding
> any such adjustments into v29 in [1], or I can just leave the existing
> code untouched?
With the following three changes, I think the only remaining
potentially-risky code in AtCommit_Notify, is the acquire/release of
locks.
* 0001-async-avoid-pallocs-in-critical-section-v2.patch:
Preallocate signal arrays to avoid pallocs AtCommit.
* 0002-async-avoid-pallocs-in-critical-section-v2.patch:
Move asyncQueueAdvanceTail from AtCommit to PreCommit.
* 0003-async-avoid-pallocs-in-critical-section-v2.patch
Convert listenChannels to hash table.
This is based on Heikki's suggestion
"We really should turn that into a hash table."
from the bug fix thread [2] combined with Tom's idea of a boolean
"is it REALLY listening?"
field [1].
Together, these patches allows us to gets rid of the following comments:
0001:
- * XXX in principle these pallocs could fail, which would be bad. Maybe
- * preallocate the arrays? They're not that large, though.
0002:
- * This is (usually) called during CommitTransaction(), so it's important for
- * it to have very low probability of failure.
0003:
- * XXX It is theoretically possible to get an out-of-memory failure here,
- * which would be bad because we already committed. For the moment it
- * doesn't seem worth trying to guard against that, but maybe improve this
- * later.
Please advise if we want these changes, and if so, if they should be
folded into [1] i.e. closing this thread, or if we want to keep this thread.
/Joel
[1] https://www.postgresql.org/message-id/flat/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com
[2] https://www.postgresql.org/message-id/flat/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw%40mail.gmail.com
Вложения
On Tue, Nov 25, 2025, at 11:15, Joel Jacobson wrote:
> With the following three changes, I think the only remaining
> potentially-risky code in AtCommit_Notify, is the acquire/release of
> locks.
Patch 0001 and 0002 are unchanged since v2.
0003:
Since this thread is specifically about avoiding pallocs in the
effective "critical section", I realize we shouldn't change
listenChannels from a list to hash (in this patch), but just move the
existing potentially-risky code out of AtCommit_Notify.
Thanks to a sharper focus on that, I realized Tom's alternative design
idea from [1], to just go ahead and perform the LISTEN/UNLISTEN updates
in PreCommit_Notify, is an excellent simplification, with no real
downsides that I can identify.
This allowed simplifying 0003 a lot, by just doing all LISTEN/UNLISTEN
operations during PreCommit:
1 file changed, 100 insertions(+), 139 deletions(-)
I really hope this approach is acceptable, because then the optimization
patch will become much simpler.
Please advise whether you think this makes sense, and whether we should
keep this thread open or fold the result directly into [1].
Patches:
* 0001-async-avoid-pallocs-in-critical-section-v3.patch:
Preallocate signal arrays to avoid pallocs AtCommit
* 0002-async-avoid-pallocs-in-critical-section-v3.patch:
Move asyncQueueAdvanceTail from AtCommit to PreCommit
* 0003-async-avoid-pallocs-in-critical-section-v3.patch:
Execute LISTEN/UNLISTEN during PreCommit
/Joel
[1] https://www.postgresql.org/message-id/flat/6899c044-4a82-49be-8117-e6f669765f7e%40app.fastmail.com