Matt Newell <newellm@blur.com> writes:
> On Monday, September 28, 2015 07:22:26 PM Tom Lane wrote:
>> Possibly. sinval catchup notification works like that, so you could maybe
>> invent a similar mechanism for notify. Beware that we've had to fix
>> performance issues with sinval catchup; you don't want to release a whole
>> bunch of processes to do work at the same time.
> I'll have to think about this more, but I don't envision it causing such a
> scenario.
> The extra processing that will happen at signaling time would have been done
> anyway when the aborted transaction is finally rolled back; the only extra
> work is copying the relevant notifications to local memory.
Hm. An idle-in-transaction listening backend will eventually cause a more
serious form of denial of service: once the pg_notify SLRU area fills up
to its maximum of 8GB, no more new messages can be inserted, and every
transaction that tries to do a NOTIFY will abort. However, that's a
documented hazard and we've not really had complaints about it. In any
case, trying to fix that by having such a backend absorb messages into
local memory doesn't seem like a great idea: if it sits for long enough,
its local memory usage will bloat. Eventually you'd have a failure
anyway.
> Regardless, it might not be worthwhile, since my patch for #1 seems to
> address the symptom that bit me.
I looked at this patch for a bit and found it kind of ugly, particularly
the business about "we allow one process at a time to advance
firstUncommitted by using advancingFirstUncommitted as a mutex".
That doesn't sound terribly safe or performant.
After further thought, I wonder whether instead of having an incoming
listener initialize its "pos" to QUEUE_TAIL, we couldn't have it
initialize to the maximum "pos" among existing listeners (with a floor of
QUEUE_TAIL of course). If any other listener has advanced over a given
message X, then X was committed (or aborted) earlier and the new listener
is not required to return it; so it should be all right to take that
listener's "pos" as a starting point.
The minimum-code-change way to do that would be to compute the max pos
the hard way while Exec_ListenPreCommit holds the AsyncQueueLock, which
it would now have to do in exclusive mode. That's a little bit annoying
but it's surely not much worse than what SignalBackends has to do after
every notification.
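
Concretely, that would look about like this in Exec_ListenPreCommit (just a
sketch, untested; QUEUE_POS_MAX is a helper macro we'd have to add, returning
the later of two queue positions --- the other QUEUE_* macros already exist
in async.c):

    QueuePosition max;
    int           i;

    LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE);

    /*
     * Start from QUEUE_TAIL as the floor, then advance to the maximum
     * position already reached by any live listener.  Anything such a
     * listener has advanced over was committed or aborted before we
     * started listening, so we're not required to return it.
     */
    max = QUEUE_TAIL;
    for (i = 1; i <= MaxBackends; i++)
    {
        if (QUEUE_BACKEND_PID(i) != InvalidPid)
            max = QUEUE_POS_MAX(max, QUEUE_BACKEND_POS(i));
    }
    QUEUE_BACKEND_POS(MyBackendId) = max;
    QUEUE_BACKEND_PID(MyBackendId) = MyProcPid;

    LWLockRelease(AsyncQueueLock);

The scan is O(MaxBackends), but it runs only once per LISTEN, under the same
lock that SignalBackends already takes exclusively for every notification.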
Alternatively, we could try to maintain a shared pointer that is always
equal to the frontmost backend's "pos". But I think that would require
more locking during queue reading than exists now; so it's far from clear
it would be a win, especially in systems where listening sessions last for
a reasonable amount of time (so that Exec_ListenPreCommit is infrequent).
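
For reference, keeping such a shared pointer current would mean each reader
does something like this every time it advances its own "pos" in
asyncQueueReadAllNotifications (again only a sketch; QUEUE_FRONTMOST and
QUEUE_POS_MAX are invented names, not existing async.c symbols):

    static void
    asyncQueueAdvanceFrontmost(void)
    {
        /*
         * A backend may update its own "pos" under shared lock, since
         * nobody else writes that entry; but a shared frontmost pointer
         * is written by everybody, so keeping it honest means taking
         * the lock exclusively on every read cycle.  That's the extra
         * locking cost mentioned above.
         */
        LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE);
        QUEUE_FRONTMOST = QUEUE_POS_MAX(QUEUE_FRONTMOST,
                                        QUEUE_BACKEND_POS(MyBackendId));
        LWLockRelease(AsyncQueueLock);
    }

(The pointer never needs to move backwards: even if the frontmost backend
unlistens, everything before its old position was already committed or
aborted, so a new listener can still safely start there.)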
regards, tom lane