Hello everyone.
> However ... I tried to reproduce the original complaint, and
> failed entirely. I do see KnownAssignedXidsGetAndSetXmin
> eating a bit of time in the standby backends, but it's under 1%
> and doesn't seem to be rising over time. Perhaps we've already
> applied some optimization that ameliorates the problem? But
> I tested v13 as well as HEAD, and got the same results.
> Hmm. I wonder if my inability to detect a problem is because the startup
> process does keep ahead of the workload on my machine, while it fails
> to do so on the OP's machine. I've only got a 16-CPU machine at hand,
> which probably limits the ability of the primary to saturate the standby's
> startup process.
Yes, the optimization by Andres Freund made things much better, but the
impact is still noticeable.
I was also using 16-CPU machines - but two of them (primary and standby).
Here are the scripts I was using for the benchmark [1] - maybe they could help.
> Nowadays we've *got* those primitives. Can we get rid of
> known_assigned_xids_lck, and if so would it make a meaningful
> difference in this scenario?
I already tried that, but was unable to find any real benefit from it.
A WIP patch is attached.
Hmm, I see I sent it to the list, but it is absent from the archives... So,
just quoting from it:
> The first potential positive effect I could see is in
> (TransactionIdIsInProgress -> KnownAssignedXidsSearch) locking, but
> it does not seem to be on the standby hot path.
> The second one is locking for KnownAssignedXidsGetAndSetXmin (snapshot
> building). But I was unable to measure any impact; it wasn't visible
> separately in test (3).
> Maybe someone knows a scenario causing known_assigned_xids_lck or
> TransactionIdIsInProgress to become a bottleneck on standby?
That last question is still open :)
> I think it might be a bigger effect than one might immediately think. Because
> the spinlock will typically be on the same cacheline as head/tail, and because
> every spinlock acquisition requires the cacheline to be modified (and thus
> owned exclusively) by the current core, uses of head/tail will very commonly
> be cache misses even in workloads without a lot of KAX activity.
I tried to find some way to achieve any noticeable impact here in
practice, but without success.
>> But yeah, it does feel like the proposed
>> approach is only going to be optimal over a small range of conditions.
> In particular, it doesn't adapt at all to workloads that don't replay all that
> much, but do compute a lot of snapshots.
The approach in [2] was optimized to avoid any additional work for
everything except the startup process (an approach with offsets to skip
gaps while building a snapshot).
[1]: https://gist.github.com/michail-nikolaev/e1dfc70bdd7cfd1b902523dbb3db2f28
[2]:
https://www.postgresql.org/message-id/flat/CANtu0ogzo4MsR7My9%2BNhu3to5%3Dy7G9zSzUbxfWYOn9W5FfHjTA%40mail.gmail.com#341a3c3b033f69b260120b3173a66382
--
Michail Nikolaev