(c) Although it's not visible in the results shown here, 4.5.5 almost perfectly eliminated the run-to-run fluctuations. For example, 3.2.80 produced these results (10 runs with the same parameters):
Notice how much more even the 4.5.5 results are, compared to 3.2.80.
How long was each run? Generally, I do half-hour runs to get stable results.
10 x 5-minute runs for each client count. The full shell script driving the benchmark is here: http://bit.ly/2doY6ID and in short it looks like this:
for r in `seq 1 $runs`; do
    for c in 1 8 16 32 64 128 192; do
        psql -c checkpoint
        pgbench -j 8 -c $c ...
    done
done
I see a couple of problems with the tests:
1. You're running regular pgbench, which also updates the small tables. At scale 300 and higher client counts, there is going to be heavy contention on the pgbench_branches table. Why not test with pgbench -N? As far as this patch is concerned, we are only interested in seeing contention on ClogControlLock. In fact, how about a test which only consumes an XID, but does not do any write activity at all? A completely artificial workload, but good enough to tell us if and how much the patch helps in the best case. We can probably do that with a simple txid_current() call, right?
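A minimal sketch of such a workload: a custom pgbench script whose only statement calls txid_current(), so each transaction consumes an XID without touching any table. (The script file name and the run parameters below are illustrative, not from the original thread.)

```shell
# Write a one-statement custom script: consume an XID, no writes at all.
cat > xid_only.sql <<'EOF'
SELECT txid_current();
EOF
# Then drive it with pgbench in no-vacuum mode against the test database, e.g.:
#   pgbench -n -f xid_only.sql -j 8 -c 64 -T 300 pgbench
```

With -n the built-in tables are never touched, so the only shared resource the backends fight over on the commit path should be the clog.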
2. Each subsequent pgbench run will bloat the tables. Now that may not be such a big deal given that you're checkpointing between each run. But it still makes results somewhat hard to compare. If a vacuum kicks in, that may have some impact too. Given the scale factor you're testing, why not just start fresh every time?