Hi,
On 03/30/2016 07:09 PM, Andres Freund wrote:
> Yes. That looks good. My testing shows that increasing the number of
> buffers can both increase throughput and reduce latency variance. The
> former is a smaller effect with one of the discussed patches applied,
> the latter seems to actually increase in scale (with increased
> throughput).
>
>
> I've attached patches to:
> 0001: Increase the max number of clog buffers
> 0002: Implement 64bit atomics fallback and optimize read/write
> 0003: Edited version of Simon's clog scalability patch
>
> WRT 0003 - still clearly WIP - I've:
> - made group_lsn pg_atomic_u64*, to allow for tear-free reads
> - split content from IO lock
> - made SimpleLruReadPage_optShared always return with only share lock
> held
> - Implemented a different, experimental, concurrency model for
> SetStatusBit using cmpxchg. A USE_CONTENT_LOCK define controls which
> variant is used.
>
> I've tested this and saw it outperform Amit's approach, especially so
> when using a read/write mix rather than only reads. I saw over a 30%
> increase on a large EC2 instance with -btpcb-like@1 -bselect-only@3. But
> that's in a virtualized environment, not very good for reproducibility.
>
> Amit, could you run benchmarks on your bigger hardware? Both with
> USE_CONTENT_LOCK commented out and in?
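
For anyone wanting to reproduce this, I assume the workload above maps to
a pgbench invocation along these lines (client count, duration, and
database name are placeholders, not Andres' actual settings):

```shell
# Mixed workload: built-in scripts weighted 1 part tpcb-like
# to 3 parts select-only, via pgbench's -b script@weight option.
pgbench -M prepared -c 96 -j 96 -T 300 \
        -b tpcb-like@1 -b select-only@3 postgres
```
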
>
> I think we should go for 1) and 2) unconditionally, and then evaluate
> whether to go with your approach or 3) from above. If the latter, we
> have some cleanup to do :)
>
I have been testing Amit's patch in various setups and workloads, with
up to 400 connections on a 2 x Xeon E5-2683 (28C/56T @ 2 GHz), and have
seen no improvement, but no regression either.
Testing with 0001 and 0002 does show up to a 5% improvement when using
an HDD for data + WAL, and about 1% when using 2 x RAID10 SSDs with
unlogged tables.
I can do a USE_CONTENT_LOCK run on 0003 if it is something for 9.6.
Thanks for your work on this!
Best regards, Jesper