Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date:
Msg-id: b3586234-6c80-5b64-1261-871e0e852bbb@2ndquadrant.com
In reply to: Re: Speed up Clog Access by increasing CLOG buffers  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Speed up Clog Access by increasing CLOG buffers  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
           Re: Speed up Clog Access by increasing CLOG buffers  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
           Re: Speed up Clog Access by increasing CLOG buffers  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hi,

On 10/27/2016 01:44 PM, Amit Kapila wrote:
> On Thu, Oct 27, 2016 at 4:15 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>>
>> FWIW I plan to run the same test with logged tables - if it shows similar
>> regression, I'll be much more worried, because that's a fairly typical
>> scenario (logged tables, data set > shared buffers), and we surely can't
>> just go and break that.
>>
>
> Sure, please do those tests.
>

OK, so I do have results for those tests - that is, scale 3000 with 
shared_buffers=16GB (so continuously writing out dirty buffers). The 
following reports show the results slightly differently - all three "tps 
charts" next to each other, then the speedup charts and tables.

Overall, the results are surprisingly positive - look at these results 
(all ending with "-retest"):

[1] http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest

[2] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-noskip-retest

[3] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest

All three show significant improvement, even with fairly low client 
counts. For example with 72 clients, the tps improves by 20%, without 
significantly affecting the variability of the results (measured as 
stddev, more on this later).

It's however interesting that "no_content_lock" is almost exactly the 
same as master, while the other two cases improve significantly.

The other interesting thing is that "pgbench -N" [3] shows no such 
improvement, unlike regular pgbench and Dilip's workload. Not sure why, 
though - I'd expect to see significant improvement in this case.

I have also repeated those tests with clog buffers increased to 512 (so 
4x the current maximum of 128). I only have results for Dilip's workload 
and "pgbench -N":

[4] 
http://tvondra.bitbucket.org/index2.html#dilip-3000-logged-sync-retest-512

[5] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-retest-512

The results are somewhat surprising, I guess, because the effect is 
wildly different for each workload.

For Dilip's workload increasing clog buffers to 512 pretty much 
eliminates all benefits of the patches. For example with 288 clients, the 
group_update patch gives ~60k tps on 128 buffers [1] but only 42k tps on 
512 buffers [4].

With "pgbench -N", the effect is exactly the opposite - while with 128 
buffers there was pretty much no benefit from any of the patches [3], 
with 512 buffers we suddenly get almost 2x the throughput, but only for 
group_update and master (while the other two patches show no improvement 
at all).

I don't have results for the regular pgbench ("noskip") with 512 buffers 
yet, but I'm curious what that will show.

In general, however, I think the patches don't show any regression in 
any of those workloads (at least not with 128 buffers). Based solely on 
the results, I like group_update more, because it performs as well as 
master or significantly better.

>>> 2. We do see in some cases that granular_locking and
>>> no_content_lock patches has shown significant increase in
>>> contention on CLOGControlLock. I have already shared my analysis
>>> for same upthread [8].
>>

I've read that analysis, but I'm not sure I see how it explains the "zig 
zag" behavior. I do understand that shifting the contention to some 
other (already busy) lock may negatively impact throughput, or that the 
group_update may result in updating multiple clog pages, but I don't 
understand two things:

(1) Why this should result in the fluctuations we observe in some of the 
cases. For example, why should we see 150k tps on 72 clients, then drop 
to 92k with 108 clients, then back to 130k on 144 clients, then 84k on 
180 clients etc. That seems fairly strange.

(2) Why this should affect all three patches, when only group_update has 
to modify multiple clog pages.

For example consider this:
    http://tvondra.bitbucket.org/index2.html#dilip-300-logged-async

Looking at the % of time spent on different locks with the group_update 
patch, I see this (ignoring locks with ~1%):
 event_type     wait_event          36   72  108  144  180  216  252  288
 -------------------------------------------------------------------------
 -              -                   60   63   45   53   38   50   33   48
 Client         ClientRead          33   23    9   14    6   10    4    8
 LWLockNamed    CLogControlLock      2    7   33   14   34   14   33   14
 LWLockTranche  buffer_content       0    2    9   13   19   18   26   22

I don't see any sign of contention shifting to other locks, just 
CLogControlLock fluctuating between 14% and 33% for some reason.
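
(For completeness, percentages like the ones above can be approximated by 
sampling wait events. A minimal sketch of that idea, assuming psycopg2 and 
a server that exposes wait_event_type/wait_event in pg_stat_activity - this 
is not the exact tooling behind the numbers above:)

import time
from collections import Counter

import psycopg2   # assumed driver; any way of running plain SQL works

# Repeatedly sample pg_stat_activity and count how often each
# (wait_event_type, wait_event) pair shows up across the backends.
conn = psycopg2.connect("dbname=postgres")
conn.autocommit = True

counts = Counter()
total = 0

with conn.cursor() as cur:
    for _ in range(300):                       # one sample per second
        cur.execute("""
            SELECT coalesce(wait_event_type, '-'),
                   coalesce(wait_event, '-')
              FROM pg_stat_activity
             WHERE pid <> pg_backend_pid()
        """)
        for event_type, wait_event in cur.fetchall():
            counts[(event_type, wait_event)] += 1
            total += 1
        time.sleep(1)

for (event_type, wait_event), n in counts.most_common():
    print("%-14s %-18s %5.1f%%" % (event_type, wait_event, 100.0 * n / total))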

Now, maybe this has nothing to do with PostgreSQL itself, but maybe it's 
some sort of CPU / OS scheduling artifact. For example, the system has 
36 physical cores, 72 virtual ones (thanks to HT). I find it strange 
that the "good" client counts are always multiples of 72, while the 
"bad" ones fall in between.
   72 = 72 * 1   (good)
  108 = 72 * 1.5 (bad)
  144 = 72 * 2   (good)
  180 = 72 * 2.5 (bad)
  216 = 72 * 3   (good)
  252 = 72 * 3.5 (bad)
  288 = 72 * 4   (good)

So maybe this has something to do with how the OS schedules the tasks, or 
maybe some internal heuristics in the CPU, or something like that.


>> On logged tables it usually looks like this (i.e. modest increase for high
>> client counts at the expense of significantly higher variability):
>>
>>   http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64
>>
>
> What variability are you referring to in those results?
>

Good question. What I mean by "variability" is how stable the tps is 
during the benchmark (when measured at per-second granularity). For 
example, let's run a 10-second benchmark, measuring number of 
transactions committed each second.

Now consider three runs, all doing 1000 tps on average:
  run 1: 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000
  run 2:  500, 1500,  500, 1500,  500, 1500,  500, 1500,  500, 1500
  run 3:    0, 2000,    0, 2000,    0, 2000,    0, 2000,    0, 2000

I guess we agree those runs behave very differently, despite having the 
same overall throughput. This is what STDDEV(tps) measures, and it's what 
the third chart on the reports shows.
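
To make that concrete, here's a quick check of the three hypothetical runs 
above using Python's statistics module (illustration only - whether the 
charts use population or sample stddev doesn't matter for the point):

from statistics import mean, pstdev

runs = {
    "run 1": [1000] * 10,
    "run 2": [500, 1500] * 5,
    "run 3": [0, 2000] * 5,
}

for name, tps in runs.items():
    # same average throughput, very different per-second stability
    print("%s: mean = %.1f tps, stddev = %.1f" % (name, mean(tps), pstdev(tps)))

# run 1: mean = 1000.0 tps, stddev = 0.0
# run 2: mean = 1000.0 tps, stddev = 500.0
# run 3: mean = 1000.0 tps, stddev = 1000.0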

So for example this [6] shows that the patches give us higher throughput 
with >= 180 clients, but we also pay for that with increased variability 
of the results (i.e. the tps chart will have jitter):

[6] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-sync-skip-64

Of course, exchanging throughput, latency and variability is one of the 
crucial trade-offs in transaction systems - at some point the resources 
get saturated and higher throughput can only be achieved in exchange for 
latency (e.g. by grouping requests). But still, we'd like to get stable 
tps from the system, not something that gives us 2000 tps one second and 
0 tps the next one.

Of course, this is not perfect - it does not show whether there are 
transactions with significantly higher latency, and so on. It'd be good 
to also measure latency, but I haven't collected that info during the 
runs so far.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


