Re: Speed up Clog Access by increasing CLOG buffers

From: Tomas Vondra
Subject: Re: Speed up Clog Access by increasing CLOG buffers
Date:
Msg-id: 8efd9956-059a-78f3-32ff-f1e1a4dd09c8@2ndquadrant.com
In reply to: Re: Speed up Clog Access by increasing CLOG buffers  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Speed up Clog Access by increasing CLOG buffers  (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On 10/31/2016 02:51 PM, Amit Kapila wrote:
> On Mon, Oct 31, 2016 at 12:02 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> Hi,
>>
>> On 10/27/2016 01:44 PM, Amit Kapila wrote:
>>
>> I've read that analysis, but I'm not sure I see how it explains the "zig
>> zag" behavior. I do understand that shifting the contention to some other
>> (already busy) lock may negatively impact throughput, or that the
>> group_update may result in updating multiple clog pages, but I don't
>> understand two things:
>>
>> (1) Why this should result in the fluctuations we observe in some of the
>> cases. For example, why should we see 150k tps with 72 clients, then drop
>> to 92k with 108 clients, then back to 130k with 144 clients, then 84k with
>> 180 clients, etc. That seems fairly strange.
>>
>
> I don't think hitting multiple clog pages has much to do with
> client-count.  However, we can wait to see your further detailed test
> report.
>
>> (2) Why this should affect all three patches, when only group_update has to
>> modify multiple clog pages.
>>
>
> No, all three patches can be affected due to multiple clog pages.
> Read the second paragraph ("I think one of the probable reasons that could
> happen for both the approaches") in the same e-mail [1].  It is basically
> due to frequent release-and-reacquire of locks.
>
>>
>>
>>>> On logged tables it usually looks like this (i.e. modest increase for
>>>> high
>>>> client counts at the expense of significantly higher variability):
>>>>
>>>>   http://tvondra.bitbucket.org/#pgbench-3000-logged-sync-skip-64
>>>>
>>>
>>> What variability are you referring to in those results?
>>
>>>
>>
>> Good question. What I mean by "variability" is how stable the tps is during
>> the benchmark (when measured at per-second granularity). For example, let's
>> run a 10-second benchmark, measuring number of transactions committed each
>> second.
>>
>> Then all those runs do 1000 tps on average:
>>
>>   run 1: 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000
>>   run 2: 500, 1500, 500, 1500, 500, 1500, 500, 1500, 500, 1500
>>   run 3: 0, 2000, 0, 2000, 0, 2000, 0, 2000, 0, 2000
>>
>
> Generally, such behaviours are seen due to writes. Are WAL and DATA
> on the same disk in your tests?
>

Yes, there's one RAID device on 10 SSDs, with a 4GB cache on the
controller. I've done some tests and it easily handles >1.5 GB/s in
sequential writes, and >500 MB/s in sustained random writes.
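
(If it helps, here's a rough Python sketch of the kind of write-throughput
sanity check I have in mind. It's only an illustration, the test-file path
and sizes below are placeholders, and a dedicated tool such as fio is the
more reliable way to measure this.)

# rough disk write-throughput sanity check (illustration only)
import os, random, time

PATH = "/path/to/raid/testfile"   # placeholder: a file on the RAID volume
FILE_SIZE = 4 * 1024**3           # 4 GiB test file
SEQ_BLOCK = 1024 * 1024           # 1 MiB blocks for sequential writes
RND_BLOCK = 8 * 1024              # 8 KiB blocks for random writes

def seq_write():
    buf = os.urandom(SEQ_BLOCK)
    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    start = time.monotonic()
    for _ in range(FILE_SIZE // SEQ_BLOCK):
        os.write(fd, buf)
    os.fsync(fd)                  # make sure the data actually hit the device
    elapsed = time.monotonic() - start
    os.close(fd)
    return FILE_SIZE / elapsed / 1024**2      # MB/s

def random_write(duration=30):
    buf = os.urandom(RND_BLOCK)
    fd = os.open(PATH, os.O_WRONLY)
    written = 0
    start = time.monotonic()
    while time.monotonic() - start < duration:
        # write to a random block offset within the preallocated file
        os.lseek(fd, random.randrange(FILE_SIZE // RND_BLOCK) * RND_BLOCK,
                 os.SEEK_SET)
        os.write(fd, buf)
        written += RND_BLOCK
        if written % (256 * 1024 * 1024) == 0:
            os.fsync(fd)          # periodic fsync, so we don't just measure the page cache
    os.fsync(fd)
    elapsed = time.monotonic() - start
    os.close(fd)
    return written / elapsed / 1024**2        # MB/s

if __name__ == "__main__":
    print("sequential: %.0f MB/s" % seq_write())
    print("random:     %.0f MB/s" % random_write())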

Also, let me point out that most of the tests were done so that the 
whole data set fits into shared_buffers, and with no checkpoints during 
the runs (so no writes to data files should really happen).

For example, these tests were done on scale 3000 (45GB data set) with 
64GB shared buffers:

[a] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-unlogged-sync-noskip-64

[b] 
http://tvondra.bitbucket.org/index2.html#pgbench-3000-logged-async-noskip-64

and I could show similar cases with scale 300 on 16GB shared buffers.

In those cases, there's very little contention between WAL and the rest 
of the database (in terms of I/O).
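
A simple way to verify the "no checkpoints during the runs" part is to
sample pg_stat_bgwriter before and after a run. A minimal Python sketch of
that check (psycopg2 and the connection string here are just assumptions,
not part of the original setup):

# check whether any checkpoints happened during a benchmark run
import psycopg2

def checkpoint_counts(conninfo):
    """Return (timed, requested) checkpoint counters from pg_stat_bgwriter."""
    with psycopg2.connect(conninfo) as conn:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT checkpoints_timed, checkpoints_req FROM pg_stat_bgwriter")
            return cur.fetchone()

if __name__ == "__main__":
    conninfo = "dbname=pgbench"        # assumed database name
    before = checkpoint_counts(conninfo)
    input("run the pgbench benchmark now, then press Enter...")
    after = checkpoint_counts(conninfo)
    print("checkpoints during the run: timed=%d, requested=%d"
          % (after[0] - before[0], after[1] - before[1]))

If both deltas come out as zero, no checkpoint ran during the benchmark.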

And moreover, this setup (a single device for the whole cluster) is very 
common, so we can't just neglect it.

But my main point here is that the trade-off in those cases may not 
really be all that great: you get the best performance at 36/72 clients, 
and then the tps drops and the variability increases. At least not right 
now, before tackling contention on the WAL lock (or whatever lock becomes 
the bottleneck).
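
To make the "variability" I keep referring to a bit more concrete, here's a
minimal Python sketch that quantifies the spread of per-second tps (as
reported e.g. by pgbench's -P 1 progress output), using the three
hypothetical runs from the example above:

# quantify per-second tps variability for the three example runs
from statistics import mean, pstdev

runs = {
    "run 1": [1000] * 10,
    "run 2": [500, 1500] * 5,
    "run 3": [0, 2000] * 5,
}

for name, tps in runs.items():
    avg = mean(tps)
    # coefficient of variation: stddev relative to the mean, in percent
    cv = pstdev(tps) / avg * 100
    print("%s: avg = %.0f tps, stddev = %.0f, CV = %.0f%%"
          % (name, avg, pstdev(tps), cv))

All three runs average 1000 tps, but the coefficient of variation goes from
0% to 100%, which is exactly the difference between the flat and the
zig-zag behaviour.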

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


