Re: Scaling shared buffer eviction

From: Amit Kapila
Subject: Re: Scaling shared buffer eviction
Date:
Msg-id: CAA4eK1KHTX3wa34N7F_4vCnFWEBTO_J=ak2nDKL_ZzcrsGCL7A@mail.gmail.com
In reply to: Re: Scaling shared buffer eviction  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
           Re: Scaling shared buffer eviction  (Gregory Smith <gregsmithpgsql@gmail.com>)
List: pgsql-hackers
On Sun, Sep 14, 2014 at 12:23 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Sep 12, 2014 at 11:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Sep 11, 2014 at 4:31 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-09-10 12:17:34 +0530, Amit Kapila wrote:

I will post the data with the latest patch separately (where I will focus
on new cases discussed between Robert and Andres).


Performance data with the latest version of the patch.
All the data shown below is the median of 3 runs; for the
individual run data, refer to the attached document
(perf_read_scalability_data_v9.ods)
 
Performance Data for Read-only test
-----------------------------------------------------
Configuration and Db Details
IBM POWER-7, 16 cores, 64 hardware threads
RAM = 64GB
Database Locale = C
checkpoint_segments = 256
checkpoint_timeout = 15min
shared_buffers = 8GB
scale factor = 3000
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5mins

All the data is in tps and was taken using the pgbench read-only load.

Client_Count/Patch_Ver      8        16       32       64       128
HEAD                        58614    107370   140717   104357   65010
sbe_v9                      62943    119064   172246   220174   220904


Observations
---------------------
1. It scales well, as with previous versions of the patch, but
the performance seems slightly better in a few cases, maybe
because I have removed a statement (if check) or two in
bgreclaimer (those were done under a spinlock), or it could
just be run-to-run variation.
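
Just to illustrate the kind of change I mean (this is not the patch's
actual code; the structure and field names below are made up for the
sketch), the idea is simply to keep the cheap "anything to do?" check
out from under the spinlock:

#include <pthread.h>
#include <stdbool.h>

typedef struct
{
    pthread_spinlock_t lock;               /* stands in for the strategy spinlock */
    int                numFreeListBuffers; /* current freelist length             */
} StrategyShmemSketch;

static StrategyShmemSketch strategy;
enum { HIGH_WATER_MARK = 2000 };           /* placeholder value */

/* Checking under the lock: correct, but every call pays for the lock. */
static bool
need_more_buffers_locked(void)
{
    bool    need;

    pthread_spin_lock(&strategy.lock);
    need = (strategy.numFreeListBuffers < HIGH_WATER_MARK);
    pthread_spin_unlock(&strategy.lock);
    return need;
}

/*
 * Reading the counter without the lock: a slightly stale value only means
 * bgreclaimer does one extra (or one fewer) pass, which is harmless, and
 * the spinlock hold time shrinks accordingly.
 */
static bool
need_more_buffers_unlocked(void)
{
    return strategy.numFreeListBuffers < HIGH_WATER_MARK;
}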

> (1) A read-only pgbench workload that is just a tiny bit larger than
> shared_buffers, say size of shared_buffers plus 0.01%.  Such workloads
> tend to stress buffer eviction heavily.

When the data is just a tiny bit larger than shared buffers, there is
actually no scalability problem even in HEAD, because I think
most of the requests will be satisfied from the existing buffer pool.
I have taken data for some loads where the database size is a
bit larger than shared buffers, and it is as follows:

Scale Factor - 800
Shared_Buffers - 12286MB (Total db size is 12288MB)


Client_Count/Patch_Ver      1       8       16       32       64       128
HEAD                        8406    68712   132222   198481   290340   289828
sbe_v9                      8504    68546   131926   195789   289959   289021


Scale Factor - 800
Shared_Buffers - 12166MB (Total db size is 12288MB)

Client_Count/Patch_Ver      1       8       16       32       64       128
HEAD                        8428    68609   128092   196596   292066   293812
sbe_v9                      8386    68546   126926   197126   289959   287621


Observations
---------------------
In most cases performance with the patch is slightly lower than
HEAD; the difference is generally less than 1%, and in a case
or two close to 2%. I think the main reason for the slight difference is
that when the size of shared buffers is almost the same as the data size,
the number of buffers needed from the clock sweep is very small. As an
example, in the first case (shared buffers of 12286MB) it actually needs
at most 256 additional buffers (2MB) via clock sweep, whereas bgreclaimer
will put 2000 additional buffers (the high water mark, since 0.5% of
shared buffers is greater than 2000) in the free list. So bgreclaimer does
some extra work when it is not required, and it also leads to the condition
you mention below (the freelist will contain buffers that have already been
touched since we added them). For case 2 (12166MB), we need more than
2000 additional buffers, but not too many, so it can have a similar effect.
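
A small back-of-the-envelope check of those numbers, assuming the
default 8KB block size and reading the high water mark as
Min(0.5% of shared buffers, 2000):

#include <stdio.h>

int
main(void)
{
    const long  buffers_per_mb   = 1024 / 8;                /* 128 buffers of 8KB per MB */
    const long  shared_buffers   = 12286 * buffers_per_mb;  /* first scale-800 case      */
    const long  db_size          = 12288 * buffers_per_mb;  /* total database size       */

    long        clock_sweep_need = db_size - shared_buffers;  /* 256 buffers = 2MB */
    long        half_percent     = shared_buffers / 200;      /* ~7863 buffers     */
    long        high_water_mark  = (half_percent < 2000) ? half_percent : 2000;

    printf("needed via clock sweep: %ld buffers\n", clock_sweep_need);
    printf("bgreclaimer target (high water mark): %ld buffers\n", high_water_mark);
    return 0;
}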

I think we have the below options related to this observation:
a. Some further tuning in bgreclaimer, so that instead of filling
the freelist up to the high water mark, it puts in just 1/4th or
1/2 of the high water mark and then checks whether the free list still
contains less than or equal to the low water mark; if yes it continues,
and if not it can wait (or maybe some other way). A rough sketch of this
idea follows after this list.
b. Instead of waking bgreclaimer when the number of buffers falls
below the low water mark, wake it when the number of times backends
run the clock sweep crosses a certain threshold.
c. Expose the low and high water marks as config knobs, so that in some
rare cases users can use them for tuning.
d. Do nothing: if a user runs such a configuration, he should
be educated to configure shared buffers in a better way, and/or the
performance hit doesn't seem large enough to justify any further
work.
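
To make option (a) a bit more concrete, the reclaim loop could look
roughly like the below (all names, the watermark values, and the wait
mechanism are placeholders, not the actual bgreclaimer code):

/* Provided elsewhere in this imaginary environment. */
extern int  freelist_length(void);          /* current number of free buffers       */
extern int  reclaim_buffers(int nbuffers);  /* clock-sweep victims -> freelist,
                                             * returns how many were actually added */
extern void wait_for_wakeup(void);          /* sleep until a backend signals us     */

enum { LOW_WATER_MARK = 1000, HIGH_WATER_MARK = 2000 };   /* placeholder values */

void
bgreclaimer_option_a(void)
{
    for (;;)
    {
        /* Add only a quarter of the high water mark per iteration. */
        reclaim_buffers(HIGH_WATER_MARK / 4);

        /*
         * If the freelist is back above the low water mark, backends are
         * not draining it quickly, so sleep instead of pushing all the
         * way up to the high water mark.
         */
        if (freelist_length() > LOW_WATER_MARK)
            wait_for_wakeup();
        /* else: still at or below the low water mark; keep reclaiming. */
    }
}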

Now if we do either 'a' or 'b', I think there is a chance
that the gain might not be the same for cases where users can
easily benefit from this patch, and there is a chance that
it degrades the performance in some other case.

> (2) A workload that maximizes the rate of concurrent buffer eviction
> relative to other tasks.  Read-only pgbench is not bad for this, but
> maybe somebody's got a better idea.

I think the first test of pgbench (scale_factor-3000;shared_buffers-8GB)
addresses this case.

> As I sort of mentioned in what I was writing for the bufmgr README,
> there are, more or less, three ways this can fall down, at least that
> I can see: (1) if the high water mark is too high, then we'll start
> finding buffers in the freelist that have already been touched since
> we added them:

I think I am able to see this effect (though mild) in one of the above tests.
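
For context, the reason a stale freelist entry costs something is that
the backend still has to re-check the buffer before using it and fall
back to the clock sweep if it has been touched in the meantime; a
simplified sketch of that check (loosely modelled on StrategyGetBuffer,
with stand-in types and helpers) is:

#include <stddef.h>

typedef struct BufferDescSketch
{
    int     refcount;       /* pins currently held on this buffer */
    int     usage_count;    /* bumped whenever the page is used   */
} BufferDescSketch;

extern BufferDescSketch *pop_from_freelist(void);   /* NULL when the list is empty */
extern BufferDescSketch *run_clock_sweep(void);     /* always finds a victim       */

BufferDescSketch *
get_victim_buffer(void)
{
    BufferDescSketch *buf;

    while ((buf = pop_from_freelist()) != NULL)
    {
        /*
         * Use the buffer only if nobody has touched it since it was put
         * on the freelist; otherwise discard it and try the next one.
         */
        if (buf->refcount == 0 && buf->usage_count == 0)
            return buf;
    }

    /* Freelist exhausted (or full of recently-touched buffers). */
    return run_clock_sweep();
}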


With Regards,
Amit Kapila.
