Re: Scaling shared buffer eviction

From: Amit Kapila
Subject: Re: Scaling shared buffer eviction
Date:
Msg-id: CAA4eK1KHTX3wa34N7F_4vCnFWEBTO_J=ak2nDKL_ZzcrsGCL7A@mail.gmail.com
In reply to: Re: Scaling shared buffer eviction  (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Scaling shared buffer eviction  (Robert Haas <robertmhaas@gmail.com>)
           Re: Scaling shared buffer eviction  (Gregory Smith <gregsmithpgsql@gmail.com>)
List: pgsql-hackers
On Sun, Sep 14, 2014 at 12:23 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Sep 12, 2014 at 11:55 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Sep 11, 2014 at 4:31 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-09-10 12:17:34 +0530, Amit Kapila wrote:

I will post the data with the latest patch separately (where I will focus
on new cases discussed between Robert and Andres).


Performance data with the latest version of the patch.
All the data shown below is the median of 3 runs; for the
individual run data, refer to the attached document
(perf_read_scalability_data_v9.ods)
 
Performance Data for Read-only test
-----------------------------------------------------
Configuration and Db Details
IBM POWER-7, 16 cores, 64 hardware threads
RAM = 64GB
Database Locale = C
checkpoint_segments = 256
checkpoint_timeout = 15min
shared_buffers = 8GB
scale factor = 3000
Client Count = number of concurrent sessions and threads (ex. -c 8 -j 8)
Duration of each individual run = 5mins

All the data is in tps and was taken using the pgbench read-only load.

Client_Count/Patch_Ver      8        16       32       64       128
HEAD                        58614    107370   140717   104357   65010
sbe_v9                      62943    119064   172246   220174   220904


Observations
---------------------
1. It scales well, as with previous versions of the patch, but
the performance seems slightly better in a few cases, maybe
because I have removed a statement (if check) or two in
bgreclaimer (those were done under a spinlock), or it could
just be run-to-run variation.
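
Just to illustrate the kind of change I mean (this is not the patch's
actual code; the structure and field names below are made up for the
sketch), the idea is simply to keep the cheap "anything to do?" check
out from under the spinlock:

#include <pthread.h>
#include <stdbool.h>

typedef struct
{
    pthread_spinlock_t lock;               /* stands in for the strategy spinlock */
    int                numFreeListBuffers; /* current freelist length             */
} StrategyShmemSketch;

static StrategyShmemSketch strategy;
enum { HIGH_WATER_MARK = 2000 };           /* placeholder value */

/* Checking under the lock: correct, but every call pays for the lock. */
static bool
need_more_buffers_locked(void)
{
    bool    need;

    pthread_spin_lock(&strategy.lock);
    need = (strategy.numFreeListBuffers < HIGH_WATER_MARK);
    pthread_spin_unlock(&strategy.lock);
    return need;
}

/*
 * Reading the counter without the lock: a slightly stale value only means
 * bgreclaimer does one extra (or one fewer) pass, which is harmless, and
 * the spinlock hold time shrinks accordingly.
 */
static bool
need_more_buffers_unlocked(void)
{
    return strategy.numFreeListBuffers < HIGH_WATER_MARK;
}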

> (1) A read-only pgbench workload that is just a tiny bit larger than
> shared_buffers, say size of shared_buffers plus 0.01%.  Such workloads
> tend to stress buffer eviction heavily.

When the data is just a tiny bit larger than shared buffers, there is
actually no scalability problem even in HEAD, because I think
most of the requests will be satisfied from the existing buffer pool.
I have taken data for some loads where the database size is a
bit larger than shared buffers, and it is as follows:

Scale Factor - 800
Shared_Buffers - 12286MB (Total db size is 12288MB)


Client_Count/Patch_Ver      1       8       16       32       64       128
HEAD                        8406    68712   132222   198481   290340   289828
sbe_v9                      8504    68546   131926   195789   289959   289021


Scale Factor - 800
Shared_Buffers - 12166MB (Total db size is 12288MB)

Client_Count/Patch_Ver      1       8       16       32       64       128
HEAD                        8428    68609   128092   196596   292066   293812
sbe_v9                      8386    68546   126926   197126   289959   287621


Observations
---------------------
In most cases performance with the patch is slightly lower than
HEAD; the difference is generally less than 1%, and in a case
or two close to 2%. I think the main reason for the slight difference is
that when the size of shared buffers is almost the same as the data size,
the number of buffers needed from the clock sweep is very small. As an
example, in the first case (shared buffers of 12286MB) it actually needs
at most 256 additional buffers (2MB) via clock sweep, whereas bgreclaimer
will put 2000 additional buffers (the high water mark, since 0.5% of
shared buffers is greater than 2000) in the free list. So bgreclaimer does
some extra work when it is not required, and it also leads to the condition
you mention below (the freelist will contain buffers that have already been
touched since we added them). For case 2 (12166MB), we need more than
2000 additional buffers, but not too many, so it can have a similar effect.
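
A small back-of-the-envelope check of those numbers, assuming the
default 8KB block size and reading the high water mark as
Min(0.5% of shared buffers, 2000):

#include <stdio.h>

int
main(void)
{
    const long  buffers_per_mb   = 1024 / 8;                /* 128 buffers of 8KB per MB */
    const long  shared_buffers   = 12286 * buffers_per_mb;  /* first scale-800 case      */
    const long  db_size          = 12288 * buffers_per_mb;  /* total database size       */

    long        clock_sweep_need = db_size - shared_buffers;  /* 256 buffers = 2MB */
    long        half_percent     = shared_buffers / 200;      /* ~7863 buffers     */
    long        high_water_mark  = (half_percent < 2000) ? half_percent : 2000;

    printf("needed via clock sweep: %ld buffers\n", clock_sweep_need);
    printf("bgreclaimer target (high water mark): %ld buffers\n", high_water_mark);
    return 0;
}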

I think we have the below options related to this observation:
a. Some further tuning in bgreclaimer, so that instead of filling
the freelist up to the high water mark, it puts in just 1/4th or
1/2 of the high water mark and then checks whether the free list still
contains less than or equal to the low water mark; if yes it continues,
and if not it can wait (or maybe some other way). A rough sketch of this
idea follows after this list.
b. Instead of waking bgreclaimer when the number of buffers falls
below the low water mark, wake it when the number of times backends
run the clock sweep crosses a certain threshold.
c. Expose the low and high water marks as config knobs, so that in some
rare cases users can use them for tuning.
d. Do nothing: if a user runs such a configuration, he should
be educated to configure shared buffers in a better way, and/or the
performance hit doesn't seem large enough to justify any further
work.
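
To make option (a) a bit more concrete, the reclaim loop could look
roughly like the below (all names, the watermark values, and the wait
mechanism are placeholders, not the actual bgreclaimer code):

/* Provided elsewhere in this imaginary environment. */
extern int  freelist_length(void);          /* current number of free buffers       */
extern int  reclaim_buffers(int nbuffers);  /* clock-sweep victims -> freelist,
                                             * returns how many were actually added */
extern void wait_for_wakeup(void);          /* sleep until a backend signals us     */

enum { LOW_WATER_MARK = 1000, HIGH_WATER_MARK = 2000 };   /* placeholder values */

void
bgreclaimer_option_a(void)
{
    for (;;)
    {
        /* Add only a quarter of the high water mark per iteration. */
        reclaim_buffers(HIGH_WATER_MARK / 4);

        /*
         * If the freelist is back above the low water mark, backends are
         * not draining it quickly, so sleep instead of pushing all the
         * way up to the high water mark.
         */
        if (freelist_length() > LOW_WATER_MARK)
            wait_for_wakeup();
        /* else: still at or below the low water mark; keep reclaiming. */
    }
}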

Now if we do either 'a' or 'b', I think there is a chance
that the gain might not be the same for cases where users can
easily benefit from this patch, and there is a chance that
it degrades the performance in some other case.

> (2) A workload that maximizes the rate of concurrent buffer eviction
> relative to other tasks.  Read-only pgbench is not bad for this, but
> maybe somebody's got a better idea.

I think the first test of pgbench (scale_factor-3000;shared_buffers-8GB)
addresses this case.

> As I sort of mentioned in what I was writing for the bufmgr README,
> there are, more or less, three ways this can fall down, at least that
> I can see: (1) if the high water mark is too high, then we'll start
> finding buffers in the freelist that have already been touched since
> we added them:

I think I am able to see this effect (though mild) in one of the above tests.
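
For context, the reason a stale freelist entry costs something is that
the backend still has to re-check the buffer before using it and fall
back to the clock sweep if it has been touched in the meantime; a
simplified sketch of that check (loosely modelled on StrategyGetBuffer,
with stand-in types and helpers) is:

#include <stddef.h>

typedef struct BufferDescSketch
{
    int     refcount;       /* pins currently held on this buffer */
    int     usage_count;    /* bumped whenever the page is used   */
} BufferDescSketch;

extern BufferDescSketch *pop_from_freelist(void);   /* NULL when the list is empty */
extern BufferDescSketch *run_clock_sweep(void);     /* always finds a victim       */

BufferDescSketch *
get_victim_buffer(void)
{
    BufferDescSketch *buf;

    while ((buf = pop_from_freelist()) != NULL)
    {
        /*
         * Use the buffer only if nobody has touched it since it was put
         * on the freelist; otherwise discard it and try the next one.
         */
        if (buf->refcount == 0 && buf->usage_count == 0)
            return buf;
    }

    /* Freelist exhausted (or full of recently-touched buffers). */
    return run_clock_sweep();
}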


With Regards,
Amit Kapila.
