Обсуждение: Increased shared_buffer setting = lower hit ratio ?

От:
CS DBA
Дата:

All;

We have a large db server with 128GB of ram running complex functions.

with the server set to have the following we were seeing a somewhat low
hit ratio and lots of temp buffers

shared_buffers = 18GB
work_mem = 75MB
effective_cache_size = 105GB
checkpoint_segments = 128


when we increased the values to these not only did the hit ratio drop
but query times are now longer as well:


shared_buffers = 28GB
work_mem = 150MB
effective_cache_size = 105GB
checkpoint_segments = 256

This does not seem to make sense to me, anyone have any thoughts on why
more memory resources would cause worse performance?

Thanks in advance



От:
CS DBA
Дата:

This is on a CentOS 6.5 box running PostgreSQL 9.2


On 11/13/14 4:09 PM, CS DBA wrote:
> All;
>
> We have a large db server with 128GB of ram running complex functions.
>
> with the server set to have the following we were seeing a somewhat
> low hit ratio and lots of temp buffers
>
> shared_buffers = 18GB
> work_mem = 75MB
> effective_cache_size = 105GB
> checkpoint_segments = 128
>
>
> when we increased the values to these not only did the hit ratio drop
> but query times are now longer as well:
>
>
> shared_buffers = 28GB
> work_mem = 150MB
> effective_cache_size = 105GB
> checkpoint_segments = 256
>
> This does not seem to make sense to me, anyone have any thoughts on
> why more memory resources would cause worse performance?
>
> Thanks in advance
>



От:
Tomas Vondra
Дата:

Hi,

On 14.11.2014 00:16, CS DBA wrote:
> This is on a CentOS 6.5 box running PostgreSQL 9.2
>
>
> On 11/13/14 4:09 PM, CS DBA wrote:
>> All;
>>
>> We have a large db server with 128GB of ram running complex
>> functions.
>>
>> with the server set to have the following we were seeing a
>> somewhat low hit ratio and lots of temp buffers
>>
>> shared_buffers = 18GB
>> work_mem = 75MB
>> effective_cache_size = 105GB
>> checkpoint_segments = 128
>>
>>
>> when we increased the values to these not only did the hit ratio
>> drop but query times are now longer as well:
>>
>>
>> shared_buffers = 28GB
>> work_mem = 150MB
>> effective_cache_size = 105GB
>> checkpoint_segments = 256
>>
>> This does not seem to make sense to me, anyone have any thoughts
>> on why more memory resources would cause worse performance?

what exactly do you mean by hit ratio - is that the page cache hit ratio
(filesystem cache), or shared buffers hit ratio (measured e.g. using
pg_buffercache)?

Regarding the unexpected decrease of performance after increasing
shared_buffers - that's actually quite common behavior. First, the
management of shared buffers is not free, and the more pieces you need
to manage the more expensive it is. Also, by using larger shared buffers
you make that memory unusable for page cache etc. There are also other
negative consequences - double buffering, accumulating more changes for
a checkpoint etc.

The common wisdom (which some claim to be obsolete) is not to set shared
buffers over ~10GB of RAM. It's however very workload-dependent so your
mileage may vary.

To get some basic idea of the shared_buffers utilization, it's possible
to compute stats using pg_buffercache. Also pg_stat_bgwriter contains
useful data.

BTW, it's difficult to say why a query is slow - can you post explain
analyze of the query with both shared_buffers settings?

And just to check - what kind of hardware/kernel version is this? Do you
have numa / transparent huge pages or similar trouble-indicing issues?

Tomas


От:
Jeff Janes
Дата:

On Thu, Nov 13, 2014 at 3:09 PM, CS DBA <> wrote:
All;

We have a large db server with 128GB of ram running complex functions.

with the server set to have the following we were seeing a somewhat low hit ratio and lots of temp buffers

shared_buffers = 18GB
work_mem = 75MB
effective_cache_size = 105GB
checkpoint_segments = 128


when we increased the values to these not only did the hit ratio drop but query times are now longer as well:


shared_buffers = 28GB
work_mem = 150MB
effective_cache_size = 105GB
checkpoint_segments = 256

This does not seem to make sense to me, anyone have any thoughts on why more memory resources would cause worse performance?

You should try changing those things separately, there isn't much reason that shared_buffers and work_mem should be changed together.

There are many reasons the hit ratio and the performance could have gotten worse, without more info we can just speculate.  I'd guess it is just as likely as not that the two observations actually have different causes, rather than both being caused by the same thing.  Can you figure out which specific queries changed performance?  Barring that, which objects changed hit ratios the most?  And how did the actual buffer hit statistics change?  Looking at just the ratio obscures more than it enlightens.

Large sorts are often slower when given more memory.  If you give it so much more memory that it becomes an in-memory sort, it will get faster.  But if you change it from (for example) a 12-way merge of X sized runs to a 6-way merge of X*2 size runs it could very well be slower because you are making poor use of the CPU cache and spending more time waiting on main memory while building those runs. But that shouldn't show up hit ratios, just in performance.

A higher work_mem might also prompt a plan to read an entire table and hash it, rather than do a nested loop probing its index.  If the index was well-cached in shared buffers but the whole table is not, this could make the buffer hit ratio look worse.

Cheers,

Jeff