Обсуждение: background writer being lazy?

Поиск
Список
Период
Сортировка

background writer being lazy?

От
Brian Fehrle
Дата:
Hi all,

I have a large database that I'm trying to do some buffer writer tuning to. I have a data collector set up on pg_stat_bgwriter so I can see the statistics over time, and see how many buffers are being written by checkpoints, and how many are being written by the background writer.

The main thing I am currently seeing is that there are 300X or more buffers written by checkpoints rather than background writer.

Here are my current settings:
PostgreSQL 8.4
bgwriter_delay = 50ms
bgwriter_lru_maxpages = 500
bgwriter_lru_multiplier = 4

checkpoint_segments=100
checkpoint_timeout=40min
(NOTE: Watching my logs, timed checkpoints occur on time, exactly 40 minutes apart, no extra)

I have a lot of snapshots of pg_stat_bgwriter, but here are the totals for three days worth that I find interesting. These numbers represent the total increase the entire day, so end of day value - begining of day value.
Day 1:
 total_ckpt_buffers | total_clean_buffers | total_clean_max_written
--------------------+---------------------+-------------------------
             474322 |                1368 |                       0


Day 2:
 total_ckpt_buffers | total_clean_buffers | total_clean_max_written
--------------------+---------------------+-------------------------
             507956 |                2208 |                       0


Day 3:
 total_ckpt_buffers | total_clean_buffers | total_clean_max_written
--------------------+---------------------+-------------------------
             622519 |               65879 |                      56


day three  had a big spike, where my max_written (set to 500 pages) was reached. more often than not, max pages is not reached, and I can go 6 hours without a single buffer written by background writer, while 100K is written by my checkpoints (side note, i have log_checkpoints = on and the log confirms the checkpoint buffers written values).


So my main question is, where is the issue? It doesn't seem (to me) that the background writer is having a hard time keeping up, because there are simply tons of times where it's doing nothing. So is it just not determining that it needs to do anything because there are already enough 'clean buffers' ready for use at any given time? Would increasing bgwriter_lru_multiplier to a higher value help get more to be written by the bgwriter, and if so are there any negative side effects I would need to consider for this?

Or, is there another explanation to this behavior that I'm not thinking of?

Thanks in advance for any help, thoughts

- Brian F

Re: background writer being lazy?

От
Brian Fehrle
Дата:

 version                
       | PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291], 64-bit
 bgwriter_delay           
     | 50ms
 bgwriter_lru_maxpages    
     | 500
 bgwriter_lru_multiplier  
     | 4
 checkpoint_segments      
     | 100
 checkpoint_timeout       
     | 40min
 checkpoint_warning       
     | 30min
 checkpoint_completion_target  | 0.5

 effective_cache_size          | 16GB
 effective_io_concurrency 
     | 4
 extra_float_digits       
     | 3
 max_connections           
    | 2000
 max_stack_depth           
    | 7MB
 shared_buffers            
    | 16GB
 synchronous_commit        
   | off
 temp_buffers            
      | 8192
 TimeZone                      | US/Pacific
 wal_buffers                   | 8MB
 work_mem                      | 64MB



Since our checkpoint_completion_target is 0.5, we complete a checkpoint around 20 mins after it started, and checkpoints occur like clockwork every 40 minutes, no extra's are forced.

Performance overall is fairly good, but of course we're trying to squeeze as much out of it as we can. One main thing is trying to lower 'spikey' disk IO so that performance is more consistent at any given time.

- Brian F

On 11/01/2011 12:44 PM, Kevin Grittner wrote:
Brian Fehrle <brianf@consistentstate.com> wrote:
On 11/01/2011 08:58 AM, Kevin Grittner wrote:
 
What problem are you trying to solve?

We're trying to offload dirty buffer writes from checkpoints and
from backends (not the bgwriter) to the background writer, I
believe with the hope that it's the same amount of disk IO, but
spread out evenly rather than just when a checkpoint is occurring.
 
What version of PostgreSQL are you using?  Recent versions spread
out the checkpoint activity using the same process which does the
background writing, so there is no benefit from moving writes from
its background writing phase to its distributed checkpoint phase. 
Depending on your setting of checkpoint_completion_target you are
probably spending as much or more time spreading the checkpoint as
doing background writing between checkpoints.  Each of the last few
major releases has made this much better, so if you're spending time
tweaking something prior to 9.1, you'd probably be better served
putting that time into upgrading.
Yes.  Writing dirty buffers when there are enough buffers
available to service requests would tend to increase overall disk
writes and degrade performance.  You don't have a problem unless
you have a high percentage of writes from normal backends which
need to flush a buffer in order to get one.

This seems to be the case, as buffers_backend is between 
checkpoint_buffers and clean_buffers from pg_stat_bgwriter.
For example, on 2011-10-19:
checkpoint buffers: 622,519
clean_buffers:  65,879
clean_max_written: 56
backend_buffers: 460,471

Am I reading these right in wanting to reduce backend_buffers and 
checkpoint_buffers?
 
Hmm.  That is a higher percentage of backend writes than I would
like to see.  What is your shared_memory setting?  Actually, please
show the results of the query here:
http://wiki.postgresql.org/wiki/Server_Configuration
How is your performance?
-Kevin

Re: background writer being lazy?

От
"Kevin Grittner"
Дата:
Brian Fehrle <brianf@consistentstate.com> wrote:

> PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu

Please upgrade to the latest bug fix release of PostgreSQL:

http://www.postgresql.org/support/versioning

To see what bug and security fixes you're missing, look at release
notes for 8.4.2 to 8.4.9 here:

http://www.postgresql.org/docs/8.4/static/release.html

There have been improvements in your areas of concern in 9.0 and
9.1, so you might want to start planning a major release upgrade.
That's not as painful as it used to be, with pg_upgrade.

>   bgwriter_lru_maxpages   | 500

FWIW, we do set this to 1000.

>   max_connections | 2000

This is probably your biggest problem.  Unless you've got 1000 CPUs
on this box, you should use a connection pooler which is
transaction-oriented, limits the number of database connections, and
queues requests for a new transaction when all connections are in
use.  This will almost certainly improve throughput and limit
latency problems.  You do not need 2000 connections to support 2000
cnocurrent users; such a setting will make it harder to provide 2000
concurrent users with decent and consistent performance.

>   effective_cache_size   | 16GB

Given your other settings, this seems likely to be low.  I normally
add the cache space reported by the OS to the shared_buffers
setting.

>   shared_buffers | 16GB

This is probably at least twice what it should be.  If you are
having problems with backends writing too many buffers and problems
with clusters of I/O congestion, you might want to drop it to the
0.5 to 2.0 GB range.

>   wal_buffers                   | 8MB

Might as well go to 16MB.

>   work_mem                      | 64MB

Each of your connections can allocate this much space, potentially
several times, at the same moment.  Unless you really have a monster
machine, 64MB * 2000 connections is just asking for out of memory
failures at unpredictable peak load times.

> One main thing is trying to lower 'spikey' disk IO so that
> performance is more consistent at any given time.

The advice above should help with that.

-Kevin

Re: background writer being lazy?

От
Brian Fehrle
Дата:
On 11/01/2011 08:58 AM, Kevin Grittner wrote:
> Brian Fehrle<brianf@consistentstate.com>  wrote:
>
>> So my main question is, where is the issue?
>
> That's my question, too.  What problem are you trying to solve?
>
We're trying to offload dirty buffer writes from checkpoints and from
backends (not the bgwriter) to the background writer, I believe with the
hope that it's the same amount of disk IO, but spread out evenly rather
than just when a checkpoint is occurring.

>> It doesn't seem (to me) that the background writer is having a
>> hard time keeping up, because there are simply tons of times where
>> it's doing nothing. So is it just not determining that it needs to
>> do anything because there are already enough 'clean buffers' ready
>> for use at any given time?
>
> Yes.  Writing dirty buffers when there are enough buffers available
> to service requests would tend to increase overall disk writes and
> degrade performance.  You don't have a problem unless you have a
> high percentage of writes from normal backends which need to flush a
> buffer in order to get one.
>
This seems to be the case, as buffers_backend is between
checkpoint_buffers and clean_buffers from pg_stat_bgwriter.
For example, on 2011-10-19:
checkpoint buffers: 622,519
clean_buffers:  65,879
clean_max_written: 56
backend_buffers: 460,471

Am I reading these right in wanting to reduce backend_buffers and
checkpoint_buffers?

- Brian F

>> Would increasing bgwriter_lru_multiplier to a higher value help
>> get more to be written by the bgwriter, and if so are there any
>> negative side effects I would need to consider for this?
>
> Yes.  If a buffer is written to again by PostgreSQL after it hits
> disk because your background writer was overly aggressive, you will
> hurt your overall throughput.  Now, sometimes that's worth doing to
> control latency, but you haven't described any such problem.
>
> -Kevin


Re: background writer being lazy?

От
Greg Smith
Дата:
On 10/31/2011 07:28 PM, Brian Fehrle wrote:
The main thing I am currently seeing is that there are 300X or more buffers written by checkpoints rather than background writer.


Writing buffers at checkpoint time is more efficient than having the background writer handle them.  I think your efforts to space checkpoints out may have backfired a bit on you.  You're letting 40 minutes of dirty buffers accumulate before they're written out.  Putting checkpoint_timeout closer to its default of 5 minutes again may reduce the spikes you're seeing.

The changes you've made to the background writer configuration are also counterproductive, given that it's not really going to trigger anyway.  I would only recommend decreasing bgwriter_delay or increasing bgwriter_lru_maxpages or you see the total_clean_max_written value get incremented regularly.  If that's not happening, making the background writer run more often and try to do more work just adds overhead.

Also:  you've set shared_buffers to 16GB.  That's beyond where most people find increases to stop being useful.  I'd wager you'll get less spiky performance just by lowering that a lot.  The 256MB to 1GB range is where I normally end up on servers where lower latency is prioritized instead of maximum throughput.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us