Discussion: background writer being lazy?
Hi all,
I have a large database that I'm trying to do some background writer tuning on. I have a data collector set up on pg_stat_bgwriter so I can see the statistics over time: how many buffers are being written by checkpoints, and how many by the background writer.
The main thing I am currently seeing is that 300x or more buffers are written by checkpoints than by the background writer.
Here are my current settings:
PostgreSQL 8.4
bgwriter_delay = 50ms
bgwriter_lru_maxpages = 500
bgwriter_lru_multiplier = 4
checkpoint_segments=100
checkpoint_timeout=40min
(NOTE: Watching my logs, timed checkpoints occur on time, exactly 40 minutes apart, with no extra checkpoints forced.)
I have a lot of snapshots of pg_stat_bgwriter, but here are the totals for three days' worth that I find interesting. These numbers represent the total increase over the entire day, i.e. end-of-day value minus beginning-of-day value.
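The snapshot arithmetic here is just last-minus-first per cumulative counter; a minimal sketch of that bookkeeping (the column names match the 8.4 pg_stat_bgwriter view, but the snapshot values below are made up to reproduce the Day 3 deltas):

```python
# Compute daily deltas from periodic pg_stat_bgwriter snapshots.
# The view's counters are cumulative, so a day's activity is
# (last snapshot of the day) - (first snapshot of the day).

def daily_delta(first, last,
                fields=("buffers_checkpoint", "buffers_clean", "maxwritten_clean")):
    """Return the per-counter increase between two snapshot dicts."""
    return {f: last[f] - first[f] for f in fields}

# Hypothetical snapshots taken at the start and end of "Day 3":
start_of_day = {"buffers_checkpoint": 10_000_000,
                "buffers_clean": 500_000, "maxwritten_clean": 120}
end_of_day   = {"buffers_checkpoint": 10_622_519,
                "buffers_clean": 565_879, "maxwritten_clean": 176}

print(daily_delta(start_of_day, end_of_day))
# {'buffers_checkpoint': 622519, 'buffers_clean': 65879, 'maxwritten_clean': 56}
```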
Day 1:
total_ckpt_buffers | total_clean_buffers | total_clean_max_written
--------------------+---------------------+-------------------------
474322 | 1368 | 0
Day 2:
total_ckpt_buffers | total_clean_buffers | total_clean_max_written
--------------------+---------------------+-------------------------
507956 | 2208 | 0
Day 3:
total_ckpt_buffers | total_clean_buffers | total_clean_max_written
--------------------+---------------------+-------------------------
622519 | 65879 | 56
Day three had a big spike, where my max written (bgwriter_lru_maxpages, set to 500 pages) was reached. More often than not, max pages is not reached, and I can go 6 hours without a single buffer written by the background writer, while 100K are written by my checkpoints. (Side note: I have log_checkpoints = on and the log confirms the checkpoint buffers-written values.)
So my main question is, where is the issue? It doesn't seem (to me) that the background writer is having a hard time keeping up, because there are simply tons of times where it's doing nothing. So is it just not determining that it needs to do anything because there are already enough 'clean buffers' ready for use at any given time? Would increasing bgwriter_lru_multiplier to a higher value help get more to be written by the bgwriter, and if so are there any negative side effects I would need to consider for this?
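For reference, the LRU background writer's pacing decision works roughly like this: it estimates upcoming buffer demand from recent allocations, scales that by bgwriter_lru_multiplier, subtracts buffers that are already reusable, and caps the result at bgwriter_lru_maxpages. A simplified sketch (not the actual PostgreSQL code, which also smooths the allocation rate and tracks a scan point in the buffer ring):

```python
# Rough sketch of how the LRU background writer decides how much to
# write each round. If enough clean buffers are already available
# ahead of the allocation point, it does nothing at all.

def buffers_to_clean(recent_alloc, reusable_ahead,
                     lru_multiplier=4.0, lru_maxpages=500):
    """Estimate how many dirty buffers to write this bgwriter round."""
    target = recent_alloc * lru_multiplier   # expected near-term demand
    needed = target - reusable_ahead         # minus already-reusable buffers
    return max(0, min(int(needed), lru_maxpages))

# Modest allocation with plenty of clean buffers ahead -> bgwriter idles:
print(buffers_to_clean(recent_alloc=50, reusable_ahead=400))   # 0
# Heavy allocation with few clean buffers -> hits the maxpages cap:
print(buffers_to_clean(recent_alloc=300, reusable_ahead=100))  # 500
```

Under this model, long stretches where the bgwriter writes nothing simply mean the estimate found enough reusable buffers already available.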
Or, is there another explanation to this behavior that I'm not thinking of?
Thanks in advance for any help or thoughts.
- Brian F
version | PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.3.2 [gcc-4_3-branch revision 141291], 64-bit
bgwriter_delay | 50ms
bgwriter_lru_maxpages | 500
bgwriter_lru_multiplier | 4
checkpoint_segments | 100
checkpoint_timeout | 40min
checkpoint_warning | 30min
checkpoint_completion_target | 0.5
effective_cache_size | 16GB
effective_io_concurrency | 4
extra_float_digits | 3
max_connections | 2000
max_stack_depth | 7MB
shared_buffers | 16GB
synchronous_commit | off
temp_buffers | 8192
TimeZone | US/Pacific
wal_buffers | 8MB
work_mem | 64MB
Since our checkpoint_completion_target is 0.5, we complete a checkpoint around 20 minutes after it started, and checkpoints occur like clockwork every 40 minutes; no extras are forced.
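A back-of-the-envelope on what that spreading means for I/O, using the Day 3 checkpoint numbers (a sketch only; it assumes evenly spaced timed checkpoints and 8 kB pages):

```python
# Checkpoint spreading arithmetic: with checkpoint_completion_target = 0.5
# and checkpoint_timeout = 40min, each checkpoint's writes are spread over
# a ~20 minute window.

CHECKPOINT_TIMEOUT_MIN = 40
COMPLETION_TARGET = 0.5
PAGE_KB = 8  # standard PostgreSQL block size

write_window_min = CHECKPOINT_TIMEOUT_MIN * COMPLETION_TARGET
checkpoints_per_day = 24 * 60 // CHECKPOINT_TIMEOUT_MIN

day3_ckpt_buffers = 622_519
buffers_per_ckpt = day3_ckpt_buffers / checkpoints_per_day
rate_kb_per_s = buffers_per_ckpt * PAGE_KB / (write_window_min * 60)

print(f"{write_window_min:.0f} min write window, "
      f"{buffers_per_ckpt:.0f} buffers/checkpoint, "
      f"~{rate_kb_per_s:.0f} kB/s during the window")
```

The average rate during the write window is modest; the "spikey" behavior comes from the fsync activity at checkpoint completion rather than from the raw write volume.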
Performance overall is fairly good, but of course we're trying to squeeze as much out of it as we can. One main thing is trying to lower 'spikey' disk IO so that performance is more consistent at any given time.
- Brian F
On 11/01/2011 12:44 PM, Kevin Grittner wrote:
> Brian Fehrle <brianf@consistentstate.com> wrote:
>> On 11/01/2011 08:58 AM, Kevin Grittner wrote:
>>> What problem are you trying to solve?
>>
>> We're trying to offload dirty buffer writes from checkpoints and from backends (not the bgwriter) to the background writer, I believe with the hope that it's the same amount of disk IO, but spread out evenly rather than just when a checkpoint is occurring.

What version of PostgreSQL are you using? Recent versions spread out the checkpoint activity using the same process which does the background writing, so there is no benefit from moving writes from its background writing phase to its distributed checkpoint phase. Depending on your setting of checkpoint_completion_target you are probably spending as much or more time spreading the checkpoint as doing background writing between checkpoints. Each of the last few major releases has made this much better, so if you're spending time tweaking something prior to 9.1, you'd probably be better served putting that time into upgrading.

>>> Yes. Writing dirty buffers when there are enough buffers available to service requests would tend to increase overall disk writes and degrade performance. You don't have a problem unless you have a high percentage of writes from normal backends which need to flush a buffer in order to get one.
>>
>> This seems to be the case, as buffers_backend is between checkpoint_buffers and clean_buffers from pg_stat_bgwriter. For example, on 2011-10-19:
>>
>>     checkpoint buffers: 622,519
>>     clean_buffers:       65,879
>>     clean_max_written:       56
>>     backend_buffers:    460,471
>>
>> Am I reading these right in wanting to reduce backend_buffers and checkpoint_buffers?

Hmm. That is a higher percentage of backend writes than I would like to see. What is your shared_memory setting? Actually, please show the results of the query here: http://wiki.postgresql.org/wiki/Server_Configuration

How is your performance?

-Kevin
Brian Fehrle <brianf@consistentstate.com> wrote:

> PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu

Please upgrade to the latest bug fix release of PostgreSQL: http://www.postgresql.org/support/versioning

To see what bug and security fixes you're missing, look at the release notes for 8.4.2 to 8.4.9 here: http://www.postgresql.org/docs/8.4/static/release.html

There have been improvements in your areas of concern in 9.0 and 9.1, so you might want to start planning a major release upgrade. That's not as painful as it used to be, with pg_upgrade.

> bgwriter_lru_maxpages | 500

FWIW, we do set this to 1000.

> max_connections | 2000

This is probably your biggest problem. Unless you've got 1000 CPUs on this box, you should use a connection pooler which is transaction-oriented, limits the number of database connections, and queues requests for a new transaction when all connections are in use. This will almost certainly improve throughput and limit latency problems. You do not need 2000 connections to support 2000 concurrent users; such a setting will make it harder to provide 2000 concurrent users with decent and consistent performance.

> effective_cache_size | 16GB

Given your other settings, this seems likely to be low. I normally add the cache space reported by the OS to the shared_buffers setting.

> shared_buffers | 16GB

This is probably at least twice what it should be. If you are having problems with backends writing too many buffers and problems with clusters of I/O congestion, you might want to drop it to the 0.5 to 2.0 GB range.

> wal_buffers | 8MB

Might as well go to 16MB.

> work_mem | 64MB

Each of your connections can allocate this much space, potentially several times, at the same moment. Unless you really have a monster machine, 64MB * 2000 connections is just asking for out of memory failures at unpredictable peak load times.

> One main thing is trying to lower 'spikey' disk IO so that
> performance is more consistent at any given time.
The advice above should help with that. -Kevin
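The work_mem warning in this advice is easy to quantify; a quick sketch of the worst case it describes (just one work_mem-sized allocation per connection at the same moment):

```python
# Why work_mem = 64MB with max_connections = 2000 is risky: each backend
# can allocate up to work_mem per sort or hash node, so even a single
# allocation per connection at peak load is enormous.

WORK_MEM_MB = 64
MAX_CONNECTIONS = 2000

worst_case_gb = WORK_MEM_MB * MAX_CONNECTIONS / 1024
print(f"{worst_case_gb:.0f} GB")  # 125 GB with one work_mem allocation each
```

Queries with multiple sort/hash nodes can multiply that figure further, which is why pooling down to far fewer connections matters so much here.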
On 11/01/2011 08:58 AM, Kevin Grittner wrote:
> Brian Fehrle <brianf@consistentstate.com> wrote:
>> So my main question is, where is the issue?
>
> That's my question, too. What problem are you trying to solve?

We're trying to offload dirty buffer writes from checkpoints and from backends (not the bgwriter) to the background writer, I believe with the hope that it's the same amount of disk IO, but spread out evenly rather than just when a checkpoint is occurring.

>> It doesn't seem (to me) that the background writer is having a
>> hard time keeping up, because there are simply tons of times where
>> it's doing nothing. So is it just not determining that it needs to
>> do anything because there are already enough 'clean buffers' ready
>> for use at any given time?
>
> Yes. Writing dirty buffers when there are enough buffers available
> to service requests would tend to increase overall disk writes and
> degrade performance. You don't have a problem unless you have a
> high percentage of writes from normal backends which need to flush a
> buffer in order to get one.

This seems to be the case, as buffers_backend is between checkpoint_buffers and clean_buffers from pg_stat_bgwriter. For example, on 2011-10-19:

    checkpoint buffers: 622,519
    clean_buffers:       65,879
    clean_max_written:       56
    backend_buffers:    460,471

Am I reading these right in wanting to reduce backend_buffers and checkpoint_buffers?

- Brian F

>> Would increasing bgwriter_lru_multiplier to a higher value help
>> get more to be written by the bgwriter, and if so are there any
>> negative side effects I would need to consider for this?
>
> Yes. If a buffer is written to again by PostgreSQL after it hits
> disk because your background writer was overly aggressive, you will
> hurt your overall throughput. Now, sometimes that's worth doing to
> control latency, but you haven't described any such problem.
>
> -Kevin
On 10/31/2011 07:28 PM, Brian Fehrle wrote:
> The main thing I am currently seeing is that there are 300X or more
> buffers written by checkpoints rather than background writer.
Writing buffers at checkpoint time is more efficient than having the background writer handle them. I think your efforts to space checkpoints out may have backfired a bit on you. You're letting 40 minutes of dirty buffers accumulate before they're written out. Putting checkpoint_timeout closer to its default of 5 minutes again may reduce the spikes you're seeing.
The changes you've made to the background writer configuration are also counterproductive, given that it's not really going to trigger anyway. I would only recommend decreasing bgwriter_delay or increasing bgwriter_lru_maxpages if you see the total_clean_max_written value get incremented regularly. If that's not happening, making the background writer run more often and try to do more work just adds overhead.
Also: you've set shared_buffers to 16GB. That's beyond where most people find increases to stop being useful. I'd wager you'll get less spiky performance just by lowering that a lot. The 256MB to 1GB range is where I normally end up on servers where lower latency is prioritized instead of maximum throughput.
-- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us