Discussion: tuning bgwriter in 8.4.2
We recently upgraded to 8.4.2 and I'm trying to make sure our bgwriter is working as well as it can. Based on:
# select * from pg_stat_bgwriter ;
 checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
-------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
               804 |               2 |           39171885 |         22562 |              211 |        24759656 |       4488627
(1 row)
...I'm not sure that it is, because as I understand things, assuming I don't keep updating the same pages then buffers_backend should be a small percentage of buffers_alloc, and buffers_clean should be larger than it is compared to buffers_checkpoint. Is my understanding correct?
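A quick sanity check on those counters (rough arithmetic using the numbers above, not anything from the thread itself): the cleaner accounts for a tiny share of all buffer writes here, and backends have written several times more buffers than they've allocated:

```python
# Rough breakdown of where buffer writes came from, using the
# pg_stat_bgwriter counters shown above.
buffers_checkpoint = 39171885
buffers_clean = 22562
buffers_backend = 24759656
buffers_alloc = 4488627

total_writes = buffers_checkpoint + buffers_clean + buffers_backend
for name, n in [("checkpoint", buffers_checkpoint),
                ("bgwriter (clean)", buffers_clean),
                ("backend", buffers_backend)]:
    print(f"{name:17s} {100 * n / total_writes:6.2f}% of writes")

# buffers_backend is several times larger than buffers_alloc, the
# opposite of the "small percentage" the poster expected.
print(f"backend writes / allocations: {buffers_backend / buffers_alloc:.1f}x")
```

The bgwriter's share comes out well under 0.1% of all writes, which is what prompts the question.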
My checkpoints are spread exactly 5 minutes apart, and a typical checkpoint log entry looks like:
checkpoint complete: wrote 48289 buffers (3.7%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=149.872 s, sync=0.378 s, total=150.256 s
The only bgwriter tunable we've adjusted so far is bgwriter_lru_maxpages = 500, though we've also set checkpoint_segments = 768 (not that we need it that high, but we have the space on the WAL volume).
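As a cross-check on that log line (simple arithmetic, assuming the default checkpoint_completion_target of 0.5 alongside the 5-minute checkpoint interval described above): the ~150 s write phase is exactly half the checkpoint interval, which is what spread checkpoints aim for:

```python
# Verify that the checkpoint write phase matches the expected spread,
# assuming checkpoint_timeout = 5 min, checkpoint_completion_target = 0.5.
checkpoint_timeout_s = 300
completion_target = 0.5
write_time_s = 149.872   # from the "checkpoint complete" log line
buffers_written = 48289

expected_write_s = checkpoint_timeout_s * completion_target
print(f"expected write phase: {expected_write_s:.0f} s, actual: {write_time_s:.1f} s")
print(f"write rate: {buffers_written / write_time_s:.0f} buffers/s "
      f"(~{buffers_written * 8 / 1024 / write_time_s:.1f} MB/s at 8kB/buffer)")
```

So the checkpoint writes are being spread as intended; the question is only about the bgwriter's contribution.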
On Feb 14, 2010, at 10:25 AM, Ben Chobot wrote:
> We recently upgraded to 8.4.2 and I'm trying to make sure our bgwriter is working as well as it can. Based on:
>
> # select * from pg_stat_bgwriter ;
>  checkpoints_timed | checkpoints_req | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_alloc
> -------------------+-----------------+--------------------+---------------+------------------+-----------------+---------------
>                804 |               2 |           39171885 |         22562 |              211 |        24759656 |       4488627
> (1 row)
>
> ...I'm not sure that it is, because as I understand things, assuming I don't keep updating the same pages then buffers_backend should be a small percentage of buffers_alloc, and buffers_clean should be larger than it is compared to buffers_checkpoint. Is my understanding correct?
>
> My checkpoints are spread exactly 5 minutes apart, and a typical checkpoint log entry looks like:
>
> checkpoint complete: wrote 48289 buffers (3.7%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=149.872 s, sync=0.378 s, total=150.256 s
>
> The only bgwriter tunable we've adjusted so far is bgwriter_lru_maxpages = 500, though we've also set checkpoint_segments = 768 (not that we need it that high, but we have the space on the WAL volume).
I should have added that those 211 maxwritten_clean entries came about before we set bgwriter_lru_maxpages to 500, and the 2 requested checkpoints came with the initial Slony load.
Ben Chobot wrote:
> As I understand things, assuming I don't keep updating the same pages then buffers_backend should be a small percentage of buffers_alloc, and buffers_clean should be larger than it is compared to buffers_checkpoint. Is my understanding correct?

Sure; your buffers_clean is really low relative to the totals. You should take a snapshot now that you've fixed bgwriter_lru_maxpages, with a timestamp, and then another sometime later to get really useful numbers. A diff considering only the current setup, with a time interval to go along with it, is much more useful than the aggregate numbers here (a patch to make that easier is already in 9.0: http://blog.2ndquadrant.com/en/2010/01/measuring-postgresql-checkpoin.html ). Keep taking regular snapshots with a timestamp:

select current_timestamp,* from pg_stat_bgwriter;

so you can compute a diff to measure what's changing as you go.

The only explanation I can offer is that your workload might be really bursty. The method used for estimating how much the cleaner should do is most likely to break down when the load comes in narrow spikes. The main way to improve response in that situation is to decrease the interval, so it kicks in and does the "what's happened during the last <n> ms?" computations more often. Right now, a burst that lasts less than 200ms can be completely missed if the system was mostly idle before that.

You can try lowering bgwriter_delay and proportionally decreasing bgwriter_lru_maxpages to improve response time to burst workloads. In your situation, I'd try making the writer wake up 4X as often, doing only 1/4 as much maximum work each time, and doubling the multiplier too; see if things move in the right direction, and maybe keep going from there afterwards.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
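The snapshot-and-diff approach Greg describes can be scripted easily; a minimal sketch (the sample rows and timestamps are illustrative, not from the thread) of turning two timestamped pg_stat_bgwriter snapshots into per-second rates:

```python
# Diff two timestamped pg_stat_bgwriter snapshots to get per-second
# rates. The columns are in the order the 8.4 view returns them.
from datetime import datetime

COLS = ["checkpoints_timed", "checkpoints_req", "buffers_checkpoint",
        "buffers_clean", "maxwritten_clean", "buffers_backend",
        "buffers_alloc"]

def diff_snapshots(t0, row0, t1, row1):
    """Return per-second deltas between two counter snapshots."""
    seconds = (t1 - t0).total_seconds()
    return {c: (b - a) / seconds for c, a, b in zip(COLS, row0, row1)}

# Hypothetical example: two snapshots taken an hour apart.
t0 = datetime(2010, 2, 17, 12, 0)
row0 = (804, 2, 39171885, 22562, 211, 24759656, 4488627)
t1 = datetime(2010, 2, 17, 13, 0)
row1 = (816, 2, 39500000, 40000, 211, 24900000, 4600000)

for col, rate in diff_snapshots(t0, row0, t1, row1).items():
    print(f"{col:20s} {rate:10.4f}/s")
```

Collecting the snapshots is just a matter of running Greg's `select current_timestamp,* from pg_stat_bgwriter;` on a schedule and appending the output to a file.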
On Feb 17, 2010, at 3:41 PM, Greg Smith wrote:

> Ben Chobot wrote:
>> As I understand things, assuming I don't keep updating the same pages then buffers_backend should be a small percentage of buffers_alloc, and buffers_clean should be larger than it is compared to buffers_checkpoint. Is my understanding correct?
>
> Sure; your buffers_clean is really low relative to the totals. You should take a snapshot now that you've fixed bgwriter_lru_maxpages, with a timestamp, and then another sometime later to get really useful numbers. A diff only considering the current setup and with a time interval to go along with it is much more useful than the aggregate numbers here (patch to make that easier already in 9.0: http://blog.2ndquadrant.com/en/2010/01/measuring-postgresql-checkpoin.html ). Keep taking regular snapshots with a timestamp:
>
> select current_timestamp,* from pg_stat_bgwriter;
>
> So you can compute a diff to measure what's changing as you go.
>
> The only explanation I can offer is that your workload might be really bursty. The method used for estimating how much the cleaner should do is most likely to break down when the load comes in narrow spikes. The main way to improve response in that situation is by decreasing the interval, so it kicks in and does the "what's happened during the last <n> ms?" computations more often. Right now, a burst that lasts less than 200ms can be completely missed, if the system was mostly idle before that.
>
> You can try lowering bgwriter_delay and proportionally decreasing bgwriter_lru_maxpages to make response time to burst workloads better. In your situation, I'd try making the writer wake up 4X as often, doing only 1/4 as much maximum work as it currently does each time, and doubling the multiplier too; see if things move in the right direction, and maybe keep going from there afterwards.

Thanks for the suggestions, Greg. I'll monitor it closely over the next few days, but it doesn't really seem to have changed much so far.

Is there a way to tell if I really am just keeping the same few pages dirty throughout every checkpoint? I wouldn't have expected that, but given our application I suppose it is possible.
Ben Chobot wrote:
> Is there a way to tell if I really am just keeping the same few pages dirty throughout every checkpoint? I wouldn't have expected that, but given our application I suppose it is possible.

You can install pg_buffercache and look at what's in the cache to check your theory. I have some sample queries that show neat things at http://www.westnet.com/~gsmith/content/postgresql/bufcache.sh

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
On Feb 17, 2010, at 6:38 PM, Greg Smith wrote:
> Ben Chobot wrote:
>> Is there a way to tell if I really am just keeping the same few pages dirty throughout every checkpoint? I wouldn't have expected that, but given our application I suppose it is possible.
>
> You can install pg_buffercache and look at what's in the cache to check your theory. I have some sample queries that show neat things at http://www.westnet.com/~gsmith/content/postgresql/bufcache.sh
This appears to be fairly typical:
# select count(*),isdirty,usagecount from pg_buffercache group by isdirty,usagecount order by usagecount desc,isdirty;
 count  | isdirty | usagecount
--------+---------+------------
 670629 | f       |          5
  75766 | t       |          5
 237311 | f       |          4
   5372 | t       |          4
  74682 | f       |          3
     31 | t       |          3
  73786 | f       |          2
     18 | t       |          2
 104112 | f       |          1
     62 | t       |          1
  68951 | f       |          0
(11 rows)
Am I reading it correctly to say that the bgwriter probably wouldn't help much, because a majority of the dirty pages appear to be popular?
Ben Chobot wrote:
> Am I reading it correctly to say that the bgwriter probably wouldn't help much, because a majority of the dirty pages appear to be popular?

Yes. The background writer cleaner process only does something useful if there are pages with low usage counts it can evict. You would need to increase shared_buffers significantly before it's likely that would happen. Right now, 87% of your buffer cache has a usage count of 2 or higher, which basically means it's filled with almost nothing but the working set of data it never wants to evict unless it's for a checkpoint.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
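Greg's 87% figure checks out against the pg_buffercache histogram shown above (a quick tally; note the total of 1,310,720 buffers is exactly a 10GB shared_buffers at 8kB per buffer):

```python
# Tally the pg_buffercache output shown earlier in the thread:
# (count, isdirty, usagecount) per row.
rows = [
    (670629, False, 5), (75766, True, 5),
    (237311, False, 4), (5372, True, 4),
    (74682, False, 3), (31, True, 3),
    (73786, False, 2), (18, True, 2),
    (104112, False, 1), (62, True, 1),
    (68951, False, 0),
]

total = sum(n for n, _, _ in rows)
hot = sum(n for n, _, u in rows if u >= 2)   # hard for the cleaner to evict
cold = sum(n for n, _, u in rows if u <= 1)  # candidates for eviction

print(f"total buffers: {total} (= {total * 8 // 1024 // 1024} GB at 8kB each)")
print(f"usagecount >= 2: {100 * hot / total:.0f}%")   # Greg's 87%
print(f"usagecount <= 1: {100 * cold / total:.0f}%")  # the ~13% remainder
```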
On Feb 22, 2010, at 6:47 PM, Greg Smith wrote:

> Ben Chobot wrote:
>> Am I reading it correctly to say that the bgwriter probably wouldn't help much, because a majority of the dirty pages appear to be popular?
>
> Yes. The background writer cleaner process only does something useful if there are pages with low usage counts it can evict. You would need to increase shared_buffers significantly before it's likely that would happen. Right now, 87% of your buffer cache has a usage count of 2 or higher, which basically means it's filled with almost nothing but the working set of data it never wants to evict unless it's for a checkpoint.

Hm, my shared_buffers is already 10GB, but I'm using about 80GB for filesystem cache. Would a larger shared_buffers make sense? I thought I read somewhere that 10GB is on the high end of the useful size for shared_buffers.
Ben Chobot wrote:
> Hm, my shared_buffers is already 10GB, but I'm using about 80GB for filesystem cache. Would a larger shared_buffers make sense? I thought I read somewhere that 10GB is on the high end of the useful size for shared_buffers.

Yeah, I figured that out when I was analyzing your figures; I thought I'd missed a decimal place when I first saw it. The problem with huge increases in shared_buffers is that they can increase the amount of time it takes to allocate a new buffer. If all you've got in there are lots of buffers with high usage counts, it can end up taking multiple "sweeps of the clock hand" over things to drop usage counts. With >75% of your buffers already having a usage count of 4 or 5, you've already gone pretty far in the direction where that could happen. With around 13% still at a usage count of 0 or 1, I don't think it's too bad yet. You're certainly not in well-explored territory, though.

If you were seeing large amounts of backend writes or buffers being allocated, maybe a larger shared_buffers would make sense. From the snapshots of data you've provided, that doesn't seem to be the case, so I wouldn't be too worried about it. The only thing you could really do here is increase checkpoint_timeout; with this much data, having a checkpoint every 5 minutes is on the fast side, and I'd bet you could tolerate the disk space and additional crash recovery time in return for better average performance the rest of the time.

--
Greg Smith   2ndQuadrant US   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
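Pulling the thread's suggestions together, a postgresql.conf sketch might look like the following. The exact values are illustrative, not recommendations from the thread (Greg suggests the 4X/1/4/2X ratios and raising checkpoint_timeout, but picks no specific numbers), and should be validated against snapshot diffs on your own workload:

```
# Illustrative settings based on the suggestions in this thread.
checkpoint_segments = 768       # already set; plenty of WAL headroom
checkpoint_timeout = 15min      # up from the 5min default, per Greg's last note
bgwriter_delay = 50ms           # wake 4X as often as the 200ms default
bgwriter_lru_maxpages = 125     # 1/4 of the previous 500, matching the shorter delay
bgwriter_lru_multiplier = 4.0   # double the 2.0 default
```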