Discussion: Figuring out shared buffer pressure
As part of a blog, I started looking at how a user could measure the pressure on shared buffers, e.g. how much they are being used, recycled, etc.

The way you normally do it on older operating systems is to see how many buffers on the free list (about to be reused) are reclaimed as needed --- that usually indicates kernel cache pressure.  Unfortunately, we don't have a freelist, except for the initial assignment of shared buffers at startup.

I then started looking at pg_buffercache, and thought perhaps the 'usagecount' column could give me a way of measuring this.  For example, excessive scanning of the shared buffers for replaceable buffers would indicate pressure, which might show up as a low usagecount.

I ran the attached SQL script, and got the attached results.  You can see that the first few attempts to use many shared buffers were thwarted by our GetAccessStrategy() function, which prevents sequential scans from blowing away other shared buffers, limiting such scans to 256k:

	http://doxygen.postgresql.org/freelist_8c_source.html#l00410

Our storage/buffer/README file has the reason for the size (to fit in the CPU cache), line 204:

	http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/storage/buffer/README

I realize we can't size this based on shared buffers because it is based on the CPU cache size, but it seems we are purposely storing buffers in the kernel rather than in shared buffers in this case.  I suppose if we allowed a table to use more, there would be no way to keep all the memory in a single 256k area, but are we sure this is the right approach?

Based on what I found, I can see no way for users to see how heavily their shared buffers are being used.  Is that correct?

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
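[Editor's note: the SQL script itself is only available as an attachment. A minimal sketch of the kind of usagecount inspection being discussed, assuming the contrib pg_buffercache extension is installed — the view and its usagecount column are standard, but interpreting the distribution as "pressure" is exactly the open question of this thread:]

```sql
-- Requires the contrib pg_buffercache extension.
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- Distribution of usage counts across all shared buffers.
-- A NULL usagecount means the buffer slot is currently unused;
-- a large pile-up at usagecount = 0 means many buffers are
-- immediate candidates for eviction by the clock sweep.
SELECT usagecount, count(*) AS buffers
FROM pg_buffercache
GROUP BY usagecount
ORDER BY usagecount;
```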
Attachments
On Wed, May 30, 2012 at 9:56 AM, Bruce Momjian <bruce@momjian.us> wrote:
> As part of a blog, I started looking at how a user could measure the
> pressure on shared buffers, e.g. how much they are being used, recycled,
> etc.
>
> The way you normally do it on older operating systems is to see how
> many buffers on the free list (about to be reused) are reclaimed as
> needed --- that usually indicates kernel cache pressure.  Unfortunately,
> we don't have a freelist, except for the initial assignment of shared
> buffers at startup.

Isn't that what the buffers_alloc from pg_stat_bgwriter is?

Cheers,

Jeff
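[Editor's note: a quick sketch of looking at the counter Jeff mentions; buffers_alloc in the standard pg_stat_bgwriter view counts buffer allocations since the statistics were last reset, so taking deltas between periodic samples gives an allocation rate:]

```sql
-- Cumulative buffer allocations since stats_reset.
-- Sample this periodically; the delta between samples is a
-- rough proxy for how fast pages are being brought into
-- shared buffers via the default (non-ring) strategy.
SELECT buffers_alloc, stats_reset
FROM pg_stat_bgwriter;
```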
On Wed, May 30, 2012 at 10:38:10AM -0700, Jeff Janes wrote:
> On Wed, May 30, 2012 at 9:56 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > As part of a blog, I started looking at how a user could measure the
> > pressure on shared buffers, e.g. how much they are being used, recycled,
> > etc.
> >
> > The way you normally do it on older operating systems is to see how
> > many buffers on the free list (about to be reused) are reclaimed as
> > needed --- that usually indicates kernel cache pressure.  Unfortunately,
> > we don't have a freelist, except for the initial assignment of shared
> > buffers at startup.
>
> Isn't that what the buffers_alloc from pg_stat_bgwriter is?

The issue is that once a buffer is removed from the free list, it is never returned to the free list.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
On Wed, May 30, 2012 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, May 30, 2012 at 10:38:10AM -0700, Jeff Janes wrote:
>> On Wed, May 30, 2012 at 9:56 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > As part of a blog, I started looking at how a user could measure the
>> > pressure on shared buffers, e.g. how much they are being used, recycled,
>> > etc.
>> >
>> > The way you normally do it on older operating systems is to see how
>> > many buffers on the free list (about to be reused) are reclaimed as
>> > needed --- that usually indicates kernel cache pressure.  Unfortunately,
>> > we don't have a freelist, except for the initial assignment of shared
>> > buffers at startup.
>>
>> Isn't that what the buffers_alloc from pg_stat_bgwriter is?
>
> The issue is that once a buffer is removed from the free list, it is
> never returned to the free list.

A buffer doesn't need to be removed from the linked list in order for buffers_alloc to get incremented.

Conceptually, the freelist consists not only of the linked list, but also of all unpinned buffers with a usagecount of zero.

Cheers,

Jeff
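[Editor's note: the size of that conceptual freelist can be approximated from pg_buffercache — a sketch only; the 9.x version of the view does not expose pin counts, so this counts all usagecount-zero buffers whether pinned or not, and querying the view takes buffer header locks, so it is not entirely free to run:]

```sql
-- Approximate the "conceptual freelist": buffers whose page is an
-- immediate eviction candidate (usagecount = 0), plus buffer slots
-- that have never held a page (usagecount IS NULL).
SELECT sum(CASE WHEN usagecount = 0 THEN 1 ELSE 0 END)    AS evictable,
       sum(CASE WHEN usagecount IS NULL THEN 1 ELSE 0 END) AS unused,
       count(*)                                            AS total_buffers
FROM pg_buffercache;
```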
On Wed, May 30, 2012 at 11:06:45AM -0700, Jeff Janes wrote:
> On Wed, May 30, 2012 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, May 30, 2012 at 10:38:10AM -0700, Jeff Janes wrote:
> >> On Wed, May 30, 2012 at 9:56 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > As part of a blog, I started looking at how a user could measure the
> >> > pressure on shared buffers, e.g. how much they are being used, recycled,
> >> > etc.
> >> >
> >> > The way you normally do it on older operating systems is to see how
> >> > many buffers on the free list (about to be reused) are reclaimed as
> >> > needed --- that usually indicates kernel cache pressure.  Unfortunately,
> >> > we don't have a freelist, except for the initial assignment of shared
> >> > buffers at startup.
> >>
> >> Isn't that what the buffers_alloc from pg_stat_bgwriter is?
> >
> > The issue is that once a buffer is removed from the free list, it is
> > never returned to the free list.
>
> A buffer doesn't need to be removed from the linked list in order for
> buffers_alloc to get incremented.

Seems buffers_alloc is the number of calls to StrategyGetBuffer(), which tells how many times we have requested a buffer.  Not sure how that helps measure buffer pressure.

> Conceptually, the freelist consists not only of the linked list, but
> also of all unpinned buffers with a usagecount of zero.

True.  I guess my problem is I can't find out how many of those zero-usage-count buffers are being reclaimed as needed.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
On Wed, May 30, 2012 at 11:23 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, May 30, 2012 at 11:06:45AM -0700, Jeff Janes wrote:
>> On Wed, May 30, 2012 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > On Wed, May 30, 2012 at 10:38:10AM -0700, Jeff Janes wrote:
>> >>
>> >> Isn't that what the buffers_alloc from pg_stat_bgwriter is?
>> >
>> > The issue is that once a buffer is removed from the free list, it is
>> > never returned to the free list.
>>
>> A buffer doesn't need to be removed from the linked list in order for
>> buffers_alloc to get incremented.
>
> Seems buffers_alloc is the number of calls to StrategyGetBuffer(), which
> tells how many times we have requested a buffer.  Not sure how that helps
> measure buffer pressure.

Once the linked list is empty, every request for a buffer to read a new page into must result in the eviction of the previous occupant of some buffer on this conceptual freelist (except perhaps for some race conditions).  Isn't that what you wanted?  Except that buffers_alloc does not get incremented when the StrategyGetBuffer() call is satisfied by a ring strategy rather than the default strategy.

>> Conceptually, the freelist consists not only of the linked list, but
>> also of all unpinned buffers with a usagecount of zero.
>
> True.  I guess my problem is I can't find out how many of those
> zero-usage-count buffers are being reclaimed as needed.

Do you need to figure this out specifically in the context of bulk strategies, or did you just pick a sequential scan because you thought it would be an easy test case?  When I need to generate pressure on the buffer cache, I use pgbench -S.

Cheers,

Jeff
On Wed, May 30, 2012 at 11:51:23AM -0700, Jeff Janes wrote:
> On Wed, May 30, 2012 at 11:23 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, May 30, 2012 at 11:06:45AM -0700, Jeff Janes wrote:
> >> On Wed, May 30, 2012 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> > On Wed, May 30, 2012 at 10:38:10AM -0700, Jeff Janes wrote:
> >> >>
> >> >> Isn't that what the buffers_alloc from pg_stat_bgwriter is?
> >> >
> >> > The issue is that once a buffer is removed from the free list, it is
> >> > never returned to the free list.
> >>
> >> A buffer doesn't need to be removed from the linked list in order for
> >> buffers_alloc to get incremented.
> >
> > Seems buffers_alloc is the number of calls to StrategyGetBuffer(), which
> > tells how many times we have requested a buffer.  Not sure how that helps
> > measure buffer pressure.
>
> Once the linked list is empty, every request for a buffer to read a
> new page into must result in the eviction of the previous occupant
> of some buffer on this conceptual freelist (except perhaps for some
> race conditions).  Isn't that what you wanted?  Except that
> buffers_alloc does not get incremented when the StrategyGetBuffer()
> call is satisfied by a ring strategy rather than the default strategy.

Well, the ideal case is that I could find out how often data that is about to be discarded is actually needed, hence the "reclaimed" field that is often important for kernel memory pressure reporting on older operating systems.  I will post an email soon about my theory of why buffer pressure is an important thing to report to users.

> >> Conceptually, the freelist consists not only of the linked list, but
> >> also of all unpinned buffers with a usagecount of zero.
> >
> > True.  I guess my problem is I can't find out how many of those
> > zero-usage-count buffers are being reclaimed as needed.
>
> Do you need to figure this out specifically in the context of bulk
> strategies, or did you just pick a sequential scan because you thought
> it would be an easy test case?  When I need to generate pressure on
> the buffer cache, I use pgbench -S.

I thought it would just be easy.  I was surprised we were favoring kernel reads over using more than 256k of shared buffers.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
On Wed, May 30, 2012 at 2:55 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, May 30, 2012 at 11:51:23AM -0700, Jeff Janes wrote:
>> On Wed, May 30, 2012 at 11:23 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > On Wed, May 30, 2012 at 11:06:45AM -0700, Jeff Janes wrote:
>> >> On Wed, May 30, 2012 at 10:57 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> >> > On Wed, May 30, 2012 at 10:38:10AM -0700, Jeff Janes wrote:
>> >> >>
>> >> >> Isn't that what the buffers_alloc from pg_stat_bgwriter is?
>> >> >
>> >> > The issue is that once a buffer is removed from the free list, it is
>> >> > never returned to the free list.
>> >>
>> >> A buffer doesn't need to be removed from the linked list in order for
>> >> buffers_alloc to get incremented.
>> >
>> > Seems buffers_alloc is the number of calls to StrategyGetBuffer(), which
>> > tells how many times we have requested a buffer.  Not sure how that helps
>> > measure buffer pressure.
>>
>> Once the linked list is empty, every request for a buffer to read a
>> new page into must result in the eviction of the previous occupant
>> of some buffer on this conceptual freelist (except perhaps for some
>> race conditions).  Isn't that what you wanted?  Except that
>> buffers_alloc does not get incremented when the StrategyGetBuffer()
>> call is satisfied by a ring strategy rather than the default strategy.
>
> Well, the ideal case is that I could find out how often data that is
> about to be discarded is actually needed, hence the "reclaimed" field
> that is often important for kernel memory pressure reporting on older
> operating systems.  I will post an email soon about my theory of why
> buffer pressure is an important thing to report to users.

Ah, now I see.  By "reclaimed" I thought you meant claimed for reuse with a new page, but you mean the buffer was found to already have the page we wanted, with a usagecount of zero and unpinned, and so would have been in danger of eviction if we hadn't just now pinned it and bumped the usagecount.

Yeah, I don't think anything currently reported will help with that.

Cheers,

Jeff
On Wed, May 30, 2012 at 05:55:07PM -0400, Bruce Momjian wrote:
> > > Seems buffers_alloc is the number of calls to StrategyGetBuffer(), which
> > > tells how many times we have requested a buffer.  Not sure how that helps
> > > measure buffer pressure.
> >
> > Once the linked list is empty, every request for a buffer to read a
> > new page into must result in the eviction of the previous occupant
> > of some buffer on this conceptual freelist (except perhaps for some
> > race conditions).  Isn't that what you wanted?  Except that
> > buffers_alloc does not get incremented when the StrategyGetBuffer()
> > call is satisfied by a ring strategy rather than the default strategy.
>
> Well, the ideal case is that I could find out how often data that is
> about to be discarded is actually needed, hence the "reclaimed" field
> that is often important for kernel memory pressure reporting on older
> operating systems.  I will post an email soon about my theory of why
> buffer pressure is an important thing to report to users.

OK, realizing there is no simple way to measure shared buffer pressure, let me explain why I want to.

Right now we simplistically recommend 25% of RAM for shared_buffers, with a maximum of 8GB (512MB on Windows).  This helps ensure that there are sufficient kernel buffers for high-write operations, and perhaps a kernel cache larger than shared buffers.

However, this doesn't help people configure shared buffers larger (e.g. 35%) if their working set is larger.  Right now, I don't see how a user would know this is happening.  On the flip side, they might have a working set smaller than 25% and be paying the overhead of managing 1 million shared buffers.  Again, there is no way to know if that is the case.

For example, we have reports that larger shared buffers are sometimes better, sometimes not, but there is no feedback we give the user to explain why this is happening.  My guess is that if their working set is larger than 25% of RAM, they benefit; if not, the buffer management overhead makes things slower.

I feel we need to give users clearer information on how active their shared buffer cache is, perhaps allowing them to shrink/grow it as appropriate.  Asking them to blindly try different shared buffer sizes seems suboptimal.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
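[Editor's note: one partial window into the working set that is already available today is per-relation buffer occupancy — a sketch combining pg_buffercache with pg_class; it only resolves relations in the current database and says nothing about what the kernel cache holds:]

```sql
-- Which relations occupy shared buffers, and how "warm" they are.
SELECT c.relname,
       count(*)                    AS buffers,
       round(avg(b.usagecount), 2) AS avg_usagecount
FROM pg_buffercache b
JOIN pg_class c
  ON b.relfilenode = pg_relation_filenode(c.oid)
WHERE b.reldatabase = (SELECT oid FROM pg_database
                       WHERE datname = current_database())
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;
```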
On Thursday, May 31, 2012 at 19:11, Bruce Momjian wrote:
> On Wed, May 30, 2012 at 05:55:07PM -0400, Bruce Momjian wrote:
> > > > Seems buffers_alloc is the number of calls to StrategyGetBuffer(),
> > > > which tells how many times we have requested a buffer.  Not sure how
> > > > that helps measure buffer pressure.
> > >
> > > Once the linked list is empty, every request for a buffer to read a
> > > new page into must result in the eviction of the previous occupant
> > > of some buffer on this conceptual freelist (except perhaps for some
> > > race conditions).  Isn't that what you wanted?  Except that
> > > buffers_alloc does not get incremented when the StrategyGetBuffer()
> > > call is satisfied by a ring strategy rather than the default strategy.
> >
> > Well, the ideal case is that I could find out how often data that is
> > about to be discarded is actually needed, hence the "reclaimed" field
> > that is often important for kernel memory pressure reporting on older
> > operating systems.  I will post an email soon about my theory of why
> > buffer pressure is an important thing to report to users.
>
> OK, realizing there is no simple way to measure shared buffer pressure,
> let me explain why I want to.
>
> Right now we simplistically recommend 25% of RAM for shared_buffers,
> with a maximum of 8GB (512MB on Windows).  This helps ensure that there
> are sufficient kernel buffers for high-write operations, and perhaps a
> kernel cache larger than shared buffers.
>
> However, this doesn't help people configure shared buffers larger (e.g.
> 35%) if their working set is larger.  Right now, I don't see how a user
> would know this is happening.  On the flip side, they might have a
> working set smaller than 25% and be paying the overhead of managing 1
> million shared buffers.  Again, there is no way to know if that is the
> case.
>
> For example, we have reports that larger shared buffers are sometimes
> better, sometimes not, but there is no feedback we give the user to
> explain why this is happening.  My guess is that if their working set is
> larger than 25% of RAM, they benefit; if not, the buffer management
> overhead makes things slower.
>
> I feel we need to give users clearer information on how active their
> shared buffer cache is, perhaps allowing them to shrink/grow it as
> appropriate.  Asking them to blindly try different shared buffer sizes
> seems suboptimal.

There is also a recent thread about how the usagecount is used.  Jeff suggested maybe incrementing by 2 and decrementing by 1.  This is very close to another idea I had but have not tested yet: increment/decrement by 1 or 2, making even numbers ascend and odd numbers descend.  That way, just by looking at the usage count you can tell what is going on with your buffers (more odd than even, or the reverse? building the cache, or wasting it, or ...).  It should also allow finer control of eviction.

Being able to increase/decrease shared_buffers while PostgreSQL is up is a nice idea.

-- 
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On Thu, May 31, 2012 at 10:11 AM, Bruce Momjian <bruce@momjian.us> wrote:
> However, this doesn't help people configure shared buffers larger (e.g.
> 35%) if their working set is larger.  Right now, I don't see how a user
> would know this is happening.  On the flip side, they might have a
> working set smaller than 25% and be paying the overhead of managing 1
> million shared buffers.  Again, there is no way to know if that is the
> case.

Another important use case: downgrades.  They do happen, and right now they are amazingly risky and made with limited information.

Clearly a fully complete picture is impossible because of the reliance on kernel buffer management, but knowing the PG buffer pool occupancy and flux seems like it would be so much better than knowing nothing, and it is likely that some conservative intuition could be learned to perform relatively safe downgrades.

-- 
fdr