Re: shared_buffers documentation
From: Robert Haas
Subject: Re: shared_buffers documentation
Date:
Msg-id: r2k603c8f071004161908g2bae5d83l3754862cb39a182@mail.gmail.com
In reply to: Re: shared_buffers documentation (Greg Smith <greg@2ndquadrant.com>)
List: pgsql-hackers
On Fri, Apr 16, 2010 at 9:47 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> Robert Haas wrote:
>> Well, why can't they just hang out as dirty buffers in the OS cache,
>> which is also designed to solve this problem?
>
> If the OS were guaranteed to be as suitable for this purpose as the
> approach taken in the database, this might work.  But much like the
> clock sweep approach should outperform a simpler OS caching
> implementation in many common workloads, there are a couple of spots
> where making dirty writes the OS's problem can fall down:
>
> 1) That presumes that OS write coalescing will solve the problem for
> you by merging repeat writes, which depending on implementation it
> might not.
>
> 2) On some filesystems, such as ext3, any write with an fsync behind
> it will flush the whole write cache out and defeat this optimization.
> Since the spread checkpoint design has some such writes going to the
> data disk in the middle of the currently processing checkpoint, in
> those situations that's likely to push the first write of that block
> to disk before it can be combined with a second.  If you'd have kept
> it in the buffer cache it might survive as long as a full checkpoint
> cycle longer.
>
> 3) The "timeout" as it were for shared buffers is driven by the
> distance between checkpoints, typically as long as 5 minutes.  The
> longest a filesystem will hold onto a write is probably less.  On
> Linux it's typically 30 seconds before the OS considers a write
> important to get out to disk, longest case; if you've already filled
> a lot of RAM with writes it can be substantially less.

Thanks for the explanation.  That makes sense.  Does this imply that
the problems with shared_buffers being too small are going to be less
severe with a read-mostly load?

>> I guess the obvious question is whether Windows "doesn't need" more
>> shared memory than that, or whether it "can't effectively use" more
>> memory than that.
>
> It's probably "can't effectively use".  We know for a fact that
> applications where blocks regularly accumulate high usage counts and
> have repeat read/writes to them, which includes pgbench, benefit in
> several easy to measure ways from using larger amounts of database
> buffer cache.  There's just plain old less churn of buffers going in
> and out of there.  The alternate explanation of "Windows is just so
> much better at read/write caching that you should give it most of the
> RAM anyway" doesn't really sound as probable as the more commonly
> proposed theory "Windows doesn't handle large blocks of shared memory
> well".
>
> Note that there's no discussion of the why behind this in the commit
> you just did, just the description of what happens.  The reasons why
> are left undefined, which I feel is appropriate given we really don't
> know for sure.  Still waiting for somebody to let loose the Visual
> Studio profiler and measure what's causing the degradation at larger
> sizes.

Right - my purpose in wanting to revise the documentation was not to
give a complete tutorial, which is obviously not practical, but to
give people some guidelines that are better than our previous
suggestion to use "a few tens of megabytes", which I think we've
accomplished.  The follow-up questions are mostly for my own benefit
rather than the docs...

...Robert
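To make point 3 above concrete, here is a minimal sketch (not from the
thread) that compares the Linux dirty-writeback expiry exposed under
/proc/sys/vm with a checkpoint interval.  The 5-minute value below is
the stock checkpoint_timeout default and is an assumption here; check
your own postgresql.conf.

#!/usr/bin/env python3
# Rough sketch: compare how long Linux will sit on a dirty page with how
# long a dirty shared buffer can wait for the next checkpoint.
# CHECKPOINT_TIMEOUT_S is assumed (stock checkpoint_timeout = 5min).

CHECKPOINT_TIMEOUT_S = 5 * 60

def read_centisecs(path):
    """The vm.* writeback knobs are exposed in centiseconds under /proc/sys."""
    with open(path) as f:
        return int(f.read().strip()) / 100.0

# Age at which the kernel considers dirty data old enough to force out.
expire_s = read_centisecs("/proc/sys/vm/dirty_expire_centisecs")
# How often the flusher threads wake up to look for expired dirty data.
writeback_s = read_centisecs("/proc/sys/vm/dirty_writeback_centisecs")

print(f"Linux writes out dirty pages older than ~{expire_s:.0f}s "
      f"(flusher wakes every {writeback_s:.0f}s)")
print(f"A dirty shared buffer can wait up to ~{CHECKPOINT_TIMEOUT_S}s "
      f"for the next checkpoint cycle")
print(f"So the OS cache can absorb repeat writes for only about "
      f"{expire_s / CHECKPOINT_TIMEOUT_S:.0%} of a checkpoint interval")

With the usual defaults (30 s expiry vs. a 300 s checkpoint interval)
the kernel gives up on coalescing a repeatedly dirtied page an order of
magnitude sooner than shared_buffers would.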