Re: limiting hint bit I/O
От | Cédric Villemain |
---|---|
Тема | Re: limiting hint bit I/O |
Дата | |
Msg-id | AANLkTimx__uotguC6iWGLi4fgFyehKE=SYAmnCqFyHzU@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: limiting hint bit I/O (Cédric Villemain <cedric.villemain.debian@gmail.com>) |
Список | pgsql-hackers |
2011/2/7 Cédric Villemain <cedric.villemain.debian@gmail.com>: > 2011/2/7 Robert Haas <robertmhaas@gmail.com>: >> On Mon, Feb 7, 2011 at 10:48 AM, Bruce Momjian <bruce@momjian.us> wrote: >>> Robert Haas wrote: >>>> On Sat, Feb 5, 2011 at 4:31 PM, Bruce Momjian <bruce@momjian.us> wrote: >>>> > Uh, in this C comment: >>>> > >>>> > + ? ? ? ?* or not we want to take the time to write it. ?We allow up to 5% of >>>> > + ? ? ? ?* otherwise-not-dirty pages to be written due to hint bit changes, >>>> > >>>> > 5% of what? ?5% of all buffers? ?5% of all hint-bit-dirty ones? ?Can you >>>> > clarify this in the patch? >>>> >>>> 5% of buffers that are hint-bit-dirty but not otherwise dirty. ISTM >>>> that's exactly what the comment you just quoted says on its face, but >>>> I'm open to some other wording you want to propose. >>> >>> How about: >>> >>> otherwise-not-dirty -> only-hint-bit-dirty >>> >>> So 95% of your hint bit modificates are discarded if the pages is not >>> otherwise dirtied? That seems pretty radical. >> >> No, it's more subtle than that, although I admit it *is* radical. >> There are three ways that pages can get written out to disk: >> >> 1. Checkpoints. >> 2. Background writer activity. >> 3. Backends writing out dirty buffers because there are no clean >> buffers available to allocate. >> >> What the latest version of the patch implements is: >> >> 1. Checkpoints no longer write only-hint-bit-dirty pages to disk. >> Since a checkpoint doesn't evict pages from memory, the hint bits are >> still there to be written out (or not) by (2) or (3), below. >> >> 2. When the background writer's cleaning scan hits an >> only-hint-bit-dirty page, it writes it, same as before. This >> definitely doesn't result in the loss of any hint bits. >> >> 3. When a backend writes out a dirty buffer itself, because there are >> no clean buffers available to allocate, it initially writes them. But >> if there are more than 100 such pages per block of 2000 allocations, >> it recycles any after the first 100 without writing them. >> >> In normal operation, I suspect that there will be very little impact >> from this change. The change described in #1 may slightly reduce the >> size of some checkpoints, but it's unclear that it will be enough to >> be material. The change described in #3 will probably also not >> matter, because, in a well-tuned system, the background writer should >> be set aggressively enough to provide a supply of clean pages, and >> therefore backends shouldn't be doing many writes themselves, and >> therefore most buffer allocations will be of already-clean pages, and >> the logic described in #3 will probably never kick in. Even if they >> are writing a lot of buffers themselves, the logic in #3 still won't >> kick in if many of the pages being written are actually dirty - it >> will only matter if the backends are writing out lots and lots of >> pages *solely because they are only-hint-bit-dirty*. >> >> Where I expect this to make a big difference is on sequential scans of >> just-loaded tables. In that case, the BufferAccessStrategy machinery >> will force the backend to reuse the same buffers over and over again, >> and all of those pages will be only-hint-bit-dirty. So the backend >> has to do a write for every page it allocates, and even though those >> writes are being absorbed by the OS cache, it's still slow. With this >> patch, what will happen is that the backend will write about 100 >> pages, then perform the next 1900 allocations without writing, then >> write another 100 pages, etc. So at the end of the scan, instead of >> having written an amount of data equal to the size of the table, we >> will have written 5% of that amount, and 5% of the hint bits will be >> on disk. Each subsequent scan will get another 5% of the hint bits on >> disk until after 20 scans they are all set. So the work of setting >> the hint bits is spread out across the first 20 table scans instead of >> all being done the first time through. >> >> Clearly, there's further jiggering that can be done here. But the >> overall goal is simply that some of our users don't seem to like it >> when the first scan of a newly loaded table generates a huge storm of >> *write* traffic. Given that the hint bits appear to be quite >> important from a performance perspective (see benchmark numbers >> upthread), > > those are not real benchmarks, just quick guess to check behavior. > (and I agree it looks good, but I also got inconsistent results, the > patched postgresql hardly reach the same speed of the original > 9.1devel even after 200 hundreds select of your testcase) > > >> we don't really have the option of just not writing them - >> but we can try to not to do it all at once, if we think that's an >> improvement, which I think is likely. >> >> Overall, I'm inclined to move this patch to the next CommitFest and >> forget about it for now. I don't think we're going to get enough >> testing of this in the next week to be really confident that it's >> right. I might be willing to commit with some more moderate amount of >> testing if we were right at the beginning of a development cycle, >> figuring that we'd shake out any warts as the cycle went along, but >> this isn't seeming like the right time for this kind of a change. > > I agree. > I think it might be better to do the hint_bit_allowance decrement when > we write something (dirty or dirtyhint). > And so we can have something like : > > 100% writte : write dirty + hint > 5 % write : write 5 % of (dirty + hint) (instead of write 5% of the hint only). I mean XX% if possible :) (dirty stuff is dirty so we won't skip that) > > So come a simple Bandwith/IOrequest limiter. > Open for next commitfest :) > > > -- > Cédric Villemain 2ndQuadrant > http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support > -- Cédric Villemain 2ndQuadrant http://2ndQuadrant.fr/ PostgreSQL : Expertise, Formation et Support
В списке pgsql-hackers по дате отправления: