Re: limiting hint bit I/O

From Cédric Villemain
Subject Re: limiting hint bit I/O
Date
Msg-id AANLkTimx__uotguC6iWGLi4fgFyehKE=SYAmnCqFyHzU@mail.gmail.com
In response to Re: limiting hint bit I/O  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
List pgsql-hackers
2011/2/7 Cédric Villemain <cedric.villemain.debian@gmail.com>:
> 2011/2/7 Robert Haas <robertmhaas@gmail.com>:
>> On Mon, Feb 7, 2011 at 10:48 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>> Robert Haas wrote:
>>>> On Sat, Feb 5, 2011 at 4:31 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>>> > Uh, in this C comment:
>>>> >
>>>> > +        * or not we want to take the time to write it.  We allow up to 5% of
>>>> > +        * otherwise-not-dirty pages to be written due to hint bit changes,
>>>> >
>>>> > 5% of what?  5% of all buffers?  5% of all hint-bit-dirty ones?  Can you
>>>> > clarify this in the patch?
>>>>
>>>> 5% of buffers that are hint-bit-dirty but not otherwise dirty.  ISTM
>>>> that's exactly what the comment you just quoted says on its face, but
>>>> I'm open to some other wording you want to propose.
>>>
>>> How about:
>>>
>>>        otherwise-not-dirty -> only-hint-bit-dirty
>>>
>>> So 95% of your hint bit modifications are discarded if the page is not
>>> otherwise dirtied?  That seems pretty radical.
>>
>> No, it's more subtle than that, although I admit it *is* radical.
>> There are three ways that pages can get written out to disk:
>>
>> 1. Checkpoints.
>> 2. Background writer activity.
>> 3. Backends writing out dirty buffers because there are no clean
>> buffers available to allocate.
>>
>> What the latest version of the patch implements is:
>>
>> 1. Checkpoints no longer write only-hint-bit-dirty pages to disk.
>> Since a checkpoint doesn't evict pages from memory, the hint bits are
>> still there to be written out (or not) by (2) or (3), below.
>>
>> 2. When the background writer's cleaning scan hits an
>> only-hint-bit-dirty page, it writes it, same as before.  This
>> definitely doesn't result in the loss of any hint bits.
>>
>> 3. When a backend writes out dirty buffers itself, because there are
>> no clean buffers available to allocate, it initially writes them.  But
>> if there are more than 100 only-hint-bit-dirty pages per block of 2000
>> allocations, it recycles any after the first 100 without writing them.
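
[Editorial sketch] The rule in item 3 can be illustrated roughly as follows. This is only a minimal sketch of the counting logic as described in the message, not the code from the patch: the struct, the function name, and the two constants are hypothetical, and only the figures 100 and 2000 come from the thread.

```c
#include <stdbool.h>

/* Hypothetical names; only the numbers 100 and 2000 come from the thread. */
#define ALLOC_WINDOW          2000  /* allocations per accounting block */
#define HINT_WRITE_ALLOWANCE   100  /* hint-only writes allowed per block */

typedef struct HintWriteLimiter
{
    int allocs_in_window;       /* buffer allocations seen in this block */
    int hint_writes_in_window;  /* hint-only writes done in this block */
} HintWriteLimiter;

/*
 * Decide whether a backend should write a victim buffer before reusing it.
 * really_dirty: the page has changes other than hint bits.
 * hint_dirty:   the page has hint-bit changes.
 */
bool
should_write_before_reuse(HintWriteLimiter *lim, bool really_dirty,
                          bool hint_dirty)
{
    /* Start a new accounting block once 2000 allocations have been seen. */
    if (lim->allocs_in_window >= ALLOC_WINDOW)
    {
        lim->allocs_in_window = 0;
        lim->hint_writes_in_window = 0;
    }
    lim->allocs_in_window++;

    /* Pages with real changes are always written. */
    if (really_dirty)
        return true;

    /* Only-hint-bit-dirty pages are written while the allowance lasts. */
    if (hint_dirty && lim->hint_writes_in_window < HINT_WRITE_ALLOWANCE)
    {
        lim->hint_writes_in_window++;
        return true;
    }

    /* Clean page, or allowance used up: recycle without writing. */
    return false;
}
```

With a limiter initialised to zero, a stream of only-hint-bit-dirty victims gets its first 100 pages written in each block of 2000 allocations and the remaining 1900 recycled unwritten.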
>>
>> In normal operation, I suspect that there will be very little impact
>> from this change.  The change described in #1 may slightly reduce the
>> size of some checkpoints, but it's unclear that it will be enough to
>> be material.  The change described in #3 will probably also not
>> matter, because, in a well-tuned system, the background writer should
>> be set aggressively enough to provide a supply of clean pages, and
>> therefore backends shouldn't be doing many writes themselves, and
>> therefore most buffer allocations will be of already-clean pages, and
>> the logic described in #3 will probably never kick in.  Even if they
>> are writing a lot of buffers themselves, the logic in #3 still won't
>> kick in if many of the pages being written are actually dirty - it
>> will only matter if the backends are writing out lots and lots of
>> pages *solely because they are only-hint-bit-dirty*.
>>
>> Where I expect this to make a big difference is on sequential scans of
>> just-loaded tables.  In that case, the BufferAccessStrategy machinery
>> will force the backend to reuse the same buffers over and over again,
>> and all of those pages will be only-hint-bit-dirty.  So the backend
>> has to do a write for every page it allocates, and even though those
>> writes are being absorbed by the OS cache, it's still slow.  With this
>> patch, what will happen is that the backend will write about 100
>> pages, then perform the next 1900 allocations without writing, then
>> write another 100 pages, etc.  So at the end of the scan, instead of
>> having written an amount of data equal to the size of the table, we
>> will have written 5% of that amount, and 5% of the hint bits will be
>> on disk.  Each subsequent scan will get another 5% of the hint bits on
>> disk until after 20 scans they are all set.  So the work of setting
>> the hint bits is spread out across the first 20 table scans instead of
>> all being done the first time through.
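
[Editorial sketch] The "5% per scan, 20 scans" arithmetic above can be checked with a small simulation. It is a sketch under stated assumptions rather than a model of the real buffer manager: every page starts without hint bits on disk, each page visited costs one allocation, and the accounting block restarts at the beginning of each scan. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

#define ALLOC_WINDOW          2000  /* allocations per accounting block */
#define HINT_WRITE_ALLOWANCE   100  /* hint-only writes allowed per block */

/*
 * Simulate one sequential scan over npages pages.  hinted_on_disk[i] records
 * whether page i already has its hint bits on disk.  Returns the number of
 * pages written during this scan.
 */
int
scan_once(bool *hinted_on_disk, int npages)
{
    int allocs = 0;
    int hint_writes = 0;
    int written = 0;

    for (int i = 0; i < npages; i++)
    {
        if (allocs >= ALLOC_WINDOW)
        {
            allocs = 0;
            hint_writes = 0;
        }
        allocs++;

        if (hinted_on_disk[i])
            continue;           /* hints already on disk: page stays clean */

        if (hint_writes < HINT_WRITE_ALLOWANCE)
        {
            hinted_on_disk[i] = true;   /* written, so the hints persist */
            hint_writes++;
            written++;
        }
        /* otherwise the buffer is recycled and the hints are lost */
    }
    return written;
}

/* Count the scans needed until every page has its hint bits on disk. */
int
scans_until_fully_hinted(int npages)
{
    bool *hinted = calloc((size_t) npages, sizeof(bool));
    int   remaining = npages;
    int   scans = 0;

    if (hinted == NULL)
        return -1;

    while (remaining > 0)
    {
        remaining -= scan_once(hinted, npages);
        scans++;
    }
    free(hinted);
    return scans;
}
```

For a table whose size is a multiple of 2000 pages, each scan in this model writes 100 further pages per block, so the table is fully hinted after 2000 / 100 = 20 scans, matching the estimate in the message.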
>>
>> Clearly, there's further jiggering that can be done here.  But the
>> overall goal is simply that some of our users don't seem to like it
>> when the first scan of a newly loaded table generates a huge storm of
>> *write* traffic.  Given that the hint bits appear to be quite
>> important from a performance perspective (see benchmark numbers
>> upthread),
>
> Those are not real benchmarks, just a quick guess to check behavior.
> (And I agree it looks good, but I also got inconsistent results: the
> patched PostgreSQL hardly reaches the same speed as the original
> 9.1devel even after 200 hundreds selects of your testcase.)
>
>
>> we don't really have the option of just not writing them -
>> but we can try not to do it all at once, if we think that's an
>> improvement, which I think is likely.
>>
>> Overall, I'm inclined to move this patch to the next CommitFest and
>> forget about it for now.  I don't think we're going to get enough
>> testing of this in the next week to be really confident that it's
>> right.  I might be willing to commit with some more moderate amount of
>> testing if we were right at the beginning of a development cycle,
>> figuring that we'd shake out any warts as the cycle went along, but
>> this isn't seeming like the right time for this kind of a change.
>
> I agree.
> I think it might be better to do the hint_bit_allowance decrement when
> we write something (dirty or dirtyhint).
> And so we can have something like :
>
> 100% write : write dirty + hint
> 5% write : write 5% of (dirty + hint) (instead of writing 5% of the hint only).

I mean XX% if possible :) (dirty stuff is dirty so we won't skip that)
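
[Editorial sketch] One possible reading of this proposal, written out as code. It is an interpretation, not anything from the patch: hint_bit_allowance is the name used in the message, while the struct, the function, and the exact policy (every write consumes budget, really-dirty pages are written even when the budget is exhausted) are assumptions.

```c
#include <stdbool.h>

/* Hypothetical budget shared by all writes a backend performs. */
typedef struct WriteBudget
{
    int hint_bit_allowance;     /* writes left in the current period */
} WriteBudget;

/*
 * Decide whether to write a buffer, charging every write to the budget.
 * Really-dirty pages are never skipped; only-hint-bit-dirty pages are
 * written only while some allowance remains.
 */
bool
should_write_with_budget(WriteBudget *b, bool really_dirty, bool hint_dirty)
{
    if (really_dirty)
    {
        /* Dirty stuff is dirty: always write, but still consume budget. */
        if (b->hint_bit_allowance > 0)
            b->hint_bit_allowance--;
        return true;
    }

    if (hint_dirty && b->hint_bit_allowance > 0)
    {
        b->hint_bit_allowance--;
        return true;
    }

    /* Clean page, or hint-only page with no budget left. */
    return false;
}
```

Under this reading, heavy real write traffic uses up the allowance, so hint-only writes are squeezed out first, which is what would make it behave like a general I/O request limiter.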

>
> So it becomes a simple bandwidth/IO-request limiter.
> Open for next commitfest :)
>
>
> --
> Cédric Villemain               2ndQuadrant
> http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support
>



--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

