Re: Improvement of checkpoint IO scheduler for stable transaction responses
From | Heikki Linnakangas |
---|---|
Subject | Re: Improvement of checkpoint IO scheduler for stable transaction responses |
Date | |
Msg-id | 51C9FD5C.5050706@vmware.com |
In reply to | Re: Improvement of checkpoint IO scheduler for stable transaction responses (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Improvement of checkpoint IO scheduler for stable transaction responses (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>); Re: Improvement of checkpoint IO scheduler for stable transaction responses (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On 25.06.2013 23:03, Robert Haas wrote:
> On Tue, Jun 25, 2013 at 1:15 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> I'm not sure it's a good idea to sleep proportionally to the time it took to
>> complete the previous fsync. If you have a 1GB cache in the RAID controller,
>> fsyncing a 1GB segment will fill it up. But since it fits in cache, it
>> will return immediately. So we proceed fsyncing other files, until the cache
>> is full and the fsync blocks. But once we fill up the cache, it's likely
>> that we're hurting concurrent queries. ISTM it would be better to stay under
>> that threshold, keeping the I/O system busy, but never fill up the cache
>> completely.
>
> Isn't the behavior implemented by the patch a reasonable approximation
> of just that? When the fsyncs start to get slow, that's when we start
> to sleep. I'll grant that it would be better to sleep when the
> fsyncs are *about* to get slow, rather than when they actually have
> become slow, but we have no way to know that.

Well, that's the point I was trying to make: you should sleep *before* the fsyncs get slow.

> The only feedback we have on how bad things are is how long it took
> the last fsync to complete, so I actually think that's a much better
> way to go than any fixed sleep - which will often be unnecessarily
> long on a well-behaved system, and which will often be far too short
> on one that's having trouble. I'm inclined to think Kondo-san
> has got it right.

Quite possible, I really don't know. I'm inclined to first try the simplest thing possible, and only make it more complicated if that's not good enough. Kondo-san's patch wasn't very complicated, but nevertheless a fixed sleep between every fsync, unless you're behind schedule, is even simpler. In particular, it's easier to tie that into the checkpoint scheduler - I'm not sure how you'd measure progress or determine how long to sleep unless you assume that every fsync is the same.
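To make the two pacing policies under discussion concrete, here is a minimal Python sketch. It is not code from Kondo-san's patch or from PostgreSQL. The function names (`proportional_sleep`, `fixed_sleep`), the `ratio` and `fixed_delay` parameters, and the choice to measure progress as "fraction of files fsynced" versus "fraction of the time budget used" are all illustrative assumptions. The second function shows why a fixed sleep is easier to tie into a scheduler: progress is measured by counting fsyncs, which implicitly assumes every fsync costs the same.

```python
def proportional_sleep(last_fsync_secs: float, ratio: float = 1.0) -> float:
    """Feedback policy: sleep for a time proportional to how long the
    previous fsync took. `ratio` is a hypothetical tuning knob."""
    return max(0.0, last_fsync_secs * ratio)


def fixed_sleep(files_done: int, files_total: int,
                elapsed_secs: float, budget_secs: float,
                fixed_delay: float = 0.1) -> float:
    """Fixed-sleep policy: pause a constant amount between fsyncs unless
    we are behind schedule. Progress is estimated by counting fsyncs,
    i.e. it assumes every fsync costs the same."""
    if files_total <= 0 or budget_secs <= 0:
        return 0.0
    progress = files_done / files_total      # fraction of fsyncs completed
    time_used = elapsed_secs / budget_secs   # fraction of time budget used
    if progress < time_used:
        return 0.0                           # behind schedule: don't sleep
    return fixed_delay
```

If a write-back cache absorbs the first fsyncs, `last_fsync_secs` stays near zero, so the proportional policy does not start sleeping until the cache has already filled, which is the concern raised above; the fixed policy paces the fsyncs from the start.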
> I like your idea of putting a stake in the ground and assuming that
> the fsync phase will turn out to be X% of the checkpoint, but I wonder
> if we can be a bit more sophisticated, especially for cases where
> checkpoint_segments is small. When checkpoint_segments is large, then
> we know that some of the data will get written back to disk during the
> write phase, because the OS cache is only so big. But when it's
> small, the OS will essentially do nothing during the write phase, and
> then it's got to write all the data out during the fsync phase. I'm
> not sure we can really model that effect thoroughly, but even
> something dumb would be smarter than what we have now - e.g. use 10%,
> but when checkpoint_segments < 10, use 1/checkpoint_segments. Or just
> assume the fsync phase will take 30 seconds.

If checkpoint_segments < 10, there isn't very much dirty data to flush out. This isn't really a problem in that case - no matter how stupidly we do the writing and fsyncing, the I/O cache can absorb it. It doesn't really matter what we do in that case.

- Heikki
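The "something dumb" heuristic quoted above can be written down in a few lines. This is a hedged Python sketch, not PostgreSQL code: `checkpoint_segments` is the setting discussed in the thread, but the function names `fsync_phase_fraction`, `write_phase_deadline` and `write_phase_deadline_fixed`, and the idea of turning the estimate into a write-phase deadline, are hypothetical illustrations.

```python
def fsync_phase_fraction(checkpoint_segments: int) -> float:
    """Assume the fsync phase is 10% of the checkpoint, but
    1/checkpoint_segments when checkpoint_segments < 10."""
    if checkpoint_segments < 1:
        raise ValueError("checkpoint_segments must be >= 1")
    if checkpoint_segments < 10:
        return 1.0 / checkpoint_segments
    return 0.10


def write_phase_deadline(checkpoint_secs: float, checkpoint_segments: int) -> float:
    """Hypothetical helper: seconds of the checkpoint budget the write
    phase may use before the (estimated) fsync phase must begin."""
    return checkpoint_secs * (1.0 - fsync_phase_fraction(checkpoint_segments))


def write_phase_deadline_fixed(checkpoint_secs: float, fsync_secs: float = 30.0) -> float:
    """Alternative: simply reserve a fixed 30 seconds for the fsync phase."""
    return max(0.0, checkpoint_secs - fsync_secs)
```

As the reply points out, the `checkpoint_segments < 10` branch matters little in practice: with so little dirty data, the I/O cache absorbs it regardless of how the writes and fsyncs are paced.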