Re: Improvement of checkpoint IO scheduler for stable transaction responses

From: Heikki Linnakangas
Subject: Re: Improvement of checkpoint IO scheduler for stable transaction responses
Message-ID: 51C9FD5C.5050706@vmware.com
In reply to: Re: Improvement of checkpoint IO scheduler for stable transaction responses (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: Improvement of checkpoint IO scheduler for stable transaction responses (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
Re: Improvement of checkpoint IO scheduler for stable transaction responses (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 25.06.2013 23:03, Robert Haas wrote:
> On Tue, Jun 25, 2013 at 1:15 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com>  wrote:
>> I'm not sure it's a good idea to sleep proportionally to the time it took to
>> complete the previous fsync. If you have a 1GB cache in the RAID controller,
>> fsyncing a 1GB segment will fill it up. But since it fits in cache, it
>> will return immediately. So we proceed to fsync other files, until the cache
>> is full and the fsync blocks. But once we fill up the cache, it's likely
>> that we're hurting concurrent queries. ISTM it would be better to stay under
>> that threshold, keeping the I/O system busy, but never fill up the cache
>> completely.
>
> Isn't the behavior implemented by the patch a reasonable approximation
> of just that?  When the fsyncs start to get slow, that's when we start
> to sleep.   I'll grant that it would be better to sleep when the
> fsyncs are *about* to get slow, rather than when they actually have
> become slow, but we have no way to know that.

Well, that's the point I was trying to make: you should sleep *before* 
the fsyncs get slow.
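To make the two positions concrete, here is a toy simulation, a sketch only. It assumes a write-back cache of fixed capacity that destages to disk at a constant rate, and an fsync that returns as soon as the file's dirty data fits in the cache. `WriteBackCache`, `run` and all the numbers are hypothetical, not anything in PostgreSQL or in a real RAID controller. It compares no pacing, a sleep proportional to how long the previous fsync took, and a sleep taken *before* each fsync so the cache never goes above a headroom threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WriteBackCache:
    """Toy write-back cache (hypothetical model, not a real controller)."""
    capacity_mb: float     # size of the controller cache
    drain_mb_per_s: float  # constant rate at which the cache destages to disk
    used_mb: float = 0.0   # dirty data currently held in the cache

    def advance(self, seconds: float) -> None:
        """Let time pass (e.g. a sleep); the cache drains in the background."""
        self.used_mb = max(0.0, self.used_mb - self.drain_mb_per_s * seconds)

    def fsync(self, size_mb: float) -> float:
        """Absorb size_mb of dirty data; return how long the caller blocked.

        Returns 0.0 while the data fits in the cache, otherwise blocks until
        enough has drained. Assumes size_mb <= capacity_mb.
        """
        overflow = self.used_mb + size_mb - self.capacity_mb
        blocked = 0.0
        if overflow > 0:
            blocked = overflow / self.drain_mb_per_s
            self.advance(blocked)
        self.used_mb += size_mb
        return blocked


def run(policy: str, files_mb: List[float], cache: WriteBackCache,
        ratio: float = 1.0, headroom: float = 0.75) -> Tuple[List[float], float]:
    """fsync each file under a pacing policy; return (blocked times, peak fill)."""
    blocked: List[float] = []
    peak, last = 0.0, 0.0
    for size in files_mb:
        if policy == "proportional":
            # React to the previous fsync: only sleeps once fsyncs are already slow.
            cache.advance(ratio * last)
        elif policy == "headroom":
            # Sleep *before* the fsync so the cache stays under headroom * capacity.
            excess = cache.used_mb + size - headroom * cache.capacity_mb
            if excess > 0:
                cache.advance(excess / cache.drain_mb_per_s)
        last = cache.fsync(size)
        blocked.append(last)
        peak = max(peak, cache.used_mb)
    return blocked, peak


for policy in ("none", "proportional", "headroom"):
    cache = WriteBackCache(capacity_mb=1024, drain_mb_per_s=128)
    print(policy, run(policy, [256] * 6, cache))
```

With six 256 MB files, a 1024 MB cache and a 128 MB/s drain rate, the first four fsyncs return instantly under every policy. The proportional policy only starts sleeping after the fifth fsync has blocked, by which point the cache has already peaked at 1024 MB; the headroom policy never blocks and peaks at 768 MB. Whether real controllers behave like this model is exactly what the thread is unsure about.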

> The only feedback we have on how bad things are is how long it took
> the last fsync to complete, so I actually think that's a much better
> way to go than any fixed sleep - which will often be unnecessarily
> long on a well-behaved system, and which will often be far too short
> on one that's having trouble. I'm inclined to think Kondo-san
> has got it right.

Quite possible, I really don't know. I'm inclined to first try the 
simplest thing possible, and only make it more complicated if that's not 
good enough. Kondo-san's patch wasn't very complicated, but nevertheless 
a fixed sleep between every fsync, unless you're behind schedule, is 
even simpler. In particular, it's easier to tie that into the checkpoint 
scheduler - I'm not sure how you'd measure progress or determine how 
long to sleep unless you assume that every fsync is the same.
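The simpler alternative, a fixed sleep between fsyncs that is skipped when behind schedule, might look roughly like the sketch below. `fsync_phase`, `SimClock`, the 20 s phase budget and the 3 s sleep are all hypothetical. Progress is computed as files done over files total, which only makes sense if every fsync is assumed to cost the same, the limitation noted above.

```python
from typing import Callable, List


def fsync_phase(files: List[str],
                fsync: Callable[[str], None],
                clock: Callable[[], float],
                sleep: Callable[[float], None],
                phase_budget_s: float,
                fixed_sleep_s: float) -> int:
    """Fsync every file, with a fixed sleep between fsyncs unless behind schedule.

    Progress is measured as files_done / len(files), i.e. every fsync is
    assumed to cost the same. Returns the number of sleeps taken.
    """
    start = clock()
    sleeps = 0
    for i, name in enumerate(files):
        if i > 0:
            progress = i / len(files)                     # fraction of fsyncs done
            elapsed = (clock() - start) / phase_budget_s  # fraction of budget used
            if progress > elapsed:                        # ahead of schedule
                sleep(fixed_sleep_s)
                sleeps += 1
        fsync(name)
    return sleeps


class SimClock:
    """Simulated time so the example runs instantly (hypothetical helper)."""

    def __init__(self, fsync_cost_s: float) -> None:
        self.now = 0.0
        self.fsync_cost_s = fsync_cost_s

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds

    def fsync(self, name: str) -> None:
        self.now += self.fsync_cost_s


files = [f"seg{n}" for n in range(4)]
for cost in (1.0, 6.0):
    clk = SimClock(fsync_cost_s=cost)
    n = fsync_phase(files, clk.fsync, clk.time, clk.sleep,
                    phase_budget_s=20.0, fixed_sleep_s=3.0)
    print(f"fsync cost {cost}s: {n} sleeps, finished at t={clk.now}s")
```

When fsyncs are cheap the loop sleeps between every pair and still finishes inside the budget; when each fsync is slow the loop is behind schedule from the start and never sleeps. Nothing here adapts the sleep length to observed fsync times, which is the trade-off against the proportional approach.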

> I like your idea of putting a stake in the ground and assuming that
> the fsync phase will turn out to be X% of the checkpoint, but I wonder
> if we can be a bit more sophisticated, especially for cases where
> checkpoint_segments is small.  When checkpoint_segments is large, then
> we know that some of the data will get written back to disk during the
> write phase, because the OS cache is only so big.  But when it's
> small, the OS will essentially do nothing during the write phase, and
> then it's got to write all the data out during the fsync phase.  I'm
> not sure we can really model that effect thoroughly, but even
> something dumb would be smarter than what we have now - e.g. use 10%,
> but when checkpoint_segments < 10, use 1/checkpoint_segments.  Or just
> assume the fsync phase will take 30 seconds.

If checkpoint_segments < 10, there isn't very much dirty data to flush 
out. This isn't really a problem in that case: no matter how stupidly we 
do the writing and fsyncing, the I/O cache can absorb it. It doesn't 
really matter what we do in that case.
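The estimate proposed in the quoted paragraph (10%, but 1/checkpoint_segments when checkpoint_segments < 10, or simply a flat 30 seconds) can be written down in a few lines. This is a sketch: `fsync_fraction` and `split_budget` are hypothetical helpers, and splitting a checkpoint's time budget into a write phase and an fsync phase is an assumption about how such an estimate would be used, not something the patch under discussion is known to do.

```python
from typing import Optional, Tuple


def fsync_fraction(checkpoint_segments: int, default: float = 0.10) -> float:
    """Fraction of the checkpoint assumed to be spent in the fsync phase.

    Proposed heuristic: 10%, but 1/checkpoint_segments when
    checkpoint_segments < 10, on the theory that with few segments the OS
    writes back little during the write phase.
    """
    if checkpoint_segments < 1:
        raise ValueError("checkpoint_segments must be at least 1")
    if checkpoint_segments < 10:
        return 1.0 / checkpoint_segments
    return default


def split_budget(total_s: float, checkpoint_segments: int,
                 fixed_fsync_s: Optional[float] = None) -> Tuple[float, float]:
    """Split a checkpoint's time budget into (write phase, fsync phase) seconds.

    If fixed_fsync_s is given (e.g. 30.0), use that instead of a fraction,
    capped at the total budget.
    """
    if fixed_fsync_s is not None:
        fsync_s = min(fixed_fsync_s, total_s)
    else:
        fsync_s = total_s * fsync_fraction(checkpoint_segments)
    return total_s - fsync_s, fsync_s


for segs in (3, 32):
    print(segs, split_budget(270.0, segs))
print("fixed", split_budget(270.0, 32, fixed_fsync_s=30.0))
```

Following the reply above, the small-segment branch may matter little in practice, since with so little dirty data the I/O cache absorbs it whatever the schedule.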

- Heikki


