Re: Improvement of checkpoint IO scheduler for stable transaction responses

From: KONDO Mitsumasa
Subject: Re: Improvement of checkpoint IO scheduler for stable transaction responses
Msg-id 51CD873A.5070705@lab.ntt.co.jp
In reply to: Re: Improvement of checkpoint IO scheduler for stable transaction responses  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Improvement of checkpoint IO scheduler for stable transaction responses  (KONDO Mitsumasa <kondo.mitsumasa@lab.ntt.co.jp>)
List: pgsql-hackers
(2013/06/28 0:08), Robert Haas wrote:
> On Tue, Jun 25, 2013 at 4:28 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
> I'm pretty sure Greg Smith tried the fixed-sleep thing before and
> it didn't work that well.  I have also tried it and the resulting
> behavior was unimpressive.  It makes checkpoints take a long time to
> complete even when there's very little data to flush out to the OS,
> which is annoying; and when things actually do get ugly, the sleeps
> aren't long enough to matter.  See the timings Kondo-san posted
> downthread: 100ms delays aren't going to let the system recover in any
> useful way when the fsync can take 13 s for one file.  On a system
> that's badly weighed down by I/O, the fsync times are often
> *extremely* long - 13 s is far from the worst you can see.  You have
> to give the system a meaningful time to recover from that, allowing
> other processes to make meaningful progress before you hit it again,
> or system performance just goes down the tubes.  Greg's test, IIRC,
> used 3 s sleeps rather than your proposal of 100 ms, but it still
> wasn't enough.
Yes. In the write phase, the checkpointer writes numerous 8KB dirty pages, one
per SyncOneBuffer() call, so a tiny (100 ms) sleep can work well there. But in
the fsync phase, the checkpointer flushes scores of relation files, one per
fsync() call, so a tiny sleep does not work well. It should need a longer sleep
time for I/O performance to recover. If we want to find the best sleep time, we
had better base it on the previous fsync time. And if we want to prevent very
long fsync times, we had better make the relation file size, whose default
maximum is 1GB, smaller.
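
As a rough illustration of "use the previous fsync time", here is a minimal
sketch in Python. It is not code from the patch: the function name, the
proportional ratio, and the lower and upper bounds are all assumptions chosen
only to show the idea of sleeping longer after a slow fsync.

```python
def next_fsync_sleep_ms(prev_fsync_ms, ratio=1.0, min_ms=100.0, max_ms=10000.0):
    """Hypothetical helper: choose the sleep before the next fsync().

    The sleep is proportional to how long the previous fsync() took,
    clamped to [min_ms, max_ms]. ratio, min_ms and max_ms are
    illustrative assumptions, not values taken from the patch.
    """
    return max(min_ms, min(max_ms, prev_fsync_ms * ratio))


# Example: a fast fsync gets the minimum sleep, a slow one gets a long sleep.
for duration_ms in (20.0, 3000.0, 13000.0):
    print(duration_ms, "->", next_fsync_sleep_ms(duration_ms))
```

With this kind of rule, a checkpoint with little data to flush pays only the
minimum sleep per file, while a file whose fsync took several seconds is
followed by a pause of comparable length.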

Back to the subject. Here are the test results for our patches. The fsync +
write patch did not give a good result in the past test, so I reran the
benchmark under the same conditions. It seems to get better performance than
in the past result.

* Performance results in DBT-2 (WH340)
               | TPS      90%tile    Average  Maximum
---------------+---------------------------------------
original_0.7   | 3474.62  18.348328  5.739    36.977713
original_1.0   | 3469.03  18.637865  5.842    41.754421
fsync          | 3525.03  13.872711  5.382    28.062947
write          | 3465.96  19.653667  5.804    40.664066
fsync + write  | 3586.85  14.459486  4.960    27.266958
Heikki's patch | 3504.3   19.731743  5.761    38.33814
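
To make the differences easier to compare, the following short Python sketch
computes the relative change of each row against one baseline. The numbers are
copied from the table; taking original_0.7 as the baseline is only an
assumption for illustration, and the helper name pct_change is hypothetical.

```python
# (TPS, 90%tile, Average, Maximum) copied from the table above.
results = {
    "original_0.7":   (3474.62, 18.348328, 5.739, 36.977713),
    "original_1.0":   (3469.03, 18.637865, 5.842, 41.754421),
    "fsync":          (3525.03, 13.872711, 5.382, 28.062947),
    "write":          (3465.96, 19.653667, 5.804, 40.664066),
    "fsync + write":  (3586.85, 14.459486, 4.960, 27.266958),
    "Heikki's patch": (3504.3,  19.731743, 5.761, 38.33814),
}


def pct_change(new, base):
    """Relative change of new against base, in percent."""
    return (new - base) / base * 100.0


# Assumption: original_0.7 is used as the baseline for comparison.
baseline = results["original_0.7"]

for name, row in results.items():
    tps = pct_change(row[0], baseline[0])
    p90 = pct_change(row[1], baseline[1])
    print(f"{name:15s} TPS {tps:+6.2f}%   90%tile {p90:+7.2f}%")
```

Against that baseline, fsync + write shows roughly 3% higher TPS and a 90%tile
value more than 20% lower.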

* HTML results from DBT-2
http://pgstatsinfo.projects.pgfoundry.org/RESULT/

In the attached text, I also describe the time taken by each checkpoint. With
the fsync patch, checkpoints seem to take longer than without it. However, the
checkpoints still finish on schedule, within checkpoint_timeout and the
allowable time. I think the most important thing in the fsync phase is not
that the checkpoint finishes fast, but that the pages are definitely and
reliably written by the end of the checkpoint. So my fsync patch is no longer
working incorrectly.

My write patch still seems to have a lot of puzzles, so I will try to
investigate objective results and the theory behind its effect.

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
