Re: Load distributed checkpoint

From: Jim C. Nasby
Subject: Re: Load distributed checkpoint
Date:
Msg-id: 20061213063242.GS6551@nasby.net
In reply to: Re: Load distributed checkpoint  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Fri, Dec 08, 2006 at 11:43:27AM -0500, Tom Lane wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> > "Jim C. Nasby" <jim@nasby.net> wrote: 
> >> Generally, I try and configure the all* settings so that you'll get 1
> >> clock-sweep per checkpoint_timeout. It's worked pretty well, but I don't
> >> have any actual tests to back that methodology up.
> 
> > We got to these numbers somewhat scientifically.  I studied I/O
> > patterns under production load and figured we should be able to handle
> > about 800 writes in per 200 ms without causing problems.  I have to
> > admit that I based the percentages and the ratio between "all" and "lru"
> > on gut feel after musing over the documentation.
> 
> I like Kevin's settings better than what Jim suggests.  If the bgwriter
> only makes one sweep between checkpoints then it's hardly going to make
> any impact at all on the number of dirty buffers the checkpoint will
> have to write.  The point of the bgwriter is to reduce the checkpoint
> I/O spike by doing writes between checkpoints, and to have any
> meaningful impact on that, you'll need it to make the cycle several times.

It would be good if the docs included more detailed info on exactly how
the bgwriter goes about flushing things to disk. You could certainly read
them and think that the bgwriter simply goes through and issues writes
for any dirty buffers it finds. Looking at BgBufferSync, though, I think
the "all" scan really does write out pages regardless of what
usage_count says.
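As an illustration, the distinction can be sketched in a toy model (this is not PostgreSQL source; the `Buffer` struct and the `lru_scan`/`all_scan` helpers are invented here purely to show the difference in write criteria):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy buffer descriptor: just the two fields the discussion cares about. */
typedef struct {
    bool dirty;
    int  usage_count;   /* clock-sweep reference counter */
} Buffer;

/* "lru"-style scan: only writes dirty buffers that are about to be
 * recycled, i.e. those whose usage_count has decayed to 0. */
static int lru_scan(Buffer *buf, int n)
{
    int written = 0;
    for (int i = 0; i < n; i++)
        if (buf[i].dirty && buf[i].usage_count == 0)
        {
            buf[i].dirty = false;
            written++;
        }
    return written;
}

/* "all"-style scan: writes every dirty buffer it visits,
 * ignoring usage_count entirely. */
static int all_scan(Buffer *buf, int n)
{
    int written = 0;
    for (int i = 0; i < n; i++)
        if (buf[i].dirty)
        {
            buf[i].dirty = false;
            written++;
        }
    return written;
}
```

With the same set of dirty buffers, the "all" scan will flush hot pages (high usage_count) that the "lru" scan deliberately skips.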

> I wonder whether it would be feasible to teach the bgwriter to get more
> aggressive as the time for the next checkpoint approaches?  Writes
> issued early in the interval have a much higher probability of being
> wasted (because the page gets re-dirtied later).  But maybe that just
> reduces to what Takahiro-san already suggested, namely that
> checkpoint-time writes should be done with the same kind of scheduling
> the bgwriter uses outside checkpoints.  We still have the problem that
> the real I/O storm is triggered by fsync() not write(), and we don't
> have a way to spread out the consequences of fsync().

Would the ramp-up of write activity push the kernel to actually write
stuff? My understanding is that most OSes have a time limit on how long
they'll let a write-request sit in cache, so ISTM a better way to smooth
out disk IO is to write things in a steady stream.
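To put rough numbers on the steady-stream argument (all figures are made up for illustration, and `writes_per_sec` is a hypothetical helper, not anything in PostgreSQL): the same number of dirty pages produces a far lower peak I/O rate when spread over the whole checkpoint interval than when dumped in a short burst.

```c
/* Illustrative arithmetic only: flushing a checkpoint's dirty pages in
 * one short burst vs. streaming them across the whole interval. */
int writes_per_sec(int dirty_pages, int secs)
{
    return dirty_pages / secs;
}
```

E.g. 60,000 dirty pages flushed in a 5-second burst is a 12,000 writes/sec spike; the same pages streamed over a 300-second checkpoint interval is a steady 200 writes/sec.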

If the bgwriter takes the buffer access counter into account when
deciding what to write out, it might make sense to write more recently
accessed pages as the checkpoint nears, the idea being that odds are
good those buffers are about to get flushed by BufferSync() anyway.

Also, I have a dumb question... BgBufferSync uses buf_id1 to keep track
of which buffer the bgwriter_all scan is looking at, which suggests it
should remember where it was at the end of the last scan; yet it's
initialized to 0 every time BgBufferSync is called. Is there someplace
else that remembers where the complete scan left off when
bgwriter_all_percent or bgwriter_all_maxpages is hit? Or does the scan
in fact just keep re-scanning the beginning of the buffers?
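For what it's worth, the behavior I would have expected can be sketched as a standalone toy (again, not PostgreSQL source; `sweep_round`, `scan_cursor`, and the constants are invented): a cursor with static storage duration resumes where the previous call stopped once the per-round page budget (cf. bgwriter_all_maxpages) is hit, so successive rounds cover the whole buffer pool rather than repeatedly re-scanning its start.

```c
#include <assert.h>

#define NBUFFERS  10
#define MAXPAGES   4   /* per-round write budget */

/* Persists across calls because it is static: each round
 * resumes the sweep where the previous round stopped. */
static int scan_cursor = 0;

/* Pretends every buffer visited needs a write; returns the
 * buffer id where this round stopped. */
int sweep_round(void)
{
    int written = 0;
    while (written < MAXPAGES)
    {
        written++;
        scan_cursor = (scan_cursor + 1) % NBUFFERS;
    }
    return scan_cursor;
}
```

If the cursor were an ordinary local initialized to 0 on entry, every round would stop at buffer 4 and the tail of the pool would never be visited.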
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

