Re: Load distributed checkpoint

From: Jim C. Nasby
Subject: Re: Load distributed checkpoint
Date:
Msg-id: 20061228113551.GP71246@nasby.net
In reply to: Re: Load distributed checkpoint  ("Simon Riggs" <simon@2ndquadrant.com>)
List: pgsql-hackers
On Wed, Dec 27, 2006 at 10:54:57PM +0000, Simon Riggs wrote:
> On Wed, 2006-12-27 at 23:26 +0100, Martijn van Oosterhout wrote:
> > On Wed, Dec 27, 2006 at 09:24:06PM +0000, Simon Riggs wrote:
> > > On Fri, 2006-12-22 at 13:53 -0500, Bruce Momjian wrote:
> > > 
> > > > I assume other kernels have similar I/O smoothing, so that data sent to
> > > > the kernel via write() gets to disk within 30 seconds.  
> > > > 
> > > > I assume write() is not our checkpoint performance problem, but the
> > > > transfer to disk via fsync().  
> > > 
> > > Well, it's correct to say that the transfer to disk is the source of the
> > > problem, but that doesn't only occur when we fsync(). There are actually
> > > two disk storms that occur, because of the way the fs cache works. [Ron
> > > referred to this effect uplist]
> > 
> > As someone looking from the outside:
> > 
> > fsync only works on one file, so presumably the checkpoint process is
> > opening each file one by one and fsyncing them. 
> 
> Yes
> 
> > Does that make any
> > difference here? Could you adjust the timing here?
> 
> That's the hard bit with I/O storm 2. When you fsync a file you don't
> actually know how many blocks you're writing, plus there's no way to
> slow down those writes by putting delays between them (although it's
> possible your controller might know how to do this, I've never heard of
> one that does).

Any controller that sophisticated would likely also have a battery
backup unit (BBU) and write caching, which should greatly reduce the
impact of at least the fsync storm... unless you fill the cache. I
suspect we might need a way to control how much data we try to push
out at a time to avoid that...
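
To make the "control how much data we push out at a time" idea concrete,
here is a minimal sketch, not anything taken from the PostgreSQL source.
It assumes we can interleave our own write() and fsync() calls, so that
no single fsync has to flush more than roughly one batch. The names
flush_in_batches, batch_bytes and pause_s are hypothetical.

```python
import os
import time

def flush_in_batches(fd, blocks, batch_bytes, pause_s=0.0):
    """Write blocks to fd, calling fsync each time about batch_bytes
    have accumulated. Returns the number of fsync calls made.

    Hypothetical helper: batch_bytes would be sized to stay below the
    controller's write cache, pause_s spaces the bursts out in time.
    """
    pending = 0   # bytes written since the last fsync
    syncs = 0
    for block in blocks:
        view = memoryview(block)
        while view:                      # os.write may write only part
            written = os.write(fd, view)
            view = view[written:]
        pending += len(block)
        if pending >= batch_bytes:
            os.fsync(fd)                 # bounded amount of data to flush
            syncs += 1
            pending = 0
            if pause_s:
                time.sleep(pause_s)      # let the cache drain
    if pending:
        os.fsync(fd)                     # flush the final partial batch
        syncs += 1
    return syncs
```

This only bounds what our own process has queued for that file since the
last fsync; it does nothing about data the kernel has already started
writing back on its own schedule.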

As for settings, I really like the simplicity of the Oracle system...
"Just try to ensure recovery takes about X seconds". I like the idea of
a creeping checkpoint even more; only writing a buffer out when we need
to checkpoint it makes a lot more sense to me than trying to guess when
we'll next dirty a buffer. Such a system would probably also be a lot
easier to tune than the current bgwriter, even if we couldn't simplify
it all the way to "seconds for recovery".
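
As a rough illustration of how a "seconds for recovery" target could
drive a creeping checkpoint, here is a small sketch. It is neither
Oracle's nor PostgreSQL's actual algorithm. It assumes we know the WAL
position (LSN, in bytes) at which each buffer was first dirtied and have
an estimate of the WAL replay rate; all names are hypothetical.

```python
def buffers_to_write(first_dirty_lsn, current_lsn,
                     replay_bytes_per_s, target_recovery_s):
    """Return the buffer ids that must be written now, oldest first.

    first_dirty_lsn maps buffer id -> LSN at which it was first dirtied.
    Recovery has to replay WAL from the oldest unwritten dirty buffer,
    so that buffer may lag current_lsn by at most the replay budget.
    """
    budget = replay_bytes_per_s * target_recovery_s  # WAL bytes we can replay in time
    cutoff = current_lsn - budget
    overdue = [(lsn, buf) for buf, lsn in first_dirty_lsn.items()
               if lsn < cutoff]
    return [buf for lsn, buf in sorted(overdue)]

# Usage: with a budget of 10 * 50 = 500 bytes of WAL, only buffers first
# dirtied before LSN 500 need writing.
todo = buffers_to_write({1: 100, 2: 900, 3: 400}, current_lsn=1000,
                        replay_bytes_per_s=10, target_recovery_s=50)
```

Buffers dirtied recently are left alone, which is the point: nothing is
written on a guess about when it will next be dirtied.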
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

