Re: Load distributed checkpoint

From	Jim C. Nasby
Subject	Re: Load distributed checkpoint
Date	28 December 2006, 07:36:07
Msg-id	20061228113551.GP71246@nasby.net
In response to	Re: Load distributed checkpoint ("Simon Riggs" <simon@2ndquadrant.com>)
List	pgsql-hackers

On Wed, Dec 27, 2006 at 10:54:57PM +0000, Simon Riggs wrote:
> On Wed, 2006-12-27 at 23:26 +0100, Martijn van Oosterhout wrote:
> > On Wed, Dec 27, 2006 at 09:24:06PM +0000, Simon Riggs wrote:
> > > On Fri, 2006-12-22 at 13:53 -0500, Bruce Momjian wrote:
> > > 
> > > > I assume other kernels have similar I/O smoothing, so that data sent to
> > > > the kernel via write() gets to disk within 30 seconds.  
> > > > 
> > > > I assume write() is not our checkpoint performance problem, but the
> > > > transfer to disk via fsync().  
> > > 
> > > > Well, it's correct to say that the transfer to disk is the source of the
> > > problem, but that doesn't only occur when we fsync(). There are actually
> > > two disk storms that occur, because of the way the fs cache works. [Ron
> > > referred to this effect uplist]
> > 
> > As someone looking from the outside:
> > 
> > fsync only works on one file, so presumably the checkpoint process is
> > opening each file one by one and fsyncing them. 
> 
> Yes
> 
> > Does that make any
> > difference here? Could you adjust the timing here?
> 
> That's the hard bit with I/O storm 2. When you fsync a file you don't
> actually know how many blocks you're writing, plus there's no way to
> slow down those writes by putting delays between them (although it's
> possible your controller might know how to do this, I've never heard of
> one that does).

Any controller that sophisticated would likely also have a BBU and write
caching, which should greatly reduce the impact of at least the fsync
storm... unless you fill the cache. I suspect we might need a way to
control how much data we try to push out at a time to avoid that...
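
To make "control how much data we push out at a time" concrete, here is a
rough sketch of the idea, not anything PostgreSQL does today. It writes
pages to one file and calls fsync after every fixed-size batch, so no
single fsync has to flush more than batch_pages pages. The names
write_in_batches, batch_pages and pause_s are made up for illustration,
and the 8 kB page size in the usage note is only an assumption.

```python
import os
import time


def _write_all(fd, data):
    """Write the whole buffer, retrying on short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def write_in_batches(fd, pages, batch_pages=64, pause_s=0.0):
    """Write pages to fd, calling fsync after every batch_pages pages.

    Hypothetical helper: bounds how much dirty data one fsync can
    force out. Returns the number of fsync calls issued.
    """
    syncs = 0
    pending = 0
    for page in pages:
        _write_all(fd, page)
        pending += 1
        if pending >= batch_pages:
            os.fsync(fd)              # flush a bounded amount of data
            syncs += 1
            pending = 0
            if pause_s > 0:
                time.sleep(pause_s)   # optional gap between batches
    if pending:
        os.fsync(fd)                  # flush the final partial batch
        syncs += 1
    return syncs
```

With ten 8 kB pages and batch_pages=4 this issues three fsync calls
(after pages 4 and 8, then one for the last two). Whether a pause between
batches actually keeps a controller's write cache from filling depends on
the hardware, so the right batch size would have to be found by testing.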

As for settings, I really like the simplicity of the Oracle system...
"Just try to ensure recovery takes about X amount of seconds". I like
the idea of a creeping checkpoint even more; only writing a buffer out
when we need to checkpoint it makes a lot more sense to me than trying
to guess when we'll next dirty a buffer. Such a system would probably
also be a lot easier to tune than the current bgwriter, even if we
couldn't simplify it all the way to "seconds for recovery".
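
Roughly what I have in mind for a creeping checkpoint, as a toy sketch
only. It assumes recovery time is proportional to the amount of WAL that
has to be replayed, at a constant rate (replay_bytes_per_s is an assumed
figure, not something we measure today), and that LSNs only increase.
A buffer gets written only once the WAL written since it was first
dirtied exceeds the budget implied by the recovery target. The class and
method names are invented for the example.

```python
from collections import OrderedDict


class CreepingCheckpoint:
    """Toy model: write a buffer only when the recovery target demands it."""

    def __init__(self, recovery_target_s, replay_bytes_per_s):
        # WAL distance we can afford to replay within the target time.
        self.max_wal_distance = recovery_target_s * replay_bytes_per_s
        # buffer_id -> LSN at which the buffer was first dirtied.
        # Insertion order equals LSN order because LSNs only increase.
        self.dirty = OrderedDict()

    def mark_dirty(self, buffer_id, lsn):
        # Re-dirtying keeps the original (oldest) LSN for the buffer.
        self.dirty.setdefault(buffer_id, lsn)

    def buffers_to_write(self, current_lsn):
        """Return, oldest first, the buffers that must be written now."""
        horizon = current_lsn - self.max_wal_distance
        due = []
        while self.dirty:
            buffer_id, lsn = next(iter(self.dirty.items()))
            if lsn >= horizon:
                break                 # everything left is recent enough
            due.append(buffer_id)
            del self.dirty[buffer_id]
        return due
```

The only knob here is the recovery target; the replay rate would have to
be estimated somehow, which is the part that might stop us from
simplifying all the way to a single setting.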
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
