Re: Spread checkpoint sync

From: Greg Smith
Subject: Re: Spread checkpoint sync
Msg-id 4CE94AC6.4040409@2ndquadrant.com
In reply to: Re: Spread checkpoint sync  (Jeff Janes <jeff.janes@gmail.com>)
List: pgsql-hackers
Jeff Janes wrote:
> And for very large memory
> systems, even 1% may be too much to cache (dirty*_ratio can only be
> set in integer percent points), so recent kernels introduced
> dirty*_bytes parameters.  I like these better because they do what
> they say.  With the dirty*_ratio, I could never figure out what it was
> a ratio of, and the results were unpredictable without extensive
> experimentation.
>   

Right, you can't set dirty_background_ratio low enough to make this
problem go away.  Even attempts to set it to 1%, back when that was
the right size for it, seem to be defeated by other mechanisms within
the kernel.  Last time I looked at the related source code, it seemed
the "congestion control" logic that kicks in to throttle writes was a
likely suspect.  This is why I'm not real optimistic that newer
mechanisms like the dirty_background_bytes added in 2.6.29 will help
here, as that just gives a way to set lower values; the same basic
logic is under the hood.
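
To put rough numbers on the integer-percent limitation Jeff describes, here is a minimal sketch.  It assumes, purely for simplicity, that the ratio applies to total RAM (the kernel's real basis is narrower than that), and the memory sizes are hypothetical examples, not measurements from any server discussed here.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def min_dirty_threshold_bytes(total_ram_bytes: int, ratio_percent: int = 1) -> int:
    """Smallest threshold an integer-percent ratio can express.

    Simplifying assumption: the ratio is taken against total RAM.
    """
    return total_ram_bytes * ratio_percent // 100


if __name__ == "__main__":
    # Hypothetical RAM sizes, chosen only to show how the floor scales.
    for ram_gib in (4, 16, 64, 256):
        threshold = min_dirty_threshold_bytes(ram_gib * GIB)
        print(f"{ram_gib:>4} GiB RAM: 1% floor is about {threshold / MIB:,.0f} MiB")
```

Under that assumption the lowest nonzero setting grows linearly with RAM, which is the gap dirty_background_bytes lets you close by naming a byte count directly.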

Like Jeff, I've never seen dirty_expire_centisecs help at all, possibly 
due to the same congestion mechanism. 

> Yes, but how much work do we want to put into redoing the checkpoint
> logic so that the sysadmin on a particular OS and configuration and FS
> can avoid having to change the kernel parameters away from their
> defaults?  (Assuming of course I am correctly understanding the
> problem, always a dangerous assumption.)
>   

I've been trying to make this problem go away using just the kernel 
tunables available since 2006.  I adjusted them carefully on the server 
that ran into this problem so badly that it motivated the submitted 
patch, months before this issue got bad.  It didn't help.  Maybe if they 
were running a later kernel that supported dirty_background_bytes that 
would have worked better.  During the last few years, the only thing 
that has consistently helped in every case is the checkpoint spreading 
logic that went into 8.3.  I no longer expect that the kernel developers
will ever make this problem go away, given the way checkpoints are
written out right now, whereas the last good PostgreSQL work in this
area definitely helped.

The basic premise of the current checkpoint code is that if you write
all of the buffers out early enough, then by the time the syncs execute,
enough of the data should have gone out that they don't take very long
to process.  That was usually true for the last few years, on systems
with a battery-backed cache; the amount of memory cached by the OS was
small relative to the RAID cache size.  That's not the case anymore, and
that divergence is growing bigger.
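
The premise above can be illustrated with a toy model.  This is only a sketch under loud assumptions: the OS is taken to hold back at most a fixed amount of dirty data until sync time, the RAID cache absorbs its full size instantly, and everything beyond that drains at a constant disk rate.  The function name and all numbers are hypothetical; none of this is taken from the PostgreSQL or kernel source.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def estimated_sync_seconds(checkpoint_bytes: int,
                           os_dirty_limit_bytes: int,
                           raid_cache_bytes: int,
                           disk_bytes_per_sec: int) -> float:
    """Toy estimate of how long the sync phase blocks.

    Assumptions (hypothetical, for illustration only):
    - the OS still holds min(checkpoint size, its dirty limit) at sync time;
    - the RAID cache absorbs up to its full size with no delay;
    - anything beyond that is written at a constant disk rate.
    """
    pending = min(checkpoint_bytes, os_dirty_limit_bytes)
    overflow = max(0, pending - raid_cache_bytes)
    return overflow / disk_bytes_per_sec


if __name__ == "__main__":
    # Older-style box: OS holds less dirty data than the RAID cache can absorb.
    print(estimated_sync_seconds(2 * GIB, 128 * MIB, 256 * MIB, 100 * MIB))
    # Large-memory box: OS holds far more than the RAID cache can absorb.
    print(estimated_sync_seconds(4 * GIB, 1280 * MIB, 256 * MIB, 100 * MIB))
```

In this model the sync phase is free while the OS-held dirty data fits in the RAID cache, and grows with the excess once it does not, which is the divergence described above.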

The idea that the checkpoint sync code can run in a relatively tight 
loop, without stopping to do the normal background writer cleanup work, 
is also busted by that observation.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


