Re: Just-in-time Background Writer Patch+Test Results

From Greg Smith
Subject Re: Just-in-time Background Writer Patch+Test Results
Msg-id Pine.GSO.4.64.0709061121020.14491@westnet.com
In reply to Re: Just-in-time Background Writer Patch+Test Results  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: Just-in-time Background Writer Patch+Test Results  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On Thu, 6 Sep 2007, Kevin Grittner wrote:

> If you exposed the scan_whole_pool_seconds as a tunable GUC, that would
> allay all of my concerns about this patch.  Basically, our problems were
> resolved by getting all dirty buffers out to the OS cache within two
> seconds

Unfortunately, it wouldn't make my concerns about your system go away, or 
I'd have recommended exposing it specifically to address your situation. 
I have been staring carefully at your configuration recently, and I would 
wager that you could turn off the LRU writer altogether and still meet 
your requirements in 8.2.  Here's what you've got right now:

> shared_buffers = 160MB (=20000 buffers)
> bgwriter_lru_percent = 20.0
> bgwriter_lru_maxpages = 200
> bgwriter_all_percent = 10.0
> bgwriter_all_maxpages = 600

With the default delay of 200ms, this has the LRU-writer scanning the 
whole pool every 1 second, while the all-writer scans every two 
seconds--assuming they don't hit the write limits.  If some event were to 
dirty the whole pool in 200ms, it might take as much as 6.7 seconds to 
write everything out (20000 / 600 * 200 ms) via the all-scan.  The 
all-scan is already gone in 8.3.  Your LRU scan will take much longer than 
that to clear everything out:  at least 20 seconds (20000 / 200 * 200 ms) 
to clear a fully dirty cache.
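
The arithmetic above can be checked with a short Python sketch.  This is 
only a back-of-the-envelope model, not anything from the patch: the 
function names are made up for illustration, and it assumes each 
background writer round either scans a fixed percentage of the pool or 
writes at most maxpages buffers, with nothing else going on.

```python
# Back-of-the-envelope timings for the 8.2 settings quoted above.
# Simplified model (an assumption, not PostgreSQL code): each round scans a
# fixed percentage of the pool and writes at most `maxpages` buffers.

def full_scan_seconds(percent_per_round, delay_ms):
    """Time to scan the whole pool if the writer never hits maxpages."""
    rounds = 100.0 / percent_per_round
    return rounds * delay_ms / 1000.0


def full_write_seconds(n_buffers, maxpages, delay_ms):
    """Minimum time to write every buffer when maxpages is the limit."""
    rounds = n_buffers / maxpages
    return rounds * delay_ms / 1000.0


N_BUFFERS = 20000  # shared_buffers = 160MB
DELAY_MS = 200     # default bgwriter_delay

lru_scan = full_scan_seconds(20.0, DELAY_MS)             # bgwriter_lru_percent
all_scan = full_scan_seconds(10.0, DELAY_MS)             # bgwriter_all_percent
all_write = full_write_seconds(N_BUFFERS, 600, DELAY_MS)  # bgwriter_all_maxpages
lru_write = full_write_seconds(N_BUFFERS, 200, DELAY_MS)  # bgwriter_lru_maxpages

print(f"LRU scan of whole pool:  {lru_scan:.1f} s")
print(f"all-scan of whole pool:  {all_scan:.1f} s")
print(f"all-scan, fully dirty:   {all_write:.1f} s")
print(f"LRU scan, fully dirty:   {lru_write:.1f} s")
```

The two "fully dirty" figures are lower bounds under this model, since 
they ignore the usage count restriction on what the LRU writer may write.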

But in fact, it's impossible to even bound how long it will take before 
the LRU writer (which is the only part this new patch tries to improve) 
gets around to writing even a single dirty buffer no matter what 
bgwriter_lru_percent (8.2) or scan_whole_pool_seconds (JIT patch) is set 
to.

There's a second low-level issue involved here.  When a page becomes 
dirty, that implies it was also recently used, which means the LRU writer 
won't touch it.  That page can't be written out by the LRU writer until an 
entire pass has been made over the shared_buffer pool while looking for 
buffers to allocate for new activity.  When the allocation clock-sweep 
passes over the newly dirtied buffer again, its usage count will drop by 
one and it will no longer be considered recently used.  At that point the 
LRU writer can write it out.  So unless there is other allocation activity 
going on, the scan_whole_pool_seconds mechanism will never provide the 
bound on time to scan and write everything you hope it will.
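
A toy Python model may make that interaction easier to see.  It is a 
deliberately simplified sketch, not the PostgreSQL implementation:  the 
Buffer class and both functions are hypothetical, the allocation sweep is 
modeled as one full pass that only decrements usage counts, and the LRU 
writer is assumed to skip any buffer whose usage count is nonzero.

```python
from dataclasses import dataclass


@dataclass
class Buffer:
    dirty: bool = False
    usage_count: int = 0


def lru_writer_pass(pool):
    """Write dirty buffers that are not recently used; return the count."""
    written = 0
    for buf in pool:
        if buf.dirty and buf.usage_count == 0:
            buf.dirty = False  # stands in for handing the page to the OS
            written += 1
    return written


def clock_sweep_pass(pool):
    """One full allocation sweep: decrement every nonzero usage count."""
    for buf in pool:
        if buf.usage_count > 0:
            buf.usage_count -= 1


pool = [Buffer() for _ in range(8)]
pool[3].dirty = True
pool[3].usage_count = 1  # dirtying a page also marks it recently used

# With no allocation activity, no number of LRU writer passes helps.
idle_writes = sum(lru_writer_pass(pool) for _ in range(1000))

# After the allocation sweep has passed over the buffer, it can be written.
clock_sweep_pass(pool)
writes_after_sweep = lru_writer_pass(pool)

print(idle_writes, writes_after_sweep)
```

In this model the dirty buffer stays unwritten however often the LRU 
writer runs, and only becomes eligible once allocation activity has 
driven the sweep past it.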

And if there are other allocations going on, the much more powerful JIT 
mechanism will scan the whole pool plenty fast if you bump the already 
exposed multiplier tunable up.  In my tests where the buffer cache was 
filled with mostly dirty buffers that couldn't be re-used (something 
relatively easy to trigger with pgbench tests), I've actually watched the 
new code scan >90% of the buffer cache looking for those few reusable 
buffers in the pool in a single invocation.  This would be like setting 
bgwriter_lru_percent=90.0 in the old configuration, but it only gets that 
aggressive when the distribution of pages in the buffer cache demands it, 
and when it has reason to believe going that fast will be helpful.

The completely understandable line of thinking that led to your request 
here is one of my concerns with exposing scan_whole_pool_seconds as a 
tunable.  It may suggest to people that if they set the number very low, 
it will ensure that all dirty buffers are scanned and written within that 
time bound.  That's certainly not the case; both the maxpages limit and 
the usage count information will actually determine how fast that 
mechanism plods through the buffer cache.  It really isn't useful for 
scanning fast.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

