Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)

From: Greg Smith
Subject: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)
Date:
Msg-id: 500634C8.8030302@2ndQuadrant.com
In reply to: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)  (Craig Ringer <ringerc@ringerc.id.au>)
           Re: Checkpointer split has broken things dramatically (was Re: DELETE vs TRUNCATE explanation)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 07/17/2012 06:56 PM, Tom Lane wrote:
> So I went to fix this in the obvious way (attached), but while testing
> it I found that the number of buffers_backend events reported during
> a regression test run barely changed; which surprised the heck out of
> me, so I dug deeper.  The cause turns out to be extremely scary:
> ForwardFsyncRequest isn't getting called at all in the bgwriter process,
> because the bgwriter process has a pendingOpsTable.
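
To restate the mechanism: a process that owns a pendingOpsTable absorbs 
fsync requests into it locally, and only processes without one call 
ForwardFsyncRequest.  Here's a toy model of that dispatch rule--not the 
md.c source, just a sketch of the failure mode--showing why a bgwriter 
that wrongly holds a pendingOpsTable never forwards anything:

/* Toy model only -- not the real md.c code.  It assumes the dispatch
 * rule described above: a process holding a pendingOpsTable remembers
 * fsync requests locally; everyone else forwards them. */
#include <stdbool.h>
#include <stdio.h>

static bool have_pending_ops_table;   /* stands in for pendingOpsTable */
static int  absorbed_locally;         /* RememberFsyncRequest() path */
static int  forwarded;                /* ForwardFsyncRequest() path */

static void
register_dirty_segment_toy(void)
{
    if (have_pending_ops_table)
        absorbed_locally++;     /* stays in this process's table */
    else
        forwarded++;            /* reaches the checkpointer */
}

int
main(void)
{
    int         i;

    /* The post-split bgwriter erroneously creates a pendingOpsTable... */
    have_pending_ops_table = true;
    for (i = 0; i < 100; i++)
        register_dirty_segment_toy();

    /* ...so every request sits in a table nobody ever fsyncs. */
    printf("forwarded to checkpointer: %d, absorbed locally: %d\n",
           forwarded, absorbed_locally);
    return 0;
}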

When I did my testing early this year to look at checkpointer 
performance (among other 9.2 write changes like group commit), I did 
see some cases where buffers_backend was dramatically different on 9.2 
vs. 9.1.  There were plenty of cases where the totals across a 
10-minute pgbench run were almost identical, though, so this issue 
didn't stick out then.  That's a very different workload from the 
regression tests.
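
For reference, the counters I was comparing are the ones exposed in the 
pg_stat_bgwriter view; a quick way to snapshot them before and after a 
test run:

    SELECT buffers_backend, buffers_backend_fsync FROM pg_stat_bgwriter;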

> This implies that nobody has done pull-the-plug testing on either HEAD
> or 9.2 since the checkpointer split went in (2011-11-01), because even
> a modicum of such testing would surely have shown that we're failing to
> fsync a significant fraction of our write traffic.

Ugh.  Most of my pull-the-plug testing over the last six months has 
been focused on SSD tests with older versions.  I want to duplicate 
this (and any potential fix) now that you've highlighted it.

> Furthermore, I would say that any performance testing done since then,
> if it wasn't looking at purely read-only scenarios, isn't worth the
> electrons it's written on.  In particular, any performance gain that
> anybody might have attributed to the checkpointer splitup is very
> probably hogwash.

There hasn't been any performance testing that suggested the 
checkpointer splitup was justified.  The stuff I did showed it being 
flat-out negative for a subset of pgbench-oriented cases, but those 
didn't seem real-world enough to disprove it as the right design.

I thought there were two valid justifications for the checkpointer split 
(which is not a feature I have any corporate attachment to--I'm as 
isolated from how it was developed as you are).  The first is that it 
seems like the right architecture to allow reworking checkpoints and 
background writes for future write path optimization.  A good chunk of 
the time when I've tried to improve one of those (like my spread sync 
stuff from last year), the code was complicated by the background writer 
needing to follow the drum of checkpoint timing, and vice-versa.  Being 
able to hack on those independently got a sigh of relief from me.  And 
while this adds some code duplication in things like the process setup, 
I thought the result would be cleaner for people reading the code to 
follow too.  This problem is terrible, but I think part of how it 
crept in is that the single checkpoint+background writer process was 
doing so many things that some days it was hard to even follow them 
all.

The second justification for the split was that it seems easier to get 
a low-power result from two separate processes, which I believe was 
the angle Peter Geoghegan was working when this popped up originally.  
The checkpointer has to run sometimes, but only at a 50% duty cycle as 
it's tuned out of the box.  It seems nice to be able to approach that 
in a way that's power-efficient without coupling it to whatever 
heartbeat the BGW is running at.  I could even see people changing the 
frequencies for each independently depending on expected system load.  
Tune for lower power when you don't expect many users, that sort of 
thing.
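
As a sketch of what I mean (values made up for illustration, not 
recommendations--the out-of-the-box 50% duty cycle above is just the 
checkpoint_completion_target default of 0.5):

    # postgresql.conf, hypothetical low-traffic tuning
    bgwriter_delay = 1000ms               # default 200ms; wake the BGW less often
    checkpoint_timeout = 15min            # default 5min; checkpoint less often
    checkpoint_completion_target = 0.5    # spread each checkpoint's writes
                                          # over half the interval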

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


