On Tue, Jul 17, 2012 at 6:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I went to fix this in the obvious way (attached), but while testing
> it I found that the number of buffers_backend events reported during
> a regression test run barely changed; which surprised the heck out of
> me, so I dug deeper. The cause turns out to be extremely scary:
> ForwardFsyncRequest isn't getting called at all in the bgwriter process,
> because the bgwriter process has a pendingOpsTable. So it just queues
> its fsync requests locally, and then never acts on them, since it never
> runs any checkpoints anymore.
:-(
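A toy Python model of the failure described above may help. It assumes a simplified world with exactly one checkpointer, one bgwriter and one backend. The names `Process`, `FsyncModel`, `has_local_table` and this `register_dirty_segment` are hypothetical; they only loosely echo md.c. The real paths through `pendingOpsTable` and `ForwardFsyncRequest` involve much more, such as shared-memory queues, compaction and fallback to a direct fsync.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical stand-in for one server process."""
    name: str
    # Models "this process owns a local pendingOpsTable".
    has_local_table: bool
    # Fsync requests queued locally (segment names).
    pending: set = field(default_factory=set)


class FsyncModel:
    """Toy model: only the checkpointer ever drains its pending set."""

    def __init__(self, bgwriter_has_table: bool) -> None:
        self.checkpointer = Process("checkpointer", has_local_table=True)
        self.bgwriter = Process("bgwriter", has_local_table=bgwriter_has_table)
        self.backend = Process("backend", has_local_table=False)
        self.fsynced: set = set()

    def register_dirty_segment(self, proc: Process, segment: str) -> None:
        # Hypothetical analogue of the md.c decision: a process that owns a
        # local table queues the request there; any other process forwards
        # it to the checkpointer (standing in for ForwardFsyncRequest).
        if proc.has_local_table:
            proc.pending.add(segment)
        else:
            self.checkpointer.pending.add(segment)

    def checkpoint(self) -> None:
        # Only the checkpointer runs checkpoints, so only its queue is
        # fsynced; anything left in another process's table is never acted on.
        self.fsynced |= self.checkpointer.pending
        self.checkpointer.pending.clear()


# Buggy setup: the bgwriter still owns a local table after the split.
buggy = FsyncModel(bgwriter_has_table=True)
buggy.register_dirty_segment(buggy.bgwriter, "seg1")
buggy.register_dirty_segment(buggy.backend, "seg2")
buggy.checkpoint()
print(sorted(buggy.fsynced))           # ['seg2']
print(sorted(buggy.bgwriter.pending))  # ['seg1'] (stranded forever)

# Fixed setup: the bgwriter forwards its requests like a backend does.
fixed = FsyncModel(bgwriter_has_table=False)
fixed.register_dirty_segment(fixed.bgwriter, "seg1")
fixed.register_dirty_segment(fixed.backend, "seg2")
fixed.checkpoint()
print(sorted(fixed.fsynced))           # ['seg1', 'seg2']
```

The same model shows how the loss can be masked. If a backend happens to dirty `seg1` as well, its forwarded request reaches the checkpointer, and `seg1` gets fsynced even in the buggy setup.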
> This implies that nobody has done pull-the-plug testing on either HEAD
> or 9.2 since the checkpointer split went in (2011-11-01), because even
> a modicum of such testing would surely have shown that we're failing to
> fsync a significant fraction of our write traffic.
>
> Furthermore, I would say that any performance testing done since then,
> if it wasn't looking at purely read-only scenarios, isn't worth the
> electrons it's written on. In particular, any performance gain that
> anybody might have attributed to the checkpointer splitup is very
> probably hogwash.
I don't think anybody thought that was going to result in a direct
performance gain, but I agree the performance testing needs to be
redone. I suspect that the impact on my testing is limited, because I
do mostly pgbench testing, and the lost fsync requests were probably
duplicated by non-lost fsync requests from backend writes. Still, it
needs to be redone once this is fixed.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company