On Sun, Nov 14, 2010 at 2:07 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> The attached patch adds a new field to pg_stat_bgwriter, counting the number
> of times backends execute their own fsync calls. Normally, when a backend
> needs to fsync data, it passes a request to the background writer, which
> then absorbs the call into its own queue of work to do. However, under some
> types of heavy system load, the associated queue can fill. When this
> happens, backends are forced to do their own fsync call. This is
> potentially much worse than when they do a regular write.
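For reference, the hand-off being described here happens in register_dirty_segment() in
src/backend/storage/smgr/md.c.  Roughly, it looks like this (a simplified sketch from memory,
omitting the standalone-backend path, so don't trust the details):

    static void
    register_dirty_segment(SMgrRelation reln, ForkNumber forknum, MdfdVec *seg)
    {
        /* Try to hand the fsync off to the bgwriter's request queue. */
        if (ForwardFsyncRequest(reln->smgr_rnode, forknum, seg->mdfd_segno))
            return;             /* queued; the bgwriter will do the fsync */

        /* Queue was full, so the backend has to fsync the file itself. */
        if (FileSync(seg->mdfd_vfd) < 0)
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not fsync file \"%s\": %m",
                            FilePathName(seg->mdfd_vfd))));
    }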
>
> The really nasty situation is when the background writer is busy because
> it's executing a checkpoint. In that case, it's possible for the backend
> fsync calls to start competing with the ones the background writer is trying
> to get done,
Do you know where this competition is happening? Is it on the
platters, or is it in the hard drive write cache (I thought high-end
hardware had tagged writes to avoid that), or in the kernel?
...
>
> DEBUG: Absorbing 4096 fsync requests
> DEBUG: Absorbing 150 fsync requests
> DEBUG: Unable to forward fsync request, executing directly
> CONTEXT: writing block 158638 of relation base/16385/16398
>
> Here 4096 is the most entries the BGW will ever absorb at once, and all 90
> of the missed sync calls are logged so you can see what files they came
> from.
Looking in src/backend/postmaster/bgwriter.c line 1071:
 * Note: we presently make no attempt to eliminate duplicate requests
 * in the requests[] queue.  The bgwriter will have to eliminate dups
 * internally anyway, so we may as well avoid holding the lock longer
 * than we have to here.
This makes sense if we just need to append to a queue. But once the
queue is full and we are about to do a backend fsync, might it make
sense to do a little more work to look for dups?
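To make that concrete, what I have in mind is a last-ditch compaction pass before giving up,
something like the sketch below (function and struct member names are my guesses, not the
actual code, and it assumes the caller already holds BgWriterCommLock exclusively):

    static bool
    CompactFsyncRequestQueue(void)
    {
        int         n = BgWriterShmem->num_requests;
        int         kept = 0;
        int         i,
                    j;

        /*
         * Squeeze duplicate entries out of the shared requests[] array.
         * O(n^2), but this only runs in the already-painful queue-full
         * case, right before we'd otherwise fall back to a backend fsync.
         */
        for (i = 0; i < n; i++)
        {
            BgWriterRequest *req = &BgWriterShmem->requests[i];
            bool        dup = false;

            for (j = 0; j < kept; j++)
            {
                BgWriterRequest *prev = &BgWriterShmem->requests[j];

                if (RelFileNodeEquals(req->rnode, prev->rnode) &&
                    req->forknum == prev->forknum &&
                    req->segno == prev->segno)
                {
                    dup = true;
                    break;
                }
            }

            if (!dup)
                BgWriterShmem->requests[kept++] = *req;
        }

        BgWriterShmem->num_requests = kept;
        return kept < n;        /* did we free at least one slot? */
    }

If that returns true, ForwardFsyncRequest() could retry the append instead of telling the
backend to fsync directly.  It does hold the lock a bit longer, but only on the path where
we were about to do something expensive anyway.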
Cheers,
Jeff