ITAGAKI Takahiro <itagaki.takahiro@lab.ntt.co.jp> writes:
> I encountered overflow of bgwriter's file-fsync request queue. It occurred
> during checkpoints. In such cases each backend calls fsync on its own, in no
> particular order, so the checkpoint takes a long time and performance drops.
> It seems to happen frequently on machines with a lot of memory and
> poor disks.

I can't help thinking that this is a situation that could only be got
into with a seriously misconfigured database --- per the comments for
ForwardFsyncRequest, we really don't want this code to run at all,
let alone run so often that a queue with NBuffers entries overflows.
What exactly are the test conditions under which you're seeing this
happen?

If there actually is a problem that needs to be solved, I think it'd be
better to try to do AbsorbFsyncRequests somewhere in the main checkpoint
loops. I don't like the idea of holding the BgWriterCommLock long
enough to do a qsort ... especially not if this occurs only with very
large NBuffers settings. Also, what if the qsort fails to eliminate any
duplicates, or eliminates only a few? You could get into a scenario
where the qsort gets repeated every few ForwardFsyncRequest calls, in
which case it'd become a drag on performance itself. (See also recent
discussion with Qingqing about converting BgWriterCommLock to a
spinlock. Though I was against that because no performance problem had
been shown, it could still become something we want to do ... but
putting a qsort here would foreclose that option.)
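
To make the suggestion concrete, here is a minimal sketch in C of absorbing
requests inside the checkpoint's write loop instead of compacting the queue
under the lock. It is not PostgreSQL's code: RequestQueue, forward_request,
absorb_requests and simulate_checkpoint are hypothetical names, the capacity
of 8 stands in for an NBuffers-sized queue, and all locking is left out.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8   /* assumption: stand-in for an NBuffers-sized queue */
#define NUM_FILES      4   /* assumption: number of distinct files in the toy run */

/* Hypothetical request record; not PostgreSQL's actual struct. */
typedef struct { int file_id; } FsyncRequest;

typedef struct {
    FsyncRequest items[QUEUE_CAPACITY];
    size_t       count;
} RequestQueue;

/* Backend side: returns false when the queue is full, in which case the
 * caller would have to fall back to doing the fsync itself. */
static bool forward_request(RequestQueue *q, int file_id)
{
    if (q->count >= QUEUE_CAPACITY)
        return false;
    q->items[q->count++].file_id = file_id;
    return true;
}

/* Checkpoint side: drain the shared queue into a private pending table.
 * Duplicates collapse in the table, so no sort of the queue is needed. */
static size_t absorb_requests(RequestQueue *q, bool *pending, size_t npending)
{
    size_t absorbed = q->count;

    for (size_t i = 0; i < q->count; i++) {
        int id = q->items[i].file_id;
        if (id >= 0 && (size_t) id < npending)
            pending[id] = true;
    }
    q->count = 0;
    return absorbed;
}

/* Toy checkpoint loop: one forwarded request per buffer written, with an
 * absorb call every absorb_interval writes (0 means never absorb in the
 * loop). Returns how many requests found the queue full. */
static size_t simulate_checkpoint(size_t nbuffers, size_t absorb_interval)
{
    RequestQueue q = { .count = 0 };
    bool         pending[NUM_FILES] = { false };
    size_t       overflows = 0;

    for (size_t i = 0; i < nbuffers; i++) {
        if (!forward_request(&q, (int) (i % NUM_FILES)))
            overflows++;
        if (absorb_interval != 0 && (i + 1) % absorb_interval == 0)
            absorb_requests(&q, pending, NUM_FILES);
    }
    return overflows;
}
```

In this toy model, simulate_checkpoint(100, 0) overflows once the first 8
slots are used, while simulate_checkpoint(100, 4) never overflows, because
the queue is drained before it can fill. How often a real checkpoint loop
would need to absorb is a separate question that the sketch does not answer.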
regards, tom lane