On 02/09/2016 10:27 PM, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
>> On Tue, Feb 09, 2016 at 10:02:17PM -0500, Tom Lane wrote:
>>> I wonder if it's worth sticking some instrumentation into stats
>>> collector shutdown?
>> I wouldn't be surprised if the collector got backlogged during the main phase
>> of testing and took awhile to chew through its message queue before even
>> starting the write of the final stats.
> But why would the ecpg tests show such an effect when the main regression
> tests don't? AFAIK the ecpg tests don't exactly stress the server ---
> note the trivial amount of data written by the shutdown checkpoint,
> for instance.
The main regression tests run with the stats file on the ramdisk.
>
> The other weird thing is that it's only sometimes slow. If you look at
> the last buildfarm result from axolotl, for instance, the tail end of
> the ecpg log is
>
> LOG: ShutdownSUBTRANS() complete at 2016-02-09 16:31:14.784 EST
> LOG: database system is shut down at 2016-02-09 16:31:14.784 EST
> LOG: lock files all released at 2016-02-09 16:31:14.817 EST
>
> so we only spent ~50ms on stats write that time.
That part is puzzling.
> The idea I was toying with is that previous filesystem activity (making
> the temp install, the server's never-fsync'd writes, etc) has built up a
> bunch of dirty kernel buffers, and at some point the kernel goes nuts
> writing all that data. So the issues we're seeing would come and go
> depending on the timing of that I/O spike. I'm not sure how to prove
> such a theory from here.
Yeah. It's faintly possible that a kernel upgrade will help.
cheers
andrew