Re: Tracing down buildfarm "postmaster does not shut down" failures

Поиск
Список
Период
Сортировка
От Andrew Dunstan
Тема Re: Tracing down buildfarm "postmaster does not shut down" failures
Дата
Msg-id 56BABAD4.7040006@dunslane.net
обсуждение исходный текст
Ответ на Re: Tracing down buildfarm "postmaster does not shut down" failures  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Tracing down buildfarm "postmaster does not shut down" failures  (Andrew Dunstan <andrew@dunslane.net>)
Список pgsql-hackers

On 02/09/2016 10:27 PM, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
>> On Tue, Feb 09, 2016 at 10:02:17PM -0500, Tom Lane wrote:
>>> I wonder if it's worth sticking some instrumentation into stats
>>> collector shutdown?
>> I wouldn't be surprised if the collector got backlogged during the main phase
>> of testing and took awhile to chew through its message queue before even
>> starting the write of the final stats.
> But why would the ecpg tests show such an effect when the main regression
> tests don't?  AFAIK the ecpg tests don't exactly stress the server ---
> note the trivial amount of data written by the shutdown checkpoint,
> for instance.


The main regression tests run with the stats file on the ramdisk.


>
> The other weird thing is that it's only sometimes slow.  If you look at
> the last buildfarm result from axolotl, for instance, the tail end of
> the ecpg log is
>
> LOG:  ShutdownSUBTRANS() complete at 2016-02-09 16:31:14.784 EST
> LOG:  database system is shut down at 2016-02-09 16:31:14.784 EST
> LOG:  lock files all released at 2016-02-09 16:31:14.817 EST
>
> so we only spent ~50ms on stats write that time.


That part is puzzling.

> The idea I was toying with is that previous filesystem activity (making
> the temp install, the server's never-fsync'd writes, etc) has built up a
> bunch of dirty kernel buffers, and at some point the kernel goes nuts
> writing all that data.  So the issues we're seeing would come and go
> depending on the timing of that I/O spike.  I'm not sure how to prove
> such a theory from here.


Yeah. It's faintly possible that a kernel upgrade will  help.

cheers

andrew





В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Speed up Clog Access by increasing CLOG buffers
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: GinPageIs* don't actually return a boolean