Re: Tracing down buildfarm "postmaster does not shut down" failures

From: Andres Freund
Subject: Re: Tracing down buildfarm "postmaster does not shut down" failures
Msg-id: 20160210160635.vm26bxzxqpogxbbs@alap3.anarazel.de
In reply to: Re: Tracing down buildfarm "postmaster does not shut down" failures (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 2016-02-09 22:27:07 -0500, Tom Lane wrote:
> The idea I was toying with is that previous filesystem activity (making
> the temp install, the server's never-fsync'd writes, etc) has built up a
> bunch of dirty kernel buffers, and at some point the kernel goes nuts
> writing all that data.  So the issues we're seeing would come and go
> depending on the timing of that I/O spike.  I'm not sure how to prove
> such a theory from here.

It'd be interesting to monitor
$ grep -E '^(Dirty|Writeback):' /proc/meminfo
output, at least on Linux. It's terribly easy to get the kernel into a
state where it has so much data needing to be written back that an
immediate checkpoint takes pretty much forever.
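
To watch how these counters move over a test run rather than sampling them once, something like the following could be used. It is only a sketch: the function names `dirty_writeback` and `watch_dirty` are made up for illustration, and the optional file argument exists only so the parsing can be tried against a saved copy of /proc/meminfo.

```shell
#!/bin/sh
# Print the Dirty and Writeback counters (in kB) on a single line.
# The file argument is a hypothetical hook for testing; it defaults to
# /proc/meminfo, which only exists on Linux.
dirty_writeback() {
    awk '/^(Dirty|Writeback):/ { printf "%s%s %s kB", sep, $1, $2; sep = "  " }
         END { print "" }' "${1:-/proc/meminfo}"
}

# Print a timestamped sample every N seconds (default 1) until interrupted.
watch_dirty() {
    while :; do
        printf '%s  ' "$(date +%T)"
        dirty_writeback
        sleep "${1:-1}"
    done
}
```

Running `watch_dirty 1` in a second terminal while the regression tests run should show whether a large Dirty value coincides with the slow shutdown.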

If I understand the code correctly, once a buffer has been placed into
'writeback', it'll be processed more or less in order. A buffer can end
up there e.g. because it was dirtied more than 30s ago. If buffers queued
later also need to be written back (e.g. due to an fsync()), you'll often
wait for the earlier ones first.
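
The 30s figure presumably corresponds to the vm.dirty_expire_centisecs sysctl (3000 by default on Linux); that mapping is an assumption, not something stated above. A minimal sketch for checking what a given buildfarm machine actually uses, with a made-up function name and a directory argument that exists only for testing:

```shell
#!/bin/sh
# Print the dirty-page expiry age and the flusher wakeup interval,
# both stored by the kernel in centiseconds, with a conversion to seconds.
# The directory argument is a hypothetical test hook; it defaults to
# /proc/sys/vm.
writeback_intervals() {
    dir="${1:-/proc/sys/vm}"
    for knob in dirty_expire_centisecs dirty_writeback_centisecs; do
        awk -v name="$knob" \
            '{ printf "%s = %s (%.1fs)\n", name, $1, $1 / 100 }' "$dir/$knob"
    done
}
```

Calling `writeback_intervals` with no arguments reads the live values.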

Andres


