Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Дата
Msg-id CAB7nPqRyOVYiLkrQkjrB1ozjoNaPBK_-ApV_8L+vnAKQHju=-g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 25.12.2014 22:28, Tomas Vondra wrote:
>> On 25.12.2014 21:14, Andres Freund wrote:
>>
>>> That's indeed odd. Seems to have been lost when the statsfile was
>>> split into multiple files. Alvaro, Tomas?
>>
>> The goal was to keep the logic as close to the original as possible.
>> IIRC there were "pgstat wait timeout" issues before, and in most cases
>> the conclusion was that it's probably because of overloaded I/O.
>>
>> But maybe there actually was another bug, and it's entirely possible
>> that the split introduced a new one, and that's what we're seeing now.
>> The strange thing is that the split happened ~2 years ago, which is
>> inconsistent with the sudden increase of this kind of issues. So maybe
>> something changed on that particular animal (a failing SD card causing
>> I/O stalls, perhaps)?
>>
>> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
>> and analyze the issue locally. But that won't happen until January.
>
> I've tried to reproduce this on my Raspberry PI 'machine' and it's not
> very difficult to trigger this. About 7 out of 10 'make check' runs fail
> because of 'pgstat wait timeout'.
>
> All the occurences I've seen were right after some sort of VACUUM
> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
> looked something like this:
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s
> avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> mmcblk0           0.00    75.00    0.00    8.00     0.00    36.00
> 9.00     5.73 15633.75    0.00 15633.75 125.00 100.00
>
> So pretty terrible (this is a Class 4 SD card, supposedly able to handle
> 4 MB/s). If hamster had faulty SD card, it might have been much worse, I
> guess.
By experience, a class 10 is at least necessary, with a minimum amount
of memory to minimize the apparition of those warnings, hamster having
now a 8GB class 10 card.
-- 
Michael



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: B-Tree support function number 3 (strxfrm() optimization)
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: Add min and max execute statement time in pg_stat_statement