Re: Better way of dealing with pgstat wait timeout during buildfarm runs?

From: Tomas Vondra
Subject: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Msg-id: 54BE7D70.7050606@2ndquadrant.com
In reply to: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Tomas Vondra <tv@fuzzy.cz>)
Responses: Re: Better way of dealing with pgstat wait timeout during buildfarm runs?  (Michael Paquier <michael.paquier@gmail.com>)
List: pgsql-hackers

On 25.12.2014 22:28, Tomas Vondra wrote:
> On 25.12.2014 21:14, Andres Freund wrote:
>
>> That's indeed odd. Seems to have been lost when the statsfile was
>> split into multiple files. Alvaro, Tomas?
> 
> The goal was to keep the logic as close to the original as possible.
> IIRC there were "pgstat wait timeout" issues before, and in most cases
> the conclusion was that it's probably because of overloaded I/O.
> 
> But maybe there actually was another bug, and it's entirely possible
> that the split introduced a new one, and that's what we're seeing now.
> The strange thing is that the split happened ~2 years ago, which is
> inconsistent with the sudden increase of this kind of issues. So maybe
> something changed on that particular animal (a failing SD card causing
> I/O stalls, perhaps)?
> 
> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
> and analyze the issue locally. But that won't happen until January.

I've tried to reproduce this on my Raspberry PI 'machine' and it's not
very difficult to trigger this. About 7 out of 10 'make check' runs fail
because of 'pgstat wait timeout'.

All the occurrences I've seen were right after some sort of VACUUM
(sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
looked something like this:

Device:   rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await   svctm   %util
mmcblk0     0.00   75.00  0.00  8.00   0.00  36.00      9.00      5.73  15633.75     0.00  15633.75  125.00  100.00

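So pretty terrible: writes waited about 15.6 seconds on average at 100%
utilization, while the device wrote only 36 kB/s (this is a Class 4 SD
card, supposedly able to handle 4 MB/s). If hamster had a faulty SD card,
it might have been much worse, I guess.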
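
For anyone who wants to sanity-check the iostat numbers above, here is a
rough Python sketch. It parses the quoted sample line and recomputes two
figures from the others: throughput from request rate times average request
size, and utilization from request rate times service time. The 10-second
limit (ASSUMED_TIMEOUT_MS) is my assumption about the pgstat wait limit, not
something stated in this thread, and the function names are made up for
illustration.

```python
# Sample taken from the iostat -x output quoted above.
HEADER = ("Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s "
          "avgrq-sz avgqu-sz await r_await w_await svctm %util")
SAMPLE = ("mmcblk0 0.00 75.00 0.00 8.00 0.00 36.00 "
          "9.00 5.73 15633.75 0.00 15633.75 125.00 100.00")

SECTOR_BYTES = 512            # iostat reports avgrq-sz in 512-byte sectors
ASSUMED_TIMEOUT_MS = 10_000   # assumption: pgstat wait limit of about 10 s


def parse_iostat(header: str, row: str) -> dict:
    """Map column names to floats; the device name is kept as a string."""
    names = header.split()[1:]          # drop the "Device:" label
    device, *values = row.split()
    stats = dict(zip(names, map(float, values)))
    stats["device"] = device
    return stats


def summarize(stats: dict) -> dict:
    """Recompute throughput and utilization, and compare await to the limit."""
    iops = stats["r/s"] + stats["w/s"]
    return {
        # Throughput implied by request rate and average request size.
        "implied_kb_per_s": iops * stats["avgrq-sz"] * SECTOR_BYTES / 1024,
        # Utilization implied by request rate and per-request service time (ms).
        "implied_util_pct": iops * stats["svctm"] / 1000 * 100,
        "await_s": stats["await"] / 1000,
        "exceeds_timeout": stats["await"] > ASSUMED_TIMEOUT_MS,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_iostat(HEADER, SAMPLE)).items():
        print(f"{key}: {value}")
```

The recomputed throughput and utilization agree with the wkB/s and %util
columns, so the sample looks internally consistent. If the 10-second
assumption holds, an average wait of over 15 seconds would by itself be
enough for a stats file request to time out.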

This of course does not prove the absence of a bug - I plan to dig into
this a bit more. Feel free to point out some suspicious scenarios that
might be worth reproducing and analyzing.

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


