On Wed, Jan 21, 2015 at 1:08 AM, Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> On 25.12.2014 22:28, Tomas Vondra wrote:
>> On 25.12.2014 21:14, Andres Freund wrote:
>>
>>> That's indeed odd. Seems to have been lost when the statsfile was
>>> split into multiple files. Alvaro, Tomas?
>>
>> The goal was to keep the logic as close to the original as possible.
>> IIRC there were "pgstat wait timeout" issues before, and in most cases
>> the conclusion was that it's probably because of overloaded I/O.
>>
>> But maybe there actually was another bug, and it's entirely possible
>> that the split introduced a new one, and that's what we're seeing now.
>> The strange thing is that the split happened ~2 years ago, which is
>> inconsistent with the sudden increase in issues of this kind. So maybe
>> something changed on that particular animal (a failing SD card causing
>> I/O stalls, perhaps)?
>>
>> Anyway, I happen to have a spare Raspberry PI, so I'll try to reproduce
>> and analyze the issue locally. But that won't happen until January.
>
> I've tried to reproduce this on my Raspberry PI 'machine' and it's not
> very difficult to trigger this. About 7 out of 10 'make check' runs fail
> because of 'pgstat wait timeout'.
>
> All the occurrences I've seen were right after some sort of VACUUM
> (sometimes plain, sometimes ANALYZE or FREEZE), and the I/O at the time
> looked something like this:
>
> Device:   rrqm/s  wrqm/s   r/s   w/s  rkB/s  wkB/s  avgrq-sz  avgqu-sz     await  r_await   w_await   svctm   %util
> mmcblk0     0.00   75.00  0.00  8.00   0.00  36.00      9.00      5.73  15633.75     0.00  15633.75  125.00  100.00
>
> So pretty terrible (this is a Class 4 SD card, supposedly able to handle
> 4 MB/s). If hamster had a faulty SD card, it might have been much worse,
> I guess.
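To put the quoted iostat numbers in perspective, here is a rough Python sketch that derives a few figures from that single mmcblk0 row. It assumes the usual `iostat -x` column meanings (avgrq-sz in 512-byte sectors, await and svctm in milliseconds, %util as percent of the interval the device was busy) and reads the nominal "4 MB/s" Class 4 rating as 4096 kB/s; the helper names `parse_row` and `derived_metrics` are hypothetical, made up for illustration.

```python
# Sketch only: sanity-check the iostat sample quoted above for mmcblk0.
# Assumptions: standard `iostat -x` column semantics, and the card's
# nominal "4 MB/s" rating taken as 4096 kB/s. Helper names are hypothetical.

HEADER = ("rrqm/s wrqm/s r/s w/s rkB/s wkB/s "
          "avgrq-sz avgqu-sz await r_await w_await svctm %util").split()
VALUES = ("0.00 75.00 0.00 8.00 0.00 36.00 "
          "9.00 5.73 15633.75 0.00 15633.75 125.00 100.00").split()

NOMINAL_KB_PER_S = 4 * 1024  # assumed: "4 MB/s" read as 4096 kB/s
SECTOR_BYTES = 512           # iostat reports avgrq-sz in 512-byte sectors


def parse_row(header, values):
    """Map each iostat column name to its value as a float."""
    return {name: float(v) for name, v in zip(header, values)}


def derived_metrics(row):
    """Derive a few figures from one iostat row."""
    ios_per_s = row["r/s"] + row["w/s"]
    kb_per_s = row["rkB/s"] + row["wkB/s"]
    avg_req_kb = kb_per_s / ios_per_s
    return {
        # average request size, in kB and in 512-byte sectors
        "avg_req_kb": avg_req_kb,
        "avg_req_sectors": avg_req_kb * 1024 / SECTOR_BYTES,
        # fraction of the card's nominal rating actually achieved
        "fraction_of_nominal": kb_per_s / NOMINAL_KB_PER_S,
        # average wait per write request, converted from ms to seconds
        "await_s": row["w_await"] / 1000,
        # implied service time: busy ms per second divided by requests/s
        "svctm_ms": row["%util"] * 10 / ios_per_s,
    }


if __name__ == "__main__":
    metrics = derived_metrics(parse_row(HEADER, VALUES))
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
```

With these values it works out to about 4.5 kB per write (9 sectors, consistent with the reported avgrq-sz), under 1% of the nominal 4 MB/s, roughly 15.6 s per write request, and an implied service time of 125 ms, consistent with the reported svctm.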

In my experience, at least a Class 10 card is necessary, along with a
minimum amount of memory, to minimize how often those warnings appear;
hamster now has an 8GB Class 10 card.
--
Michael