Re: buildfarm animals and 'snapshot too old'

From: Andrew Dunstan
Subject: Re: buildfarm animals and 'snapshot too old'
Msg-id: 53752A06.6030700@dunslane.net
In reply to: Re: buildfarm animals and 'snapshot too old' (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
Replies: Re: buildfarm animals and 'snapshot too old'
List: pgsql-hackers

On 05/15/2014 04:30 PM, Stefan Kaltenbrunner wrote:
> On 05/15/2014 07:46 PM, Andrew Dunstan wrote:
>> On 05/15/2014 12:43 PM, Tomas Vondra wrote:
>>> Hi all,
>>>
>>> today I got a few errors like these (this one is from last week,
>>> though):
>>>
>>>      Status Line: 493 snapshot too old: Wed May  7 04:36:57 2014 GMT
>>>      Content:
>>>      snapshot to old: Wed May  7 04:36:57 2014 GMT
>>>
>>> on the new buildfarm animals. I believe it was my mistake (incorrectly
>>> configured local git mirror), but it got me thinking about how this will
>>> behave with the animals running CLOBBER_CACHE_RECURSIVELY.
>>>
>>> If I understand the Perl code correctly, it does this:
>>>
>>> (1) update the repository
>>> (2) run the tests
>>> (3) check that the snapshot is not older than 24 hours (pgstatus.pl:188)
>>> (4) fail if older
>>>
>>> Now, imagine that the test runs for days/weeks. This pretty much means
>>> it's wasted, because the results will be thrown away anyway, no?
>>>
>>
>> The 24 hours runs from the time of the latest commit on the branch in
>> question, not the current time, but basically yes.
>>
>> We've never had machines with runs that long. The longest in recent
>> times has been friarbird, which runs CLOBBER_CACHE_ALWAYS and takes
>> around 4.5 hours. But we have had misconfigured machines reporting
>> unbelievable snapshot times.  I'll take a look and see if we can tighten
>> up the sanity check. It's worth noting that one thing friarbird does is
>> skip the install-check stage - it's almost certainly not going to have
>> terribly much interesting to tell us from that, given it has already run
>> a plain "make check".
> Well, I'm not sure about "misconfigured", but both my personal
> buildfarm members and pginfra-run ones (like gaibasaurus) have got errors
> complaining about "snapshot too old" in the past for long-running tests,
> so I'm not sure it is really a case of "we never had machines with runs that
> long". So maybe we should not reject those submissions at submission
> time but rather mark them clearly on the dashboard and leave the final
> interpretation to a human...
>
>


That's a LOT harder and more work to arrange. Frankly, there are more 
important things to do.

I would like to know the circumstances of these very long runs. I drive 
some of my VMs pretty hard on pretty modest hardware, and they don't 
come close to running 24 hours.

The current behaviour goes back to this commit from December 2011:
   commit a8b5049e64f9cb08f8e165d0737139dab74e3bce
   Author: Andrew Dunstan <andrew@dunslane.net>
   Date:   Wed Dec 14 14:38:44 2011 -0800

       Use git snapshot instead of fixed 10 day timeout.

       The sanity checks made sure that an animal wasn't submitting a
       snapshot that was too old. But sometimes an old branch doesn't
       get any changes for more than 10 days. So accept a snapshot that
       is not more than 1 day older than the last known snapshot. Per
       complaint from Stefan.
 


I'm prepared to increase the sanity check time if there is a serious 
demand for it, but I'd like to know what to increase it to.
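As a rough illustration of the rule in the commit message, and of why a very long run can fall foul of it, here is a minimal Python sketch. It is not the buildfarm server's code (that is Perl, in pgstatus.pl); the function name `snapshot_acceptable`, the `max_lag` parameter and the choice of an inclusive boundary at exactly one day are assumptions made for illustration. "Last known snapshot" is taken to be the time of the latest commit the server knows about on the branch, as described earlier in the thread.

```python
from datetime import datetime, timedelta, timezone

# Assumed default: the 1-day tolerance described in the commit message.
DEFAULT_MAX_LAG = timedelta(days=1)


def snapshot_acceptable(snapshot_time: datetime,
                        last_known_snapshot: datetime,
                        max_lag: timedelta = DEFAULT_MAX_LAG) -> bool:
    """Hypothetical sanity check: accept the animal's snapshot if it is
    not more than max_lag older than the latest known commit on the branch.

    The wall-clock duration of the run does not enter into it; only new
    commits landing on the branch while the run is in progress do.
    """
    return last_known_snapshot - snapshot_time <= max_lag


if __name__ == "__main__":
    checkout = datetime(2014, 5, 7, 4, 36, 57, tzinfo=timezone.utc)
    # Quiet branch: no newer commits, so even a week-long run is accepted.
    print(snapshot_acceptable(checkout, checkout))
    # Active branch: a commit landed 31 hours after checkout, so rejected.
    print(snapshot_acceptable(checkout, checkout + timedelta(hours=31)))
    # Same case with a hypothetical 3-day window: accepted.
    print(snapshot_acceptable(checkout, checkout + timedelta(hours=31),
                              max_lag=timedelta(days=3)))
```

In this sketch, raising the sanity check time corresponds to passing a larger `max_lag`; the open question in this thread is what value that should be.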

cheers

andrew





