Re: 503 Backend fetch failed errors

From Magnus Hagander
Subject Re: 503 Backend fetch failed errors
Date
Msg-id CABUevEw7FJNK7mcZL66r-wUV27wCfMQvuKBCqt+YRRJLqdm30Q@mail.gmail.com
In response to Re: 503 Backend fetch failed errors  (Stefan Kaltenbrunner <stefan@kaltenbrunner.cc>)
List buildfarm-members


On Thu, Nov 8, 2018 at 9:42 AM Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> wrote:
On 11/8/18 9:35 AM, Magnus Hagander wrote:
>
>
> On Thu, Nov 8, 2018 at 9:31 AM Stefan Kaltenbrunner
> <stefan@kaltenbrunner.cc> wrote:
>
>     On 11/8/18 9:19 AM, Magnus Hagander wrote:
>      >
>      >
>      > On Thu, Nov 8, 2018 at 9:01 AM Stefan Kaltenbrunner
>      > <stefan@kaltenbrunner.cc> wrote:
>      >
>      >     On 11/7/18 6:40 PM, Rémi Zara wrote:
>      >      > Hi,
>      >
>      >     Hi Rémi!
>      >
>      >      >
>      >      > I’m getting a lot of these errors with coypu (several per day),
>      >      > but not systematically.
>      >      > Is this a problem on my end, or is this on the server end?
>      >      >
>      >      > Query for: stage=OK&animal=coypu&ts=1541573277
>      >      > Target: https://buildfarm.postgresql.org/cgi-bin/pgstatus.pl/53b137e7c765b781699bbe73e3aec7751a8c4ab7
>      >      > Status Line: 503 Backend fetch failed
>      >      > Web txn failed with status: 1
>      >      > Query for: stage=OK&animal=coypu&ts=1541575423
>      >      > Target: https://buildfarm.postgresql.org/cgi-bin/pgstatus.pl/1
>      >      > Status Line: 503 Backend fetch failed
>      >      > Web txn failed with status: 1
>      >
>      >     given the error this is something that is created by the varnish
>      >     instance that is in front of the buildfarm. On a quick look I could
>      >     immediately figure out what the problem is - but it looks like you (or
>      >     somebody else) tried at least to click one of the links above using
>      >     his desktop browser and got an error about a missing branch
>      >     specification ;)
>      >
>      >
>      >
>      > AFAICT:
>      >
>      > A quick look in the logs indicates that the buildfarm is responding:
>      > -   RespHeader     Status: 492 bad branch parameter
>      >
>      > However, 492 is not a valid http status code, so Varnish can't handle it
>      > and thus returns 503 failure to the client.
>
>     I think that is not the actual error that Rémi is experiencing - the 492
>     case (which is indeed an invalid http error code) only happens when one
>     actually clicks the link in the mail above (which I guess some did and
>     you found in the logs) because the actual BF client will add a parameter
>     to the "Target" URL.
>
>     The actual "errors" don't seem to show up in the lighttpd logs afaiks.
>
>
> Oh, sorry. I was checking the one called "target", I assumed that was
> the URL that failed.
>
> Assuming for the original ones the ts is part of the URL, none of that
> is still in the logs. Or are they post parameters? Do we know exactly
> which URL is actually failing, and when (exactly) this happened?

well - most of the parameters to each url are in the error report (e.g.
"Query for: stage=OK&animal=coypu&ts=1541575423"). I dunno whether Rémi
knows which branch that was for? That one also has a unix timestamp,
though I "think" that is the timestamp from when the build started on
the bf-client and not the ts when the request was made.

Afaiks the two requests are not at all in the lighty log so only varnish
might have seen them (though it's unclear what error it got while
connecting to lighty)
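
To make the timestamps in the error report easier to match against server logs, here is a minimal Python sketch that splits a "Query for:" string and converts its ts value to UTC. The helper name describe_query is made up for illustration, and whether ts marks the build start or the request time is still the open question raised above.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs


def describe_query(query: str) -> dict:
    """Split a 'Query for:' string and add a UTC rendering of its ts field."""
    # parse_qs returns a list per key; each key appears once here.
    fields = {key: values[0] for key, values in parse_qs(query).items()}
    ts = int(fields["ts"])
    fields["ts_utc"] = datetime.fromtimestamp(ts, tz=timezone.utc).isoformat()
    return fields


info = describe_query("stage=OK&animal=coypu&ts=1541575423")
print(info["animal"], info["ts_utc"])
```

The resulting UTC time only gives a rough window to search in, since the failing request may have been sent some time after that timestamp was taken.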

They'll be hard to find in the Varnish log without actually having the URL. There is nothing in the varnish log with 1541575423 in it at all. And there is nothing with "coypu" and an HTTP 503 in it either. And the log goes back to Nov 6...

So my guess is it might be a POST which doesn't actually have the animal name or the timestamp on the URL.
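
If the animal name and timestamp are not in the URL, one way to narrow the search is to filter by request method and response status instead. The following is a rough Python sketch that does this over varnishlog-style text. It assumes the log was saved to plain text, that each transaction starts with a line beginning with "*", and that each record line looks like "-   Tag   value". The SAMPLE text is invented for illustration and is not taken from the real log.

```python
import re

# One record line: one or more dashes, then the tag, then the value.
RECORD_LINE = re.compile(r"^-+\s+(\S+)\s+(.*)$")

# Invented sample data, only here to show the expected input shape.
SAMPLE = """\
*   << Request  >> 1001
-   ReqMethod      POST
-   ReqURL         /cgi-bin/pgstatus.pl/example
-   FetchError     straight insufficient bytes
-   RespStatus     503
*   << Request  >> 1002
-   ReqMethod      GET
-   ReqURL         /other
-   RespStatus     200
"""


def split_records(text):
    """Group (tag, value) pairs by transaction."""
    records, current = [], None
    for line in text.splitlines():
        if line.startswith("*"):
            current = []
            records.append(current)
            continue
        match = RECORD_LINE.match(line)
        if match and current is not None:
            current.append((match.group(1), match.group(2).strip()))
    return records


def failed_posts(records):
    """Keep transactions that are POSTs answered with status 503."""
    hits = []
    for record in records:
        first = {}
        for tag, value in record:
            first.setdefault(tag, value)  # remember the first value per tag
        if first.get("ReqMethod") == "POST" and first.get("RespStatus") == "503":
            hits.append(record)
    return hits


hits = failed_posts(split_records(SAMPLE))
for record in hits:
    print([value for tag, value in record if tag in ("ReqURL", "FetchError")])
```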

I do see some general POSTs returning 503. They all seem to be of the type going to pgstatus.pl like the ones above, so maybe that is the URL after all? If I look at just POSTs there, I see a single one, and it has:

-   FetchError     Resource temporarily unavailable
-   FetchError     straight insufficient bytes

"Straight insufficient bytes" means there is a mismatch between Content-Length and the actual amount of data sent/read.

And on the backend side:

--  FetchError     req.body read error: 11 (Resource temporarily unavailable)

I believe this means that varnish is actually failing to read the request body from the *client*, in order to pass it on to the server. In that case, it could be that the client sends the wrong length. It does send a content-length header of 4160573 bytes -- perhaps it stops sending data before it gets there. Is that a "reasonable size" for the package being sent? It's quite a big POST.
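
To illustrate the kind of mismatch I mean, here is a small Python sketch (not Varnish's actual code) that reads a body against a declared Content-Length and reports how many bytes never arrived. The function name read_declared_body and the 1,000,000-byte truncated stream are made up for the example. Only the 4160573 figure comes from the log.

```python
import io


def read_declared_body(stream, content_length, chunk_size=65536):
    """Read up to content_length bytes; return (body, missing_byte_count)."""
    remaining = content_length
    chunks = []
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            # The sender stopped before delivering the declared length.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks), remaining


declared = 4160573  # Content-Length seen in the log
truncated = io.BytesIO(b"x" * 1_000_000)  # hypothetical client that stops early
body, missing = read_declared_body(truncated, declared)
print(len(body), missing)
```

A non-zero missing count is the situation a proxy cannot recover from, because it has promised the backend a body of the declared size.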

The error occurred 7.25 seconds after Varnish started talking to lighttpd. So it at least did something first. Perhaps if it actually is bigger than 4MB it hit some sort of limit and lighttpd killed the request?
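
Whether the request really is "bigger than 4MB" depends on which unit a limit would be counted in. A quick Python check of the declared length against both readings (I don't know which one, if any, applies here):

```python
declared = 4160573  # Content-Length from the log, in bytes

MB = 1000 ** 2   # decimal megabyte
MIB = 1024 ** 2  # binary mebibyte

print(f"{declared / MB:.2f} MB, {declared / MIB:.2f} MiB")

over_4_mb = declared > 4 * MB    # True: above 4,000,000 bytes
over_4_mib = declared > 4 * MIB  # False: below 4,194,304 bytes
print(over_4_mb, over_4_mib)
```

So the declared size is over the line for a decimal 4 MB limit but under it for a binary 4 MiB one.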

--
