Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)

From Andres Freund
Subject Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)
Date
Msg-id 20140929193733.GB14400@awork2.anarazel.de
In response to Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: test_shm_mq failing on anole (was: Sending out a request for more buildfarm animals?)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-09-29 15:24:55 -0400, Robert Haas wrote:
> On Mon, Sep 29, 2014 at 2:52 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > If that theory is true, wouldn't things get unstuck every time a new
> > connection comes in? Or once 60 seconds have passed? That's not to say
> > this isn't wrong, but still?
> 
> There aren't going to be any new connections arriving when running
> the contrib regression tests, I believe, so I don't think there is an
> escape hatch there.

I thought you might have tried connecting... And I'd guessed you'd have
reported it if that had fixed things.

> I didn't think to check how timeout was set in
> ServerLoop, and it does look like the maximum ought to be 60 seconds,
> so either there's some other ingredient I'm missing here, or the whole
> theory is just wrong altogether.  :-(

Yea :(. Note how signals are blocked in all the signal handlers and only
unblocked for a very short time (the sleep).

(stare at random shit for far too long)

Ah. DetermineSleepTime(), which is called while signals are unblocked!,
modifies BackgroundWorkerList. Previously that only iterated the list,
without modifying it. That's already of quite debatable safety, but
modifying it without having blocked signals is most definitely
broken. The modification was introduced by 7f7485a0c...

If you can manually run stuff on that machine, it'd be rather helpful if
you could put a PG_SETMASK(&BlockSig);...PG_SETMASK(&UnBlockSig); in the
HaveCrashedWorker() loop.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


