Re: Possible fix for occasional failures on castoroides etc

From: Tom Lane
Subject: Re: Possible fix for occasional failures on castoroides etc
Msg-id: 31133.1399143568@sss.pgh.pa.us
In reply to: Re: Possible fix for occasional failures on castoroides etc (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

I wrote:
> Unfortunately, it seems the Solaris implementors didn't read Stevens,
> because it looks to me like they *do* return ECONNREFUSED on accept queue
> overflow.  Still, it's hard to see how that would be the issue if we're
> still seeing this failure with only five clients.

Also, after further inspection of the source code, it appears to me that
the kernel's limit on accept queue length is hard-wired at 4096 in
Solaris.  So there's basically no way that we're hitting that limit in the
regression tests, and the MAX_CONNECTIONS configuration is irrelevant.

We seem to be left with the race condition theory.  In that connection,
this comment in /usr/src/uts/common/io/tl.c is interesting:

 *    The T_CONN_CON is generated when processing the T_CONN_REQ i.e. before
 *    a T_CONN_RES is received from the acceptor. This means that a socket
 *    connect will complete before the peer has called accept.

I'm not sure that explains anything of value, but it's probably unlike any
other implementation, which makes it perhaps relevant.  It implies that
this is totally unrelated to any server-side behavior; so if it's possible
for us to work around it at all, we'd have to do so client-side.
        regards, tom lane
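
[Editor's illustration, not part of the original message.] To make "work around it client-side" concrete, here is a minimal sketch of one possible approach: retry the connect a bounded number of times when the kernel reports ECONNREFUSED. The message does not say that this is the fix, or that a retry would help with the race at all. The helper name `connect_with_retry` and its `attempts` and `delay` parameters are hypothetical and are not part of libpq or the regression test driver.

```python
import socket
import time


def connect_with_retry(address, family=socket.AF_INET, attempts=5, delay=0.05):
    """Connect to `address`, retrying only on ECONNREFUSED.

    Hypothetical helper for illustration only; the retry count and
    delay are arbitrary assumptions, not tuned values.
    """
    last_error = None
    for attempt in range(attempts):
        sock = socket.socket(family, socket.SOCK_STREAM)
        try:
            sock.connect(address)
            return sock  # success: the caller owns the connected socket
        except ConnectionRefusedError as exc:
            # ECONNREFUSED may be transient here, so back off and retry.
            last_error = exc
            sock.close()
            if attempt < attempts - 1:
                time.sleep(delay)
        except OSError:
            # Any other error is outside the case being discussed.
            sock.close()
            raise
    # Every attempt was refused: report the last refusal to the caller.
    raise last_error
```

A caller would use it as `sock = connect_with_retry(("127.0.0.1", 5432))`. The retry is bounded because a refusal can also mean the server is really down, and in that case the error should still reach the caller.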


