Re: Expansion of our checks for connection-loss errors

Поиск
Список
Период
Сортировка
От Fujii Masao
Тема Re: Expansion of our checks for connection-loss errors
Дата
Msg-id 5f5b7511-10ab-d272-a4b3-6cf9abd551c2@oss.nttdata.com
обсуждение исходный текст
Ответ на Expansion of our checks for connection-loss errors  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Expansion of our checks for connection-loss errors  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers

On 2020/10/09 4:15, Tom Lane wrote:
> Over in the thread at [1], we've tentatively determined that the
> reason buildfarm member lorikeet is currently failing is that its
> network stack returns ECONNABORTED for (some?) connection failures,
> whereas our code is only expecting ECONNRESET.  Fujii Masao therefore
> proposes that we treat ECONNABORTED the same as ECONNRESET.  I think
> this is a good idea, but after a bit of research I feel it does not
> go far enough.  I find these POSIX-standard errnos that also seem
> likely candidates to be returned for a hard loss of connection:
> 
>     ECONNABORTED
>     EHOSTUNREACH
>     ENETDOWN
>     ENETUNREACH
> 
> All of these have been in POSIX since SUSv2, so it seems unlikely
> that we need to #ifdef any of them.  (It is in any case pretty silly
> that we have #ifdefs around a very small minority of our references
> to ECONNRESET :-(.)
> 
> There are some other related errnos, such as ECONNREFUSED, that
> don't seem like they'd be returned for a failure of a pre-existing
> connection, so we don't need to include them in such tests.
> 
> Accordingly, I propose the attached patch (an expansion of
> Fujii-san's) that causes us to test for all five errnos anyplace
> we had been checking for ECONNRESET.

+1

Thanks for expanding the patch!

-#ifdef ECONNRESET
-            case ECONNRESET:
+            case ALL_CONNECTION_LOSS_ERRNOS:
                  printfPQExpBuffer(&conn->errorMessage,
                                    libpq_gettext("server closed the connection unexpectedly\n"
                                                  "\tThis probably means the server terminated abnormally\n"
                                                  "\tbefore or while processing the request.\n"));

This change causes the same error message to be reported for those five errno.
That is, we cannot identify which errno is actually reported, from the error
message. But I just wonder if it's more helpful for the troubleshooting if we,
for example, append strerror() into the message so that we can easily
identify errno. Thought?


> I felt that this was getting to
> the point where we'd better centralize the knowledge of what to check,
> so the patch does that, via an inline function and an admittedly hacky
> macro.  I also upgraded some places such as strerror.c to have full
> support for these symbols.
> 
> All of the machines I have (even as far back as HPUX 10.20) also
> define ENETRESET and EHOSTDOWN.  However, those symbols do not appear
> in SUSv2.  ENETRESET was added at some later point, but EHOSTDOWN is
> still not in POSIX.  For the moment I've left these second-tier
> symbols out of the patch, but there's a case for adding them.  I'm
> not sure whether there'd be any point in trying to #ifdef them.
> 
> BTW, I took out the conditional defines of some of these errnos in
> libpq's win32.h; AFAICS that's been dead code ever since we added
> #define's for them to win32_port.h.  Am I missing something?
> 
> This seems like a bug fix to me, so I'm inclined to back-patch.

+1

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Assertion failure with LEFT JOINs among >500 relations
Следующее
От: Michael Banck
Дата:
Сообщение: Re: Two fsync related performance issues?