Re: Issue with retry_count in socket.c and fix for the same

Поиск
Список
Период
Сортировка
От Itnal, Prakash (NSN - IN/Bangalore)
Тема Re: Issue with retry_count in socket.c and fix for the same
Дата
Msg-id 6C8E163ED7EF5846B2F44CDBC63300112E7D56@SGSIMBX002.nsn-intra.net
обсуждение исходный текст
Ответ на Re: Issue with retry_count in socket.c and fix for the same  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-odbc
I agree, it looks so complex. I will try to reorganize the retry logic and make it simple. Once done I will share the
patch.

Meantime I am new to psql-odbc driver code, so not sure how to make this timeout as configurable parameter from
odbc.inifile? As per the keywords (http://psqlodbc.projects.pgfoundry.org/docs/config-opt.html), I don't find any
existingparameter which can be used. Looks like we need to add a new parameter something like "ConnectionTimeout"! 

--
Regards,
Prakash I.

-----Original Message-----
From: ext Heikki Linnakangas [mailto:hlinnakangas@vmware.com]
Sent: Thursday, May 15, 2014 4:55 PM
To: Itnal, Prakash (NSN - IN/Bangalore); pgsql-odbc@postgresql.org
Subject: Re: [ODBC] Issue with retry_count in socket.c and fix for the same

On 05/15/2014 11:10 AM, Itnal, Prakash (NSN - IN/Bangalore) wrote:
> Recently we encountered a case where clients connected to postgres server through psqlODBC driver got strucked
(hanged)when server unexpectedly shutdown (hard power off or LAN cable removed). The clients were hanged forever. I
analyzedthe code and found a potential issue with retry count logic in function SOCK_wait_for_ready() (in socket.c). 
>
> The no_timeout variable is initialized to TRUE and is not modified if retry_count is positive number. That means if
retry_countis passed then it will treat it as "no timeout". If I understood it correctly it should work other way. If
retry_countis positive number then socket should wait with timeout and not indefinitely. 
>
> I have changed the default initialization of no_timeout to FALSE. Attached is the patch. Kindly review the same and
letme know comments. 
>
> Copied the function below for quick reference: socket.c: SOCK_wait_for_ready()
>
> 528 static int SOCK_wait_for_ready(SocketClass *sock, BOOL output, int retry_count)
> 529 {
> 530     int ret, gerrno;
> 531 #ifdef  HAVE_POLL
> 532     struct pollfd   fds;
> 533 #else
> 534     fd_set  fds, except_fds;
> 535     struct  timeval tm;
> 536 #endif /* HAVE_POLL */
> 537     BOOL    no_timeout = TRUE;
> 538
> 539     if (0 == retry_count)
> 540         no_timeout = FALSE;
> 541     else if (0  > retry_count)
> 542         no_timeout = TRUE;
> 543 #ifdef  USE_SSL
> 544     else if (sock->ssl == NULL)
> 545         no_timeout = TRUE;
> 546 #endif /* USE_SSL */
> 547     do {
> 548 #ifdef  HAVE_POLL
> 549         fds.fd = sock->socket;
> 550         fds.events = output ? POLLOUT : POLLIN;
> 551         fds.revents = 0;
> 552         ret = poll(&fds, 1, no_timeout ? -1 : retry_count * 1000);
> 553 mylog("!!!  poll ret=%d revents=%x\n", ret, fds.revents);
> 554 #else
> 555         FD_ZERO(&fds);
> 556         FD_ZERO(&except_fds);
> 557         FD_SET(sock->socket, &fds);
> 558         FD_SET(sock->socket, &except_fds);
> 559         if (!no_timeout)
> 560         {
> 561             tm.tv_sec = retry_count;
> 562             tm.tv_usec = 0;
> 563         }
> 564         ret = select((int) sock->socket + 1, output ? NULL : &fds, output ? &fds : NULL, &except_fds, no_timeout
?NULL : &tm); 
> 565 #endif /* HAVE_POLL */
> 566         gerrno = SOCK_ERRNO;
> 567     } while (ret < 0 && EINTR == gerrno);
> 568     if (retry_count < 0)
> 569         retry_count *= -1;
> 570     if (0 == ret && retry_count > MAX_RETRY_COUNT)
> 571     {
> 572         ret = -1;
> 573         SOCK_set_error(sock, output ? SOCKET_WRITE_TIMEOUT : SOCKET_READ_TIMEOUT, "SOCK_wait_for_ready timeout");
> 574     }
> 575     return ret;
> 576 }

Ugh. Yes, it's clearly bogus as it is. no_timeout is only ever set to
FALSE, when retry_count is 0. And when retry_count is 0, the timeout is
also set to 0. So the whole timeout and retry_count thing is a very
complicated way of saying that when retry_count==0, just check if there
is any outstanding data to read / room for writing without blocking,
otherwise wait forever.

I'm not so sure it's supposed to behave like you described either,
though. For example, if we normally a timeout by default, why would we
not want a timeout when SSL is enabled? And the "0 > retry_count" check
is never going to be true (unless retry_count wraps around)

I wonder why we need any timeout or retry counters, ever. Frankly, I'm
inclined to just rip all that out. A timeout would be a useful feature
for many applications, to detect a broken connection more quickly, but
the way it's currently implemented is pretty much useless for that. For
starters, it would have to be configurable, because many applications
would not want to have a timeout at all. A built-in timeout of 60
seconds or so, which I think you'd get with your patch, would be too
short for OLAP kind of queries that can run for hours before returning
any results.

- Heikki


В списке pgsql-odbc по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Issue with retry_count in socket.c and fix for the same
Следующее
От: Barry Bell
Дата:
Сообщение: Re: psqlODBC text length with no records