Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram

From Magnus Hagander
Subject Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
Date
Msg-id 9837222c0908030730kd59b9c2v7583666c5716aead@mail.gmail.com
In reply to Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram  (Nikhil Sontakke <nikhil.sontakke@enterprisedb.com>)
Responses Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram  (Nikhil Sontakke <nikhil.sontakke@enterprisedb.com>)
List pgsql-bugs
On Mon, Aug 3, 2009 at 16:20, Nikhil Sontakke <nikhil.sontakke@enterprisedb.com> wrote:
> Hi,
>
>>>>>
>>>>> And this fact should lend credence to Alvaro's (as well as my)
>>>>> suspicion that this is a Windows kernel issue.
>>>>>
>>>>> As a consequence, Magnus I was wondering if having a loop similar to
>>>>> the WRITE handling of waiting for a fixed timeout in a loop (rather
>>>>> than an INFINITE call to WaitForMultipleObjectsEx) inside the
>>>>> pgwin32_waitforsinglesocket() function will help for the READ case
>>>>> too? I believe Teodor Sigaev had raised a similar concern a while back
>>>>> about it:
>>>>>
>>>>> http://www.nabble.com/-GENERAL--Stats-collector-frozen--td8569977i20.html
>>>>
>>>> Maybe. I'm unsure if it's enough to just try another
>>>> WaitForSingleObjectEx() on it, or if we need to actually issue a
>>>> WSARecv() on it as well. Maybe it would be enough to just change the
>>>> INFINITE on line 318 of socket.c to a fixed value. That will loop down
>>>> to WSARecv() which should exit with WSAEWOULDBLOCK which will cause us
>>>> to do a short sleep and come back. But we'd have to change the limit
>>>> of 5 somehow then, since in theory we should wait forever. Maybe that
>>>> outer loop should just be a for(;;), what do you think?
>>>>
>>>
>>> Yes, line 318 seems to be a much better location to me. If Windows and
>>> its socket logic behave properly most of the time :), most of the
>>> calls should come out within the first few tries, so changing 5 to an
>>> infinite loop shouldn't hurt those normal use cases in theory.
>>>
>>> OTOH, I was wondering what if we kill the stats collector and on a
>>> restart the socket communication resumes properly. Would that
>>> conclusively mean that it is a flaw in our code?
>>
>> No, if we kill the stats collector that will destroy all sockets, and
>> when the new one starts all the sockets it operates on are fresh and
>> new. So it doesn't show that the flaw is in our code - but it also
>> doesn't show that it's in the kernel or runtime libraries.
>>
>
> AFAICS in the code, the inherited pgStatSock socket FD remains the
> same across the restart of the stats collector process...

Partially correct, I think.

Each backend has its own handle on win32, since we use EXEC_BACKEND
(this includes the "utility processes" like the stats collector). When
we start the new one, we use DuplicateHandle() in
save_backend_variables(). The new process will therefore get a new
handle, but both handles point to the same kernel object. I don't know
if WaitForMultipleObjectsEx() is going to see these as two different
objects or not, but I think it does.
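
To illustrate the "new handle, same kernel object" point, here is a minimal sketch using a POSIX analogue. It is an assumption that dup() is a fair stand-in for DuplicateHandle(); this is not the code in save_backend_variables(), and the function name duplicate_shares_object() is made up for the example. It sets O_NONBLOCK through the original descriptor and checks that the duplicate sees it.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * POSIX analogue only: dup() stands in for DuplicateHandle() (assumption).
 * Returns 1 if the duplicate is a distinct descriptor that still observes a
 * flag set through the original, 0 if not, -1 on error.
 */
int duplicate_shares_object(void)
{
    int sv[2];
    int copy;
    int flags;
    int shared = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    /* A second descriptor number referring to the same underlying socket. */
    copy = dup(sv[0]);
    if (copy >= 0)
    {
        /* Change state through the original descriptor only. */
        flags = fcntl(sv[0], F_GETFL);
        if (flags >= 0 && fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == 0)
        {
            /* Read the state back through the duplicate. */
            flags = fcntl(copy, F_GETFL);
            if (flags >= 0)
                shared = (copy != sv[0]) && (flags & O_NONBLOCK) != 0;
        }
        close(copy);
    }

    close(sv[0]);
    close(sv[1]);
    return shared;
}
```

This says nothing about how WaitForMultipleObjectsEx() treats the two handles; it only shows that a duplicated descriptor is a second name for one object.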

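For the timeout idea in the quoted text above (a fixed wait instead of INFINITE, then falling through to the receive and retrying on "would block"), a rough sketch of the loop shape follows. It uses POSIX poll() and recv() as stand-ins for WaitForMultipleObjectsEx() and WSARecv(), so it is an analogy and not the socket.c code; recv_with_timed_wait(), slice_ms and max_tries are hypothetical names. Passing max_tries < 0 gives the for(;;) behaviour discussed.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Receive one datagram, waiting in bounded slices instead of one infinite
 * wait. POSIX analogue (assumption): poll() ~ the wait call, recv() with
 * MSG_DONTWAIT ~ a non-blocking WSARecv().
 * max_tries < 0 loops forever; otherwise gives up after max_tries waits.
 * Returns bytes received, or -1 with errno set (ETIMEDOUT when tries run out).
 */
ssize_t recv_with_timed_wait(int sock, void *buf, size_t len,
                             int slice_ms, int max_tries)
{
    int tries = 0;

    for (;;)
    {
        struct pollfd pfd;
        ssize_t n;

        /* Always try the receive first; never trust the wait alone. */
        n = recv(sock, buf, len, MSG_DONTWAIT);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        if (errno != EWOULDBLOCK && errno != EAGAIN)
            return -1;

        if (max_tries >= 0 && tries++ >= max_tries)
        {
            errno = ETIMEDOUT;
            return -1;
        }

        /* Bounded wait: on timeout or readiness we loop back and re-check. */
        pfd.fd = sock;
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, slice_ms) < 0 && errno != EINTR)
            return -1;
    }
}
```

The point of the shape is that a missed wakeup costs at most one slice, because the receive is retried after every wait regardless of what the wait reported.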

--
 Magnus Hagander
 Self: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
