Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram

From: Magnus Hagander
Subject: Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
Date:
Msg-id: 9837222c0908030656m6ed69a26gc90a6c94b41a4c47@mail.gmail.com
In reply to: Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram  (Nikhil Sontakke <nikhil.sontakke@enterprisedb.com>)
Responses: Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram  (Nikhil Sontakke <nikhil.sontakke@enterprisedb.com>)
List: pgsql-bugs
On Mon, Aug 3, 2009 at 15:47, Nikhil
Sontakke<nikhil.sontakke@enterprisedb.com> wrote:
> Hi,
>
>>>
>>>
>>>> ntdll.dll!NtWaitForMultipleObjects+0xc
>>>> kernel32.dll!WaitForMultipleObjectsEx+0x11a
>>>> postgres.exe!pgwin32_waitforsinglesocket+0x1ed
>>>> postgres.exe!pgwin32_recv+0x90
>>>> postgres.exe!PgstatCollectorMain+0x17f
>>>> postgres.exe!SubPostmasterMain+0x33a
>>>> postgres.exe!main+0x168
>>>> postgres.exe!__tmainCRTStartup+0x10f
>>>> kernel32.dll!BaseProcessStart+0x23
>>>
>>> I have seen this problem too. The process seems stuck for no good
>>> reason. I wondered at the time if it could be a kernel issue. I
>>> remember trying to send some data to the collector to verify whether
>>> it'd wake up, but no luck. (I mean I couldn't find a way to do it on
>>> Windows).
>>
>> I have seen this as well, but only in cases where there has been
>> broken firewall software or such things involved. I have seen a couple
>> of reports from the field though.
>>
>> Anyway, this really is a should-never-happen thing. As soon as a new
>> packet is sent in, WaitForMultipleObjectsEx() should return right
>> away. And given that backends regularly send packets over, it
>> shouldn't be an issue even if we miss one...
>>
>
> And this fact should lend credence to Alvaro's (as well as mine)
> suspicions that it seems to be a Windows kernel issue.
>
> As a consequence, Magnus I was wondering if having a loop similar to
> the WRITE handling of waiting for a fixed timeout in a loop (rather
> than an INFINITE call to WaitForMultipleObjectsEx) inside the
> pgwin32_waitforsinglesocket() function will help for the READ case
> too? I believe Teodor Sigaev had raised a similar concern a while back
> about it:
>
> http://www.nabble.com/-GENERAL--Stats-collector-frozen--td8569977i20.html

Maybe. I'm unsure if it's enough to just try another
WaitForSingleObjectEx() on it, or if we need to actually issue a
WSARecv() on it as well. Maybe it would be enough to just change the
INFINITE on line 318 of socket.c to a fixed value. That will loop down
to WSARecv() which should exit with WSAEWOULDBLOCK which will cause us
to do a short sleep and come back. But we'd have to change the limit
of 5 somehow then, since in theory we should wait forever. Maybe that
outer loop should just be a for(;;), what do you think?
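
To make the idea concrete, here is a rough, portable sketch of the control
flow only. It is not the code in socket.c: fake_socket, fake_wait(),
fake_recv(), recv_with_timed_wait() and WAIT_TIMEOUT_MS are all hypothetical
names, and the two fake_* functions merely stand in for
WaitForMultipleObjectsEx() with a finite timeout and a non-blocking
WSARecv(). The simulated socket models the suspected failure: data arrives,
but the event is never signaled.

```c
/* Sketch only: every name here is hypothetical, not PostgreSQL's socket.c. */
#include <assert.h>
#include <stddef.h>

#define WAIT_TIMEOUT_MS 500     /* assumed fixed timeout replacing INFINITE */

typedef enum { WAIT_SIGNALED, WAIT_TIMED_OUT } wait_result;
typedef enum { RECV_OK, RECV_WOULDBLOCK } recv_result;

/* Simulated socket whose readiness event may be "lost". */
typedef struct
{
    int polls_until_data;   /* recv attempts that still report would-block */
    int event_lost;         /* nonzero: the wait is never signaled */
    int waits;              /* number of timed waits performed so far */
} fake_socket;

/* Stand-in for WaitForMultipleObjectsEx() called with a finite timeout. */
static wait_result
fake_wait(fake_socket *s, int timeout_ms)
{
    (void) timeout_ms;      /* the simulation does not really sleep */
    s->waits++;
    return s->event_lost ? WAIT_TIMED_OUT : WAIT_SIGNALED;
}

/* Stand-in for a non-blocking WSARecv() that may fail with WSAEWOULDBLOCK. */
static recv_result
fake_recv(fake_socket *s)
{
    if (s->polls_until_data > 0)
    {
        s->polls_until_data--;
        return RECV_WOULDBLOCK;
    }
    return RECV_OK;
}

/*
 * Unbounded outer loop: attempt the receive, and if nothing is there,
 * wait with a fixed timeout and go around again. A timeout is treated
 * exactly like a wakeup, so a lost event only delays the read instead
 * of hanging it. Returns the number of receive attempts made.
 */
static int
recv_with_timed_wait(fake_socket *s)
{
    int attempts = 0;

    assert(s != NULL);
    for (;;)
    {
        attempts++;
        if (fake_recv(s) == RECV_OK)
            return attempts;
        (void) fake_wait(s, WAIT_TIMEOUT_MS);
    }
}
```

With a socket set up as { 3, 1, 0 } (three would-block results, event never
signaled), recv_with_timed_wait() still returns after a few timed waits
rather than blocking forever. The cost is a periodic wakeup in an otherwise
idle collector, so the timeout value would need some thought.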

From what I understand, none of you have an environment where you can
reliably reproduce this? That means it's going to be a PITA to try to
figure out if we're actually fixing anything :S


-- 
 Magnus Hagander
 Self: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
