Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram

From: Robert Haas
Subject: Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
Msg-id CA+TgmoZheGK5AvR8Nw0WPwwxzvfpzFENdCbP_2ennJSBnraEnA@mail.gmail.com
In reply to: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram ("Luke Koops" <luke.koops@entrust.com>)
Responses: Re: BUG #4958: Stats collector hung on WaitForMultipleObjectsEx while attempting to recv a datagram
List: pgsql-bugs
On Fri, Jul 31, 2009 at 10:59 AM, Luke Koops <luke.koops@entrust.com> wrote:
> -- postgres.exe!mainCRTStartup --
> ntoskrnl.exe!KiSwapContext+0x26
> ntoskrnl.exe!KiSwapThread+0x2e5
> ntoskrnl.exe!KeWaitForSingleObject+0x346
> ntoskrnl.exe!KiSuspendThread+0x18
> ntoskrnl.exe!KiDeliverApc+0x117
> ntoskrnl.exe!KiSwapThread+0x300
> ntoskrnl.exe!KeWaitForMultipleObjects+0x3d7
> ntoskrnl.exe!ObpWaitForMultipleObjects+0x202
> ntoskrnl.exe!NtWaitForMultipleObjects+0xe9
> ntoskrnl.exe!KiFastCallEntry+0xfc
> ntdll.dll!KiFastSystemCallRet
> ntdll.dll!NtWaitForMultipleObjects+0xc
> kernel32.dll!WaitForMultipleObjectsEx+0x11a
> postgres.exe!pgwin32_waitforsinglesocket+0x1ed
> postgres.exe!pgwin32_recv+0x90
> postgres.exe!PgstatCollectorMain+0x17f
> postgres.exe!SubPostmasterMain+0x33a
> postgres.exe!main+0x168
> postgres.exe!__tmainCRTStartup+0x10f
> kernel32.dll!BaseProcessStart+0x23

We just had a customer hit a very similar problem on 9.1.3, running on
Windows Server 2008 SP2.  They were able to extract the following
stack trace:

ntoskrnl.exe!KiSwapContext+0x7a
ntoskrnl.exe!KiCommitThreadWait+0x1d2
ntoskrnl.exe!KeWaitForMultipleObjects+0x271
ntoskrnl.exe!ObpWaitForMultipleObjects+0x294
ntoskrnl.exe!NtWaitForMultipleObjects+0xe5
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
ntdll.dll!ZwWaitForMultipleObjects+0xa
KERNELBASE.dll!WaitForMultipleObjectsEx+0xe8
kernel32.dll!WaitForMultipleObjectsExImplementation+0xb3
postgres.exe!pgwin32_waitforsinglesocket+0x26d
postgres.exe!pgwin32_recv+0xf0
postgres.exe!PgstatCollectorMain+0x1cc
postgres.exe!SubPostmasterMain+0x4c2
postgres.exe!main+0x1d0
postgres.exe!__tmainCRTStartup+0x11a
kernel32.dll!BaseThreadInitThunk+0xd
ntdll.dll!RtlUserThreadStart+0x1d

The customer finds that they can reproduce this on a variety of
systems under heavy load.  However, removing the load doesn't fix the
problem; the system continues to spew pgstat wait timeout messages
into the logs.  Autovacuum fails to DTRT due to lack of current stats
and things go downhill rapidly from there.  Terminating the stats
collector process resolves the issue; the postmaster starts a new one
within 60 seconds and after that the pgstat wait timeout messages
cease and vacuuming consequently resumes.

Now, it looks to me like for this stack trace to happen,
PgstatCollectorMain() has got to call pgwin32_waitforsinglesocket (at
line 3002), and that function has to return true, so that got_data
gets set to true.  Then PgstatCollectorMain() will call recv(), which
on Windows will really be pgwin32_recv, which will call
pgwin32_waitforsinglesocket, which must now hang.  The fact that the
first pgwin32_waitforsinglesocket call returned true should mean that
the stats collector socket is ready for read, while the fact that the
second one did not return seems to imply that it's not ready for read,
close, or accept.  So it almost looks like Windows can change its mind
about whether the socket is readable.
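
To make the suspected sequence concrete, here is a minimal Python sketch of
the "wait for readability, then receive" pattern described above.  It is
not the PostgreSQL source and it does not use the Windows event API: it
uses select() on a loopback UDP socket, and the names recv_if_ready and
make_udp_pair are hypothetical helpers invented for illustration.  The
point it shows is one way to avoid the hang: treat the first readiness
report as a hint only, and make the receive itself non-blocking so that a
socket which "changes its mind" yields "no data" instead of a second,
unbounded wait.

```python
import select
import socket


def make_udp_pair():
    """Create a non-blocking UDP receiver on loopback and a connected sender.

    Hypothetical helper for this sketch; it only loosely stands in for the
    stats collector socket discussed in the thread.
    """
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))      # let the OS pick a free port
    receiver.setblocking(False)          # recv() must never wait on its own
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.connect(receiver.getsockname())
    return receiver, sender


def recv_if_ready(sock, timeout, bufsize=1024):
    """Wait up to `timeout` seconds for `sock` to be readable, then try once.

    Returns the datagram, or None if nothing could be read.  This is the
    only place that waits, and the wait is bounded by `timeout`.
    """
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return None                      # timed out, nothing arrived
    try:
        return sock.recv(bufsize)        # non-blocking: returns or raises
    except BlockingIOError:
        # Readiness was reported, but the data is not there after all.
        # Report "no data" rather than entering a second wait.
        return None


if __name__ == "__main__":
    receiver, sender = make_udp_pair()
    sender.send(b"stats")
    print(recv_if_ready(receiver, 1.0))   # datagram is available
    print(recv_if_ready(receiver, 0.05))  # nothing pending, bounded wait
    receiver.close()
    sender.close()
```

In the stack traces above, by contrast, the receive path performs its own
wait with no visible bound, which is why a stale readiness report would be
enough to wedge the collector.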

Or maybe we're telling it to change its mind.  This sounds an awful
lot like something that could have been caused by the oversights fixed
in commit b85427f2276d02756b558c0024949305ea65aca5.  Was there a
reason we didn't back-patch that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
