Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
Дата
Msg-id 20140124042742.GA1975540@tornado.leadboat.com
обсуждение исходный текст
Ответ на PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.  ("Shiv Shivaraju Gowda (shivshi)" <shivshi@cisco.com>)
Ответы Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.
Список pgsql-bugs
On Fri, Nov 22, 2013 at 01:59:58AM +0000, Shiv Shivaraju Gowda (shivshi) wrote:
> There seems to be a has a dormant bug  in PostgreSQL source code build using VisualStudio(VS)  which shows up with
thenewer OS( not sure if it is the OS or some other thing in the environment ). 

Thanks for this report, and apologies for the delay.

Please tell us as much as possible about the OS configurations that have
exhibited the problem and the OS configurations that have *not* exhibited the
problem with the same binaries.

> MinGW build doesn't access this code path and thus doesn't hit this bug.

A MinGW build certainly includes win32_sema.c, and I don't see why it wouldn't
use that code path.  How did you determine that?

> SYMPTOM:
> The following symptom is encountered in PostgreSQL build using Visual Studio:  PostgreSQL regression tests fail with
servercrashing repeatedly with this message in the log file: "PANIC:  could not lock semaphore: error code 0". The
issueis encountered for VS 2005, VS 2008( 32bit and 64bit executables), VS 2010 and VS 2012 built executables.  The
issuewas reproducible with PostgreSQL 6.2.3 and 6.2.5. We didn't encounter this issue in MinGW build or EnterpriseDB
Packagedexecutables (which seems to have been built using VisualStudio 2010). 

I assume you mean 9.2.3 and 9.2.5?

> CAUSE:
> The PGSemaphoreLock function in postgresql-9.2.5\src\backend\port\win32_sema.c
(https://github.com/postgres/postgres/blob/master/src/backend/port/win32_sema.c)uses "Ex" version of
"WaitForMultipleObjects"Windows function
(http://msdn.microsoft.com/en-us/library/windows/desktop/ms687028%28v=vs.85%29.aspx)however doesn't handle the
additionalawake calls from the "bAlertable" state. Specifically, it doesn't handle the WAIT_IO_COMPLETION return code
whenwoken up by a User-mode Asynchronous Procedure Call(APC) or Async IO completion. 
>
> The part I do NOT understand is why do the User-Mode APC or Async IO completion triggers get fired only in the
executableswe built and not in the ones built by EnterpriseDB and bits built by other users since no one has complained
aboutit. Irrespective if it is triggered or not, the code should have handled all the return codes of the
WaitForMultipleObjectsEXAPI and that is the reason I think it is a bug. 
>
> I checked the source code for calls which will trigger the Async IO or user-mode
APC(ReadFileEx<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468(v=vs.85).aspx>,
WriteFileEx<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748(v=vs.85).aspx>,
QueueUserAPC<http://msdn.microsoft.com/en-us/library/windows/desktop/ms684954(v=vs.85).aspx>)and could not find any. I
amnot sure what triggers the WAIT_IO_COMPLETION return call. I could not find a way to figure out that information in
thedebug environment. 

Yes, that's weird.  I could believe anti-malware software doing this.  If any
successful test has taken place on the same machine and user account as the
unsuccessful test, that explanation is unlikely.  It's also possible that
you're building against an unusual version of a 3rd-party library that, unlike
versions used in popular builds, does use APCs or async I/O.

All that being said, it's clearly wrong to call WaitForMultipleObjectsEx(...,
TRUE) and then treat WAIT_IO_COMPLETION like WAIT_FAILED.

> (Posible) FIX:
> Either of the following changes to the PGSemaphoreLock function work fine and pass the regression tests.
>
> 1) Replace the call to WaitForMultipleObjectsEx with WaitForMultipleObjects.

I don't know how we decided which waits to make alertable, so I lean against
changing this wait from alertable to normal.

> 2) Handle the WAIT_IO_COMPLETION return code same as WAIT_OBJECT_0. There is a similar code like this in socket.c, so
thischange should be safe too. 

Sounds good.  I looked through the other WaitFor*Object* callers for similar
problems.  pgwin32_select() also mishandles WAIT_IO_COMPLETION, returning zero
as though the full timeout had elapsed.  It should return -1 and set EINTR.
This could affect CheckRADIUSAuth() and PgstatCollectorMain(), both of which
distinguish full timeout expiration from EINTR.

Would you verify that the attached patch fixes your builds?

nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Bruce Momjian
Дата:
Сообщение: Re: BUG #8218: Error when querying an JSON data, 9.3beta
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: BUG #8218: Error when querying an JSON data, 9.3beta