Обсуждение: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

Поиск
Список
Период
Сортировка

PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
"Shiv Shivaraju Gowda (shivshi)"
Дата:
There seems to be a has a dormant bug  in PostgreSQL source code build usin=
g VisualStudio(VS)  which shows up with the newer OS( not sure if it is the=
 OS or some other thing in the environment ). MinGW build doesn't access th=
is code path and thus doesn't hit this bug.

SYMPTOM:
The following symptom is encountered in PostgreSQL build using Visual Studi=
o:  PostgreSQL regression tests fail with server crashing repeatedly with t=
his message in the log file: "PANIC:  could not lock semaphore: error code =
0". The issue is encountered for VS 2005, VS 2008( 32bit and 64bit executab=
les), VS 2010 and VS 2012 built executables.  The issue was reproducible wi=
th PostgreSQL 6.2.3 and 6.2.5. We didn't encounter this issue in MinGW buil=
d or EnterpriseDB Packaged executables (which seems to have been built usin=
g VisualStudio 2010).


CAUSE:
The PGSemaphoreLock function in postgresql-9.2.5\src\backend\port\win32_sem=
a.c (https://github.com/postgres/postgres/blob/master/src/backend/port/win3=
2_sema.c) uses "Ex" version of "WaitForMultipleObjects" Windows function (h=
ttp://msdn.microsoft.com/en-us/library/windows/desktop/ms687028%28v=3Dvs.85=
%29.aspx) however doesn't handle the additional awake calls from the "bAler=
table" state. Specifically, it doesn't handle the WAIT_IO_COMPLETION return=
 code when woken up by a User-mode Asynchronous Procedure Call(APC) or Asyn=
c IO completion.

The part I do NOT understand is why do the User-Mode APC or Async IO comple=
tion triggers get fired only in the executables we built and not in the one=
s built by EnterpriseDB and bits built by other users since no one has comp=
lained about it. Irrespective if it is triggered or not, the code should ha=
ve handled all the return codes of the WaitForMultipleObjectsEX API and tha=
t is the reason I think it is a bug.

I checked the source code for calls which will trigger the Async IO or user=
-mode APC(ReadFileEx<http://msdn.microsoft.com/en-us/library/windows/deskto=
p/aa365468(v=3Dvs.85).aspx>, WriteFileEx<http://msdn.microsoft.com/en-us/li=
brary/windows/desktop/aa365748(v=3Dvs.85).aspx>, QueueUserAPC<http://msdn.m=
icrosoft.com/en-us/library/windows/desktop/ms684954(v=3Dvs.85).aspx>) and c=
ould not find any. I am not sure what triggers the WAIT_IO_COMPLETION retur=
n call. I could not find a way to figure out that information in the debug =
environment.

(Posible) FIX:
Either of the following changes to the PGSemaphoreLock function work fine a=
nd pass the regression tests.

1) Replace the call to WaitForMultipleObjectsEx with WaitForMultipleObjects=
.
2) Handle the WAIT_IO_COMPLETION return code same as WAIT_OBJECT_0. There i=
s a similar code like this in socket.c, so this change should be safe too.

Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
Alvaro Herrera
Дата:
Shiv Shivaraju Gowda (shivshi) wrote:

> SYMPTOM:
> The following symptom is encountered in PostgreSQL build using Visual Studio:  PostgreSQL regression tests fail with
servercrashing repeatedly with this message in the log file: "PANIC:  could not lock semaphore: error code 0". The
issueis encountered for VS 2005, VS 2008( 32bit and 64bit executables), VS 2010 and VS 2012 built executables.  The
issuewas reproducible with PostgreSQL 6.2.3 and 6.2.5. We didn't encounter this issue in MinGW build or EnterpriseDB
Packagedexecutables (which seems to have been built using VisualStudio 2010). 

> (Posible) FIX:
> Either of the following changes to the PGSemaphoreLock function work fine and pass the regression tests.
>
> 1) Replace the call to WaitForMultipleObjectsEx with WaitForMultipleObjects.
> 2) Handle the WAIT_IO_COMPLETION return code same as WAIT_OBJECT_0. There is a similar code like this in socket.c, so
thischange should be safe too. 

Is anyone looking into this?


--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
Noah Misch
Дата:
On Fri, Nov 22, 2013 at 01:59:58AM +0000, Shiv Shivaraju Gowda (shivshi) wrote:
> There seems to be a has a dormant bug  in PostgreSQL source code build using VisualStudio(VS)  which shows up with
thenewer OS( not sure if it is the OS or some other thing in the environment ). 

Thanks for this report, and apologies for the delay.

Please tell us as much as possible about the OS configurations that have
exhibited the problem and the OS configurations that have *not* exhibited the
problem with the same binaries.

> MinGW build doesn't access this code path and thus doesn't hit this bug.

A MinGW build certainly includes win32_sema.c, and I don't see why it wouldn't
use that code path.  How did you determine that?

> SYMPTOM:
> The following symptom is encountered in PostgreSQL build using Visual Studio:  PostgreSQL regression tests fail with
servercrashing repeatedly with this message in the log file: "PANIC:  could not lock semaphore: error code 0". The
issueis encountered for VS 2005, VS 2008( 32bit and 64bit executables), VS 2010 and VS 2012 built executables.  The
issuewas reproducible with PostgreSQL 6.2.3 and 6.2.5. We didn't encounter this issue in MinGW build or EnterpriseDB
Packagedexecutables (which seems to have been built using VisualStudio 2010). 

I assume you mean 9.2.3 and 9.2.5?

> CAUSE:
> The PGSemaphoreLock function in postgresql-9.2.5\src\backend\port\win32_sema.c
(https://github.com/postgres/postgres/blob/master/src/backend/port/win32_sema.c)uses "Ex" version of
"WaitForMultipleObjects"Windows function
(http://msdn.microsoft.com/en-us/library/windows/desktop/ms687028%28v=vs.85%29.aspx)however doesn't handle the
additionalawake calls from the "bAlertable" state. Specifically, it doesn't handle the WAIT_IO_COMPLETION return code
whenwoken up by a User-mode Asynchronous Procedure Call(APC) or Async IO completion. 
>
> The part I do NOT understand is why do the User-Mode APC or Async IO completion triggers get fired only in the
executableswe built and not in the ones built by EnterpriseDB and bits built by other users since no one has complained
aboutit. Irrespective if it is triggered or not, the code should have handled all the return codes of the
WaitForMultipleObjectsEXAPI and that is the reason I think it is a bug. 
>
> I checked the source code for calls which will trigger the Async IO or user-mode
APC(ReadFileEx<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365468(v=vs.85).aspx>,
WriteFileEx<http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748(v=vs.85).aspx>,
QueueUserAPC<http://msdn.microsoft.com/en-us/library/windows/desktop/ms684954(v=vs.85).aspx>)and could not find any. I
amnot sure what triggers the WAIT_IO_COMPLETION return call. I could not find a way to figure out that information in
thedebug environment. 

Yes, that's weird.  I could believe anti-malware software doing this.  If any
successful test has taken place on the same machine and user account as the
unsuccessful test, that explanation is unlikely.  It's also possible that
you're building against an unusual version of a 3rd-party library that, unlike
versions used in popular builds, does use APCs or async I/O.

All that being said, it's clearly wrong to call WaitForMultipleObjectsEx(...,
TRUE) and then treat WAIT_IO_COMPLETION like WAIT_FAILED.

> (Posible) FIX:
> Either of the following changes to the PGSemaphoreLock function work fine and pass the regression tests.
>
> 1) Replace the call to WaitForMultipleObjectsEx with WaitForMultipleObjects.

I don't know how we decided which waits to make alertable, so I lean against
changing this wait from alertable to normal.

> 2) Handle the WAIT_IO_COMPLETION return code same as WAIT_OBJECT_0. There is a similar code like this in socket.c, so
thischange should be safe too. 

Sounds good.  I looked through the other WaitFor*Object* callers for similar
problems.  pgwin32_select() also mishandles WAIT_IO_COMPLETION, returning zero
as though the full timeout had elapsed.  It should return -1 and set EINTR.
This could affect CheckRADIUSAuth() and PgstatCollectorMain(), both of which
distinguish full timeout expiration from EINTR.

Would you verify that the attached patch fixes your builds?

nm

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Вложения

Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> Would you verify that the attached patch fixes your builds?

The aspect of this patch that leaves ImmediateInterruptOK set over a
much larger code range in PGSemaphoreLock --- and in particular doesn't
bother to reset it before entering ereport() --- is a horrid idea.
Keep in mind that setting ImmediateInterruptOK means you can lose control
*anywhere*, and so have no ability to safely manipulate anything except
local variables.

            regards, tom lane

Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
Noah Misch
Дата:
On Fri, Jan 24, 2014 at 10:14:24AM -0500, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > Would you verify that the attached patch fixes your builds?
>
> The aspect of this patch that leaves ImmediateInterruptOK set over a
> much larger code range in PGSemaphoreLock --- and in particular doesn't
> bother to reset it before entering ereport() --- is a horrid idea.
> Keep in mind that setting ImmediateInterruptOK means you can lose control
> *anywhere*, and so have no ability to safely manipulate anything except
> local variables.

What's wrong with ereport(FATAL) under ImmediateInterruptOK?
ClientAuthentication() does it, and errfinish() clears the flag.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Вложения

Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
shivshi
Дата:
Thanks Noah Misch for looking into this.

> Please tell us as much as possible about the OS configurations that have
> exhibited the problem and the OS configurations that have *not* exhibited
> the problem with the same binaries.

The OS configuration that has exhibited the problem is Windows 7 Enterprise,
64 bit OS on Intel Core-I7.
We are not aware of a OS configuration on which this did work.

> A MinGW build certainly includes win32_sema.c, and I don't see why it
> wouldn't use that code path.  How did you determine that?

I thought the MinGW build which is built with gnu compiler invokes Posix
libraries and doesn't call into Windows System calls directly.

> I assume you mean 9.2.3 and 9.2.5?

Correct, apologize for the typo. I couldn't change the title(or the content)
after the issue was reated.

> Would you verify that the attached patch fixes your builds?

Do you want me to go ahead with the tests or do you want me to wait to
incorporate Tom Lane's suggestions






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/PostgreSQL-6-2-5-Visual-Studio-Build-does-not-pass-the-regression-tests-tp5779681p5793932.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
Alvaro Herrera
Дата:
Noah Misch wrote:
> On Fri, Jan 24, 2014 at 10:14:24AM -0500, Tom Lane wrote:
> > Noah Misch <noah@leadboat.com> writes:
> > > Would you verify that the attached patch fixes your builds?

You haven't pushed this patch yet, right?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
Noah Misch
Дата:
[Please use reply-to-all to keep me on the recipients list.]

On Thu, Feb 27, 2014 at 01:43:20PM -0800, shivshi wrote:
> > Please tell us as much as possible about the OS configurations that have
> > exhibited the problem and the OS configurations that have *not* exhibited
> > the problem with the same binaries.
>
> The OS configuration that has exhibited the problem is Windows 7 Enterprise,
> 64 bit OS on Intel Core-I7.
> We are not aware of a OS configuration on which this did work.

Do you use any anti-malware software on all of those systems?

> > A MinGW build certainly includes win32_sema.c, and I don't see why it
> > wouldn't use that code path.  How did you determine that?
>
> I thought the MinGW build which is built with gnu compiler invokes Posix
> libraries and doesn't call into Windows System calls directly.

No; that approximately describes the Cygwin build.

> > Would you verify that the attached patch fixes your builds?
>
> Do you want me to go ahead with the tests or do you want me to wait to
> incorporate Tom Lane's suggestions

Please go ahead and test w32-io-completion-v1.patch.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Re: Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
Noah Misch
Дата:
On Tue, Mar 04, 2014 at 12:37:25PM -0300, Alvaro Herrera wrote:
> Noah Misch wrote:
> > On Fri, Jan 24, 2014 at 10:14:24AM -0500, Tom Lane wrote:
> > > Noah Misch <noah@leadboat.com> writes:
> > > > Would you verify that the attached patch fixes your builds?
>
> You haven't pushed this patch yet, right?

Right.  (Thanks for the bump; I missed the traffic on this thread.)

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com

Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
"Shiv Shivaraju Gowda (shivshi)"
Дата:
> Do you use any anti-malware software on all of those systems?


Yes they come pre-installed, but in one of the system I tested I tried my
best to disable all the ones I knew.

> No; that approximately describes the Cygwin build.


OK, in that case I don=B9t why we don=B9t hit the problem with MinGW. May b=
e
it is the same reason why other people are not seeing this on VS build,
i.e=20
WAIT_IO_COMPLETION is not triggered somehow.


We are testing w32-io-completion-v1.patch, will keep you posted.


On 3/4/14, 8:22 AM, "Noah Misch" <noah@leadboat.com> wrote:

>[Please use reply-to-all to keep me on the recipients list.]
>
>On Thu, Feb 27, 2014 at 01:43:20PM -0800, shivshi wrote:
>> > Please tell us as much as possible about the OS configurations that
>>have
>> > exhibited the problem and the OS configurations that have *not*
>>exhibited
>> > the problem with the same binaries.
>>=20
>> The OS configuration that has exhibited the problem is Windows 7
>>Enterprise,
>> 64 bit OS on Intel Core-I7.
>> We are not aware of a OS configuration on which this did work.
>
>Do you use any anti-malware software on all of those systems?
>
>> > A MinGW build certainly includes win32_sema.c, and I don't see why it
>> > wouldn't use that code path.  How did you determine that?
>>=20
>> I thought the MinGW build which is built with gnu compiler invokes Posix
>> libraries and doesn't call into Windows System calls directly.
>
>No; that approximately describes the Cygwin build.
>
>> > Would you verify that the attached patch fixes your builds?
>>=20
>> Do you want me to go ahead with the tests or do you want me to wait to
>> incorporate Tom Lane's suggestions
>
>Please go ahead and test w32-io-completion-v1.patch.
>
>--=20
>Noah Misch
>EnterpriseDB                                 http://www.enterprisedb.com

Re: PostgreSQL 6.2.5 Visual Studio Build does not pass the regression tests.

От
Noah Misch
Дата:
On Wed, Mar 05, 2014 at 10:00:12PM +0000, Shiv Shivaraju Gowda (shivshi) wrote:
> We are testing w32-io-completion-v1.patch, will keep you posted.

What was the result?  Absent a specific report of the patch helping, I think
I'll commit it to 9.5 only.

--
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com