Discussion: stats test on Windows is now failing repeatably?
I just looked over the buildfarm results and was struck by the
observation that the stats regression test, which lately had been
failing once-in-a-while on Windows and never anywhere else, has a
batting average of 0-for-10-or-so over the past 24 hours on the Windows
buildfarm machines. I still have no idea what the real problem is there
--- but since it suddenly seems to have gotten very repeatable, I trust
someone with a Windows box and a debugger will get after it before the
source code drifts again.
[ urk ... must ... resist ... temptation ... failing ... AUTOVACUUM? ]
regards, tom lane
Tom Lane wrote:
> I just looked over the buildfarm results and was struck by the
> observation that the stats regression test, which lately had been
> failing once-in-a-while on Windows and never anywhere else, has a
> batting average of 0-for-10-or-so over the past 24 hours on the Windows
> buildfarm machines. I still have no idea what the real problem is there
> --- but since it suddenly seems to have gotten very repeatable, I trust
> someone with a Windows box and a debugger will get after it before the
> source code drifts again.

Maybe it's worth pointing out that leveret (Fedora Core 5/x86_64/icc) manages to trigger that too on occasion, so maybe it is not a "Windows-only" bug:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-17%2008:30:01
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-10%2000:30:02

Stefan
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I just looked over the buildfarm results and was struck by the
> observation that the stats regression test, which lately had been
> failing once-in-a-while on Windows and never anywhere else, has a
> batting average of 0-for-10-or-so over the past 24 hours on the Windows
> buildfarm machines.
I tested HEAD on Windows and saw some Windows-specific logs.
LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13
The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
cannot access the file because it is being used by another process."
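For illustration, each of the two log lines pairs a Win32 error code with the C errno it was translated into. The sketch below relies only on the numbers quoted in the logs (2 and 32) and the standard ENOENT/EACCES names; `sketch_map_win32_error` and the `SKETCH_*` constants are hypothetical and are not PostgreSQL's actual mapping code.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical constants mirroring the Win32 values ERROR_FILE_NOT_FOUND (2)
 * and ERROR_SHARING_VIOLATION (32), redefined so this compiles anywhere.
 */
#define SKETCH_ERROR_FILE_NOT_FOUND    2UL
#define SKETCH_ERROR_SHARING_VIOLATION 32UL

/* Translate a Win32 error code into the errno value seen in the logs. */
static int
sketch_map_win32_error(unsigned long win32_code)
{
    switch (win32_code)
    {
        case SKETCH_ERROR_FILE_NOT_FOUND:
            return ENOENT;      /* logged as "code 2, errno 2" */
        case SKETCH_ERROR_SHARING_VIOLATION:
            return EACCES;      /* logged as "code 32, errno 13" */
        default:
            return EINVAL;      /* not covered by this sketch */
    }
}
```

The point is that a sharing violation surfaces to C code as a generic "permission denied" (EACCES), which is easy to mistake for a real permissions problem.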
We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
Is there any incompatible behavior in the trick between POSIX and Windows?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
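A minimal sketch of the tmpfile-and-rename trick, assuming POSIX `rename()` semantics (atomic replacement of an existing target). ISO C leaves `rename()` over an existing file implementation-defined, which is one reason portable code tends to wrap it. The function and file names are hypothetical; this is not the pgstat.c code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write contents to tmppath, then rename() it over path, so a reader sees
 * either the complete old file or the complete new one, never a partial
 * write.  Returns 0 on success, -1 on failure.
 */
static int
sketch_write_atomically(const char *path, const char *tmppath,
                        const char *contents)
{
    size_t  len = strlen(contents);
    FILE   *fp = fopen(tmppath, "wb");

    if (fp == NULL)
        return -1;
    if (fwrite(contents, 1, len, fp) != len)
    {
        fclose(fp);
        remove(tmppath);
        return -1;
    }
    if (fclose(fp) != 0)
    {
        remove(tmppath);
        return -1;
    }

    /*
     * On POSIX this replaces path even if another process has it open.
     * This is the step whose Windows behavior is in question here.
     */
    if (rename(tmppath, path) != 0)
    {
        remove(tmppath);
        return -1;
    }
    return 0;
}
```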
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> I tested HEAD on Windows and saw some Windows-specific logs.
> LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
> LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13
> The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
> specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
> cannot access the file because it is being used by another process."
The first of those is probably normal operation --- we remove
pg_internal.init whenever it is out-of-date. The second is bad though.
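The distinction drawn above can be sketched as a reader that treats ENOENT as expected, because the cache file is deliberately removed when stale, while reporting any other failure (such as the EACCES produced by a sharing violation). This assumes the caller rebuilds the cache on `CACHE_MISSING`; all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Possible outcomes of opening a cache file that may have been removed. */
typedef enum
{
    CACHE_OPENED,       /* file exists and was opened */
    CACHE_MISSING,      /* ENOENT: normal, caller rebuilds the cache */
    CACHE_ERROR         /* anything else, e.g. EACCES */
} SketchCacheStatus;

static SketchCacheStatus
sketch_open_cache(const char *path, FILE **fpp)
{
    *fpp = fopen(path, "rb");
    if (*fpp != NULL)
        return CACHE_OPENED;
    if (errno == ENOENT)
        return CACHE_MISSING;
    return CACHE_ERROR;
}
```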
> We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
> Is there any incompatible behavior in the trick between POSIX and Windows?
It looks to me like we have implemented Windows' FILE_SHARE_DELETE flag
for open() calls but not for fopen(). Isn't this a problem? We do use
fopen() for stuff like pgstat.stat.
regards, tom lane
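To make the POSIX side of the comparison concrete, the sketch below checks that a file held open for reading can still be replaced by `rename()`, and that the open handle keeps reading the old contents. As we understand Windows sharing semantics, the equivalent rename is permitted only when every existing handle on the file was opened with FILE_SHARE_DELETE, which is the flag the thread identifies as missing from the fopen() path. POSIX-only; file names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if rename-over-an-open-file behaves as POSIX promises. */
static int
sketch_rename_while_open(const char *path, const char *tmppath)
{
    char    buf[8] = {0};
    FILE   *w;
    FILE   *reader;
    int     ok;

    /* Create the "old" file and keep it open for reading. */
    if ((w = fopen(path, "wb")) == NULL)
        return 0;
    fputs("old", w);
    fclose(w);
    if ((reader = fopen(path, "rb")) == NULL)
        return 0;

    /* Write "new" to a temp file and rename it over the open file. */
    if ((w = fopen(tmppath, "wb")) == NULL)
    {
        fclose(reader);
        return 0;
    }
    fputs("new", w);
    fclose(w);
    ok = (rename(tmppath, path) == 0);

    /* The already-open handle still reads the old data. */
    ok = ok && fread(buf, 1, 3, reader) == 3 && strcmp(buf, "old") == 0;
    fclose(reader);

    /* A fresh open sees the new data. */
    memset(buf, 0, sizeof(buf));
    if ((reader = fopen(path, "rb")) == NULL)
        return 0;
    ok = ok && fread(buf, 1, 3, reader) == 3 && strcmp(buf, "new") == 0;
    fclose(reader);
    remove(path);
    return ok;
}
```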
"Magnus Hagander" <mha@sollentuna.net> writes:
>> It looks to me like we have implemented Windows' FILE_SHARE_DELETE
>> flag for open() calls but not for fopen(). Isn't this a problem?
>> We do use fopen() for stuff like pgstat.stat.
> That definitely sounds like a problem, there is no reason why the issue
> shouldn't occur for fopen(). Do you want to work up a patch for that
> based on open(), or do you want me to take a look at it?
It looks straightforward to apply our reimplemented pgwin32_open()
followed by fdopen(), but since I don't have a Windows build environment
I couldn't test the patch. Please take a look at it.
regards, tom lane
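The approach suggested above (open the file through the reimplemented open(), then wrap the descriptor with fdopen()) can be sketched portably as follows. Plain POSIX `open()` stands in for `pgwin32_open()`, which per the thread is where FILE_SHARE_DELETE is applied on Windows. `sketch_fopen` is a hypothetical name, and the patch that was actually tested may differ, for example in how it maps the "b"/"t" modifiers to O_BINARY/O_TEXT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* fopen() replacement built from open() + fdopen(). */
static FILE *
sketch_fopen(const char *path, const char *mode)
{
    int     flags;
    int     fd;
    FILE   *fp;

    assert(path != NULL && mode != NULL);

    /* Translate the stdio mode string into open() flags. */
    switch (mode[0])
    {
        case 'r':
            flags = strchr(mode, '+') ? O_RDWR : O_RDONLY;
            break;
        case 'w':
            flags = (strchr(mode, '+') ? O_RDWR : O_WRONLY) | O_CREAT | O_TRUNC;
            break;
        case 'a':
            flags = (strchr(mode, '+') ? O_RDWR : O_WRONLY) | O_CREAT | O_APPEND;
            break;
        default:
            errno = EINVAL;
            return NULL;
    }
    /* "b" needs no flag on POSIX; on Windows it would map to O_BINARY. */

    /* On Windows this call would go through pgwin32_open() instead. */
    fd = open(path, flags, 0600);
    if (fd < 0)
        return NULL;

    fp = fdopen(fd, mode);
    if (fp == NULL)
        close(fd);              /* don't leak the descriptor */
    return fp;
}
```

Going through open() keeps a single place where the Windows sharing flags are set, rather than duplicating that logic for the stdio path.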
> > The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the
> > file specified." and the code 32 means ERROR_SHARING_VIOLATION, "The
> > process cannot access the file because it is being used by another process."
>
> The first of those is probably normal operation --- we remove
> pg_internal.init whenever it is out-of-date. The second is bad
> though.
>
> > We use the tmpfile-and-rename trick on both pg_internal.init and
> > pgstat.stat.
> > Is there any incompatible behavior in the trick between POSIX
> > and Windows?
>
> It looks to me like we have implemented Windows' FILE_SHARE_DELETE
> flag for open() calls but not for fopen(). Isn't this a problem?
> We do use fopen() for stuff like pgstat.stat.

That definitely sounds like a problem, there is no reason why the issue shouldn't occur for fopen(). Do you want to work up a patch for that based on open(), or do you want me to take a look at it?

//Magnus
> >> It looks to me like we have implemented Windows' FILE_SHARE_DELETE
> >> flag for open() calls but not for fopen(). Isn't this a problem?
> >> We do use fopen() for stuff like pgstat.stat.
>
> > That definitely sounds like a problem, there is no reason why the
> > issue shouldn't occur for fopen(). Do you want to work up a patch for
> > that based on open(), or do you want me to take a look at it?
>
> It looks straightforward to apply our reimplemented pgwin32_open()
> followed by fdopen(), but since I don't have a Windows build
> environment I couldn't test the patch. Please take a look at it.

I think this is what we want. It passes regression tests on my machine. I never managed to reproduce the original problem on this machine, so don't know if it solves the problem, but I don't think it makes it worse :-)

//Magnus
"Magnus Hagander" <mha@sollentuna.net> writes:
>> It looks straightforward to apply our reimplemented pgwin32_open()
>> followed by fdopen(), but since I don't have a Windows build
>> environment I couldn't test the patch. Please take a look at it.
> I think this is what we want. It passes regression tests on my machine.
> I never managed to reproduce the original problem on this machine, so
> don't know if it solves the problem, but I don't think it makes it worse
> :-)
Applied, we'll see what happens ...
regards, tom lane
"Magnus Hagander" <mha@sollentuna.net> wrote:
> > FILE_SHARE_DELETE
>
> I think this is what we want. It passes regression tests on my machine.
> I never managed to reproduce the original problem on this machine, so
> don't know if it solves the problem, but I don't think it makes it worse
> :-)

It seems to work very well! I ran the same workload on HEAD, and I no longer see any pgstat.stat-related log messages.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center