Discussion: stats test on Windows is now failing repeatably?
I just looked over the buildfarm results and was struck by the observation that the stats regression test, which lately had been failing once-in-a-while on Windows and never anywhere else, has a batting average of 0-for-10-or-so over the past 24 hours on the Windows buildfarm machines. I still have no idea what the real problem is there --- but since it suddenly seems to have gotten very repeatable, I trust someone with a Windows box and a debugger will get after it before the source code drifts again.
[ urk ... must ... resist ... temptation ... failing ... AUTOVACUUM? ]
regards, tom lane
Tom Lane wrote:
> I just looked over the buildfarm results and was struck by the
> observation that the stats regression test, which lately had been
> failing once-in-a-while on Windows and never anywhere else, has a
> batting average of 0-for-10-or-so over the past 24 hours on the Windows
> buildfarm machines. I still have no idea what the real problem is there
> --- but since it suddenly seems to have gotten very repeatable, I trust
> someone with a Windows box and a debugger will get after it before the
> source code drifts again.
Maybe it's worth pointing out that leveret (Fedora Core 5/x86_64/icc) manages to trigger that too on occasion, so maybe it is not a "Windows only" bug:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-17%2008:30:01
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=leveret&dt=2006-08-10%2000:30:02
Stefan
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I just looked over the buildfarm results and was struck by the
> observation that the stats regression test, which lately had been
> failing once-in-a-while on Windows and never anywhere else, has a
> batting average of 0-for-10-or-so over the past 24 hours on the Windows
> buildfarm machines.
I tested HEAD on Windows and saw some Windows-specific logs.
LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13
The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process cannot access the file because it is being used by another process."
We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat. Is there any incompatible behavior in the trick between POSIX and Windows?
Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
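For readers unfamiliar with the tmpfile-and-rename trick mentioned above, here is a minimal sketch of the general technique in portable C. It is not PostgreSQL's code: the helper name write_file_atomically and the ".tmp" suffix are assumptions made purely for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): replace the file at `path`
 * with `data` by writing a temporary file first and renaming it into place.
 * Returns 0 on success, -1 on failure.
 */
static int
write_file_atomically(const char *path, const char *data)
{
    char    tmppath[1024];
    size_t  len = strlen(data);
    FILE   *fp;
    int     n;

    /* Build "<path>.tmp"; refuse paths that would not fit in the buffer. */
    n = snprintf(tmppath, sizeof(tmppath), "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof(tmppath))
        return -1;

    fp = fopen(tmppath, "wb");
    if (fp == NULL)
        return -1;

    /* Write the complete new contents to the temporary file. */
    if (fwrite(data, 1, len, fp) != len)
    {
        fclose(fp);
        remove(tmppath);
        return -1;
    }
    if (fclose(fp) != 0)
    {
        remove(tmppath);
        return -1;
    }

    /*
     * On POSIX, rename() replaces an existing target in one step, even if
     * another process currently has the old file open. Windows file-sharing
     * rules differ, which is the question raised in this thread.
     */
    if (rename(tmppath, path) != 0)
    {
        remove(tmppath);
        return -1;
    }
    return 0;
}
```

The point of the pattern is that a reader opening the target sees either the old contents or the new contents, never a half-written file. That guarantee relies on the operating system allowing the replacement while readers still hold the old file open.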
ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp> writes:
> I tested HEAD on Windows and saw some Windows-specific logs.
> LOG: Windows fopen("base/16384/pg_internal.init","rb") failed: code 2, errno 2
> LOG: Windows fopen("global/pgstat.stat","rb") failed: code 32, errno 13
> The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the file
> specified." and the code 32 means ERROR_SHARING_VIOLATION, "The process
> cannot access the file because it is being used by another process."
The first of those is probably normal operation --- we remove pg_internal.init whenever it is out-of-date. The second is bad though.
> We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
> Is there any incompatible behavior in the trick between POSIX and Windows?
It looks to me like we have implemented Windows' FILE_SHARE_DELETE flag for open() calls but not for fopen(). Isn't this a problem? We do use fopen() for stuff like pgstat.stat.
regards, tom lane
"Magnus Hagander" <mha@sollentuna.net> writes: >> It looks to me like we have implemented Windows' FILE_SHARE_DELETE >> flag for open() calls but not for fopen(). Isn't this a problem? >> We do use fopen() for stuff like pgstat.stat. > That definitely sounds like a problem, there is no reason why the issue > shouldn't occur for fopen(). Do you want to work up a patch for that > based on open(), or do you want me to take a look at it? It looks straightforward to apply our reimplemented pgwin32_open() followed by fdopen(), but since I don't have a Windows build environment I couldn't test the patch. Please take a look at it. regards, tom lane
> > The code 2 means ERROR_FILE_NOT_FOUND, "The system cannot find the
> > file specified." and the code 32 means ERROR_SHARING_VIOLATION, "The
> > process cannot access the file because it is being used by another process."
> The first of those is probably normal operation --- we remove
> pg_internal.init whenever it is out-of-date. The second is bad though.
> > We use the tmpfile-and-rename trick on both pg_internal.init and pgstat.stat.
> > Is there any incompatible behavior in the trick between POSIX and Windows?
> It looks to me like we have implemented Windows' FILE_SHARE_DELETE
> flag for open() calls but not for fopen(). Isn't this a problem?
> We do use fopen() for stuff like pgstat.stat.
That definitely sounds like a problem, there is no reason why the issue shouldn't occur for fopen(). Do you want to work up a patch for that based on open(), or do you want me to take a look at it?
//Magnus
> >> It looks to me like we have implemented Windows' FILE_SHARE_DELETE
> >> flag for open() calls but not for fopen(). Isn't this a problem?
> >> We do use fopen() for stuff like pgstat.stat.
> > That definitely sounds like a problem, there is no reason why the
> > issue shouldn't occur for fopen(). Do you want to work up a patch for
> > that based on open(), or do you want me to take a look at it?
> It looks straightforward to apply our reimplemented pgwin32_open()
> followed by fdopen(), but since I don't have a Windows build
> environment I couldn't test the patch. Please take a look at it.
I think this is what we want. It passes regression tests on my machine. I never managed to reproduce the original problem on this machine, so don't know if it solves the problem, but I don't think it makes it worse :-)
//Magnus
"Magnus Hagander" <mha@sollentuna.net> writes: >> It looks straightforward to apply our reimplemented pgwin32_open() >> followed by fdopen(), but since I don't have a Windows build >> environment I couldn't test the patch. Please take a look at it. > I think this is what we want. It passes regression tests on my machine. > I never managed to reproduce the original problem on this machine, so > don't know if it solves the problem, but I don't think it makes it worse > :-) Applied, we'll see what happens ... regards, tom lane
"Magnus Hagander" <mha@sollentuna.net> wrote: > > FILE_SHARE_DELETE > > I think this is what we want. It passes regression tests on my machine. > I never managed to reproduce the original problem on this machine, so > don't know if it solves the problem, but I don't think it makes it worse > :-) It seems to work very well! I ran the same workload on the HEAD, and I did not see any pgstat.stat related logs now. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center