Discussion: inconsistent regression test results...
I'm trying to build and install PostgreSQL 7.2.1 on an OpenBSD 3.1-stable computer. The first time I built it, 12 of the 79 regression tests failed. This scared me, so I did a gmake distclean and then reconfigured and rebuilt everything. This time, 14/79 tests failed. It was getting late (or rather, early; the sun was coming up), so I decided to put things off until later.

This morning I got back to it. I re-downloaded the source distribution, made sure that its MD5 hash matched the expected value, and then rebuilt everything using the following configure options:

    ./configure --prefix=/usr/local/encap/postgresql-7.2.1 --sysconfdir=/etc/postgresql --enable-multibyte --with-CXX --with-openssl

This time, 11/79 tests failed. This got me wondering, so I reran the entire process (untarring, configuring, gmake'ing, and gmake check'ing) three more times. Different results each time (14, 15, then 10).

I have saved the regression.out and regression.diffs from each of these last four runs. You can see them here:

    http://vvk.brownforces.org/postgresql-regression/

I've read the docs[1] and understand that some of the tests will occasionally give different values, but I did not expect tests like join, subselect, and arrays (and others) to give inconsistent results. Is this expected?

-Vik

[1] specifically: http://www.postgresql.org/idocs/index.php?regress-evaluation.html

-- 
Vikram Vinayak Kulkarni
vkulkarn@uiuc.edu
vkulkarn@brownforces.org

Ultimately, all things are known because you want to believe you know.
  -Zensunni Koan
Vikram Kulkarni <vkulkarn@brownforces.org> writes:
> I'm trying to build and install PostgreSQL 7.2.1 on an OpenBSD 3.1-stable
> computer. The first time I built it, 12 of the 79 regression tests
> failed. This scared me, so I did a gmake distclean and then reconfigured
> and rebuilt everything. This time, 14/79 tests failed.
> ...
> This time, 11/79 tests failed. This got me wondering, so I reran the
> entire process (untarring, configuring, gmake'ing, and gmake check'ing)
> three more times. Different results each time (14, 15, then 10).

It looks to me like the primary failures are that tests abort with either

    psql: Server process fork() failed: Resource temporarily unavailable

or

    psql: could not send SSL negotiation packet: Broken pipe

Some later tests may then fail because they expect to find tables or data created by the un-executed earlier tests.

The fork-failed messages suggest very strongly that you are running out of kernel resources when you get more than a dozen or so server processes going. Perhaps you are too low on swap space, or need to enlarge the kernel's file table size.

You could try to confirm this by running the regression tests serially instead of in parallel (use the installcheck target); or you could modify the parallel_schedule file to break apart the more highly parallel test sets into smaller groups. If the tests pass that way, then the problem is triggered by load, not by any specific test.

Not sure about the SSL complaint, but I suspect it's the same problem at bottom. You should look in the postmaster log file that's generated by the make check run, and see if you can find what gets logged by the postmaster when one of those failures is seen on the client side.

regards, tom lane
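The suggestion to break the highly parallel test sets into smaller groups could be sketched roughly as below. This is a minimal sketch, not anything from the PostgreSQL sources: it assumes the schedule file lists each parallel group on a single line starting with "test:" followed by space-separated test names, and the function name split_schedule is made up for illustration. Lines that do not start with "test:" (comments, "ignore:" lines) are passed through untouched, and the order of the tests is preserved.

```shell
#!/bin/sh
# split_schedule MAX
# Hypothetical helper: reads a schedule file on stdin and writes a copy to
# stdout in which no "test:" line names more than MAX tests.
# Assumes one parallel group per "test:" line, names separated by spaces.
split_schedule() {
    awk -v max="$1" '
        /^test:/ {
            line = "test:"; n = 0
            # Field 1 is the "test:" keyword; the test names start at field 2.
            for (i = 2; i <= NF; i++) {
                line = line " " $i
                if (++n == max) { print line; line = "test:"; n = 0 }
            }
            # Emit whatever is left over from a group that did not divide evenly.
            if (n > 0) print line
            next
        }
        { print }   # comments and other directives are copied unchanged
    '
}
```

One way to use it would be `split_schedule 5 < parallel_schedule > parallel_schedule.small`, then keep a backup of the original file and put the smaller schedule in its place before rerunning gmake check. The group size of 5 is only an example; pick whatever your process limits allow.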
I said:
> The fork-failed messages suggest very strongly that you are running out
> of kernel resources when you get more than a dozen or so server
> processes going. Perhaps you are too low on swap space, or need to
> enlarge the kernel's file table size.

It's also possible that you are hitting a kernel limit on the number of processes for a single user ID. See the TIP near the bottom of

    http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/regress-run.html

regards, tom lane
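A quick way to see whether a per-user process limit is in play is to ask the shell for it before starting the parallel tests. The following is only a sketch under a few assumptions: the helper name check_proc_limit and the NEEDED_PROCS variable are invented here, the default of 60 is a placeholder rather than a documented requirement, and the ulimit flag for the process limit differs between shells (bash uses -u, some ksh variants use -p), so adjust it for your system.

```shell
#!/bin/sh
# check_proc_limit LIMIT NEEDED
# Hypothetical helper. LIMIT is the per-user process limit as reported by the
# shell ("unlimited" or a number); NEEDED is how many processes you expect
# the test run to start. Returns 0 if LIMIT covers NEEDED, 1 if it does not,
# and 2 if LIMIT cannot be interpreted.
check_proc_limit() {
    case "$1" in
        unlimited) return 0 ;;
        ''|*[!0-9]*) echo "cannot interpret limit: '$1'" >&2; return 2 ;;
    esac
    [ "$1" -ge "$2" ]
}

# Ask the current shell for its limit. "ulimit -u" is the bash spelling;
# this is an assumption about your shell, so change the flag if needed.
current=$(ulimit -u 2>/dev/null) || current=unknown
needed=${NEEDED_PROCS:-60}   # placeholder estimate, not a documented figure

if check_proc_limit "$current" "$needed" 2>/dev/null; then
    echo "process limit ($current) looks sufficient for $needed processes"
else
    echo "process limit ($current) may be too low, or could not be determined"
fi
```

If the limit turns out to be the problem, either raise it for the account that runs the tests or reduce the concurrency of the test schedule.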
On Sat, Aug 03, 2002 at 01:57:05PM -0400, Tom Lane wrote:
> Tom Lane wrote:
> >
> > The fork-failed messages suggest very strongly that you are running
> > out of kernel resources when you get more than a dozen or so server
> > processes going. Perhaps you are too low on swap space, or need to
> > enlarge the kernel's file table size.
>
> It's also possible that you are hitting a kernel limit on number of
> processes for a single user ID. See the TIP near the bottom of
> http://www.ca.postgresql.org/users-lounge/docs/7.2/postgres/regress-run.html

Doh. That was it. I copied serial_schedule over parallel_schedule, and then all of the tests passed. Now I feel silly for not noticing that note...

Thanks a lot.

-Vik

-- 
Vikram Vinayak Kulkarni
vkulkarn@uiuc.edu
vkulkarn@brownforces.org

you can take the poster out of .test but you can't take .test out of the poster.
  -Jason Zych
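For anyone wanting to repeat the workaround of copying serial_schedule over parallel_schedule, here is a minimal sketch that keeps a backup first. The function name use_serial_schedule and the ".orig" backup suffix are invented for this example; the default directory assumes the schedule files sit in src/test/regress under the source tree, so pass a different path if yours are elsewhere.

```shell
#!/bin/sh
# use_serial_schedule [DIR]
# Hypothetical helper: makes "gmake check" run the tests one at a time by
# replacing parallel_schedule with a copy of serial_schedule. The original
# parallel_schedule is saved once as parallel_schedule.orig.
use_serial_schedule() {
    dir=${1:-src/test/regress}
    if [ ! -f "$dir/serial_schedule" ]; then
        echo "no serial_schedule in $dir" >&2
        return 1
    fi
    # Back up only on the first call so a second run cannot overwrite the
    # real original with an already-replaced file.
    if [ -f "$dir/parallel_schedule" ] && [ ! -f "$dir/parallel_schedule.orig" ]; then
        cp "$dir/parallel_schedule" "$dir/parallel_schedule.orig" || return 1
    fi
    cp "$dir/serial_schedule" "$dir/parallel_schedule"
}
```

Run `use_serial_schedule` from the top of the source tree and then `gmake check` as usual. To go back to the parallel run, copy parallel_schedule.orig over parallel_schedule again.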