Discussion: Can someone verify CVS tip on Win32?
I think I've applied all the porting patches that were in the queue. (Anyone have any I missed?) Can someone verify that the Win32 port is not busted, before we wrap beta5?

regards, tom lane
pg_autovacuum.c is busted for Win32: duplicate 'N' cases in commandline arg processing

make / make check ran without a hitch.

cheers

andrew

Tom Lane wrote:
> I think I've applied all the porting patches that were in the queue.
> (Anyone have any I missed?) Can someone verify that the Win32 port
> is not busted, before we wrap beta5?
>
> regards, tom lane
*Yikes* I should have been more careful; I forgot to check that the new arguments I added didn't conflict with the Windows-only args.

The new -n -N were basically picked at random; any other set of non-colliding letters would work fine.

Matthew

Andrew Dunstan wrote:
> pg_autovacuum.c is busted for Win32: duplicate 'N' cases in
> commandline arg processing
>
> make / make check ran without a hitch.
>
> cheers
>
> andrew

-- 
Matthew O'Connor
V.P. of Operations
Terrie O'Connor Realtors
201-934-4900 x27
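A collision like the one described above (the same option letter claimed by both the portable and the Windows-only argument sets) can be caught mechanically. The following is only a minimal sketch: the function name and the two option strings are hypothetical and are not the real pg_autovacuum option lists.

```python
from collections import Counter


def duplicate_option_letters(*optstrings):
    """Return the option letters that occur more than once across
    getopt-style option strings (a ':' marks "takes an argument")."""
    letters = [ch for spec in optstrings for ch in spec if ch != ":"]
    counts = Counter(letters)
    return sorted(ch for ch, n in counts.items() if n > 1)


# Hypothetical option sets, chosen only to illustrate the collision.
common_opts = "d:hn:N:"   # options available on every platform
win32_opts = "E:N:W:"     # options compiled in only on Windows

print(duplicate_option_letters(common_opts, win32_opts))
```

In C the compiler reports the same mistake as a duplicate case label in the switch, but only on the platform where both sets of cases are compiled, which is why a build on other platforms would not show it.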
"Matthew T. O'Connor" <matthew@tocr.com> writes: > *Yikes* I should have been more careful, I forgot to check that the new > arguments I added didn't conflict with the windows only args. > The new -n -N were basically picked at random, any other set of > non-colliding letters would work fine. OK. I changed it to -l. regards, tom lane
Tom Lane wrote:
> OK. I changed it to -l.

Everything builds fine now on Win32. make check and make installcheck run clean.

I still got a contrib/installcheck failure, but I'm not sure it's related to recent changes. I seem to see this regularly on SPARC/(Solaris or OpenBSD) platforms and now I've seen it on Intel/Windows - see buildfarm histories for members spoonbill, potorooo and loris. The symptom is that some regression script will fail and this will be the only output:

psql: FATAL: database "regression" does not exist

I have not seen it on the REL7_4_STABLE branch, only on HEAD. I wish I knew what it was.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> I seem to see this regularly on SPARC/(Solaris or OpenBSD) platforms and
> now I've seen it on Intel/Windows - see buildfarm histories for members
> spoonbill, potorooo and loris.

Which are where?

regards, tom lane
Tom Lane wrote:
> Which are where?

Start here: http://www.pgbuildfarm.org/cgi-bin/show_status.pl

If you click the member name you see its recent history for the given branch.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Start here:
> http://www.pgbuildfarm.org/cgi-bin/show_status.pl

Hmm ... I have a theory about it, but I'm not sure how to reproduce the problem. How many databases have you created in the installation that the contrib installcheck is running against? Are you running autovacuum against it?

regards, tom lane
Tom Lane wrote:
> Hmm ... I have a theory about it, but I'm not sure how to reproduce the
> problem. How many databases have you created in the installation that
> the contrib installcheck is running against?

Just what make installcheck / make contrib installcheck runs. For full details of what we do, see http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/run_build.pl?rev=1.11&content-type=text/x-cvsweb-markup

> Are you running autovacuum against it?

No.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Hmm ... I have a theory about it, but I'm not sure how to reproduce the
>> problem. How many databases have you created in the installation that
>> the contrib installcheck is running against?
> Just what make installcheck / make contrib installcheck runs.

OK. I still haven't been able to reproduce it, but the place where it is failing is consistent with my theory, which is:

1. CREATE DATABASE creates a pg_database row for "regression" that is the last or nearly last row that will fit into block 0 of pg_database. It then flushes this block to disk to ensure that new backends can see the row in GetRawDatabaseInfo.

2. pg_regress.sh then does several ALTER DATABASE operations. These will mark the original row dead and make a new row. At the end of this, I hypothesize that the live copy of the "regression" row is in pg_database block 1, not block 0. And it's not been flushed to disk, because ALTER DATABASE fails to do that.

3. (Here's the hard-to-reproduce part.) Assume that something causes block 0, but not block 1, of pg_database to be flushed from shared buffers to disk.

4. Now, an incoming backend will see the original pg_database row for "regression" as committed dead, so it'll ignore it. It can't see the live row because that's not been flushed to disk; it's only in shared buffers. Ergo, GetRawDatabaseInfo fails.

The problem goes away as soon as a checkpoint happens, but it's still possible for the regression tests to fail this way.

A reasonable theory about step 3 is that the bgwriter chooses to write out block 0 at just the right time. This would happen infrequently enough to explain why we've not seen this reported before.

This theory explains why the failure consistently happens at the same place in the test sequence, and why that place is machine-architecture dependent: it can only happen when a certain number of pg_database rows have been created and deleted, and the magic number depends on the machine MAXALIGN value because that affects the size of the rows.

The fix of course is that ALTER DATABASE must flush pg_database to disk, just as RENAME does.

regards, tom lane
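The four steps of the theory above can be walked through with a toy model. This is a sketch under stated assumptions, not PostgreSQL code: the class, its methods, and the two-block layout are hypothetical, and the only behaviour taken from the thread is that CREATE DATABASE flushes its block, ALTER DATABASE (before the fix) does not, and a new backend looks only at what is on disk.

```python
import copy


class PgDatabaseModel:
    """Toy model of pg_database pages in shared buffers versus on disk."""

    def __init__(self):
        self.buffers = {0: [], 1: []}  # block number -> rows in shared buffers
        self.disk = {0: [], 1: []}     # block number -> rows as written to disk

    def flush(self, block):
        # Write one block from shared buffers to disk.
        self.disk[block] = copy.deepcopy(self.buffers[block])

    def create_database(self, name):
        # Step 1: the new row lands in block 0, which is flushed at once.
        self.buffers[0].append({"name": name, "live": True})
        self.flush(0)

    def alter_database(self, name, flush_after):
        # Step 2: the old row is marked dead and the new version is
        # assumed to land in block 1 because block 0 is full.
        for row in self.buffers[0]:
            if row["name"] == name:
                row["live"] = False
        self.buffers[1].append({"name": name, "live": True})
        if flush_after:  # the proposed fix
            self.flush(0)
            self.flush(1)

    def backend_can_find(self, name):
        # Step 4: an incoming backend sees only the on-disk rows.
        return any(row["name"] == name and row["live"]
                   for rows in self.disk.values() for row in rows)


def run(flush_after_alter):
    model = PgDatabaseModel()
    model.create_database("regression")
    model.alter_database("regression", flush_after_alter)
    model.flush(0)  # step 3: something writes out block 0 but not block 1
    return model.backend_can_find("regression")


print("without flush in ALTER DATABASE:", run(False))
print("with flush in ALTER DATABASE:   ", run(True))
```

In this model the lookup fails only when block 0 reaches disk without block 1, which matches the description of step 3 as the hard-to-reproduce part; if step 3 never happens, the backend still finds the stale but live row on disk.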
Tom Lane wrote:
> The fix of course is that ALTER DATABASE must flush pg_database to disk,
> just as RENAME does.

I can't think of a downside to this, so I vote to do it. If you do it right now we'd know in about 13 hours from now if it fixed things on spoonbill, where it seems to happen most reliably.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> I can't think of a downside to this. so I vote to do it. If you do it
> right now we'd know in about 13 hours from now if it fixed things on
> spoonbill, where it seems to happen most reliably.

Patch committed.

regards, tom lane
Tom Lane wrote:
> This theory explains why the failure consistently happens at the same
> place in the test sequence, and why that place is machine-architecture
> dependent: it can only happen when a certain number of pg_database rows
> have been created and deleted, and the magic number depends on the
> machine MAXALIGN value because that affects the size of the rows.
>
> The fix of course is that ALTER DATABASE must flush pg_database to disk,
> just as RENAME does.

This also explains my strange regression problems on cygwin. Thanks for the change. Everything looks much easier now.

-- 
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/