Thread: Can someone verify CVS tip on Win32?

Can someone verify CVS tip on Win32?

From
Tom Lane
Date:
I think I've applied all the porting patches that were in the queue.
(Anyone have any I missed?)  Can someone verify that the Win32 port
is not busted, before we wrap beta5?

            regards, tom lane

Re: Can someone verify CVS tip on Win32?

From
Andrew Dunstan
Date:
pg_autovacuum.c is busted for Win32:  duplicate 'N' cases in commandline
arg processing

make / make check ran without a hitch.

cheers

andrew

Tom Lane wrote:

>I think I've applied all the porting patches that were in the queue.
>(Anyone have any I missed?)  Can someone verify that the Win32 port
>is not busted, before we wrap beta5?
>
>            regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>
>
>

Re: Can someone verify CVS tip on Win32?

From
"Matthew T. O'Connor"
Date:
*Yikes*  I should have been more careful, I forgot to check that the new
arguments I added didn't conflict with the windows only args.

The new -n -N were basically picked at random, any other set of
non-colliding letters would work fine.
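
[Editor's sketch] A minimal illustration of how such a collision could be caught mechanically, assuming the options are described by a getopt-style option string. The option strings below are made up for illustration; the thread does not show the real pg_autovacuum ones.

```python
from collections import Counter


def duplicate_option_letters(optstring: str) -> list[str]:
    """Return option letters that occur more than once in a getopt-style string."""
    # ':' only marks "this option takes an argument"; it is not a letter itself.
    counts = Counter(ch for ch in optstring if ch != ":")
    return sorted(letter for letter, n in counts.items() if n > 1)


# Hypothetical option strings, NOT the real pg_autovacuum ones.
common_opts = "s:S:d:nN"   # pretend -n and -N were just added
win32_only_opts = "IN:R"   # pretend -N already existed for Windows only

print(duplicate_option_letters(common_opts))                    # []
print(duplicate_option_letters(common_opts + win32_only_opts))  # ['N']
```

The collision only shows up once the platform-specific letters are included, which matches the report that the problem was visible only on Win32.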

Matthew


Andrew Dunstan wrote:

> pg_autovacuum.c is busted for Win32:  duplicate 'N' cases in
> commandline arg processing
>
> make / make check ran without a hitch.
>
> cheers
>
> andrew
>
> Tom Lane wrote:
>
>> I think I've applied all the porting patches that were in the queue.
>> (Anyone have any I missed?)  Can someone verify that the Win32 port
>> is not busted, before we wrap beta5?
>>
>>             regards, tom lane

--
Matthew O'Connor
V.P. of Operations
Terrie O'Connor Realtors
201-934-4900 x27


Re: Can someone verify CVS tip on Win32?

From
Tom Lane
Date:
"Matthew T. O'Connor" <matthew@tocr.com> writes:
> *Yikes*  I should have been more careful, I forgot to check that the new
> arguments I added didn't conflict with the windows only args.

> The new -n -N were basically picked at random, any other set of
> non-colliding letters would work fine.

OK.  I changed it to -l.

            regards, tom lane

Re: Can someone verify CVS tip on Win32?

From
Andrew Dunstan
Date:

Tom Lane wrote:

>"Matthew T. O'Connor" <matthew@tocr.com> writes:
>
>>*Yikes*  I should have been more careful, I forgot to check that the new
>>arguments I added didn't conflict with the windows only args.
>>
>>The new -n -N were basically picked at random, any other set of
>>non-colliding letters would work fine.
>
>OK.  I changed it to -l.

Everything builds fine now on Win32. make check and make installcheck
run clean. I still got a contrib/installcheck failure, but I'm not sure
it's related to recent changes.

I seem to see this regularly on SPARC/(Solaris or OpenBSD) platforms and
now I've seen it on Intel/Windows - see buildfarm histories for members
spoonbill, potorooo and loris. The symptom is that some regression
script will fail and this will be the only output:

psql: FATAL:  database "regression" does not exist

I have not seen it on the REL7_4_STABLE branch, only on HEAD.

I wish I knew what it was.

cheers

andrew





Re: Can someone verify CVS tip on Win32?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I seem to see this regularly on SPARC/(Solaris or OpenBSD) platforms and
> now I've seen it on Intel/Windows - see buildfarm histories for members
> spoonbill, potorooo and loris.

Which are where?

            regards, tom lane

Re: Can someone verify CVS tip on Win32?

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>>I seem to see this regularly on SPARC/(Solaris or OpenBSD) platforms and
>>now I've seen it on Intel/Windows - see buildfarm histories for members
>>spoonbill, potorooo and loris.
>
>Which are where?

Start here:

http://www.pgbuildfarm.org/cgi-bin/show_status.pl

If you click the member name you see its recent history for the given
branch.


cheers

andrew

Re: Can someone verify CVS tip on Win32?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Start here:
> http://www.pgbuildfarm.org/cgi-bin/show_status.pl

Hmm ... I have a theory about it, but I'm not sure how to reproduce the
problem.  How many databases have you created in the installation that
the contrib installcheck is running against?  Are you running autovacuum
against it?

            regards, tom lane

Re: Can someone verify CVS tip on Win32?

From
Andrew Dunstan
Date:

Tom Lane wrote:

>Andrew Dunstan <andrew@dunslane.net> writes:
>
>>Start here:
>>http://www.pgbuildfarm.org/cgi-bin/show_status.pl
>
>Hmm ... I have a theory about it, but I'm not sure how to reproduce the
>problem.  How many databases have you created in the installation that
>the contrib installcheck is running against?

Just what make installcheck / make contrib installcheck  runs. For full
details of what we do, see


http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgbuildfarm/client-code/run_build.pl?rev=1.11&content-type=text/x-cvsweb-markup

>Are you running autovacuum
>against it?

No.

cheers

andrew

Re: Can someone verify CVS tip on Win32?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Hmm ... I have a theory about it, but I'm not sure how to reproduce the
>> problem.  How many databases have you created in the installation that
>> the contrib installcheck is running against?

> Just what make installcheck / make contrib installcheck  runs.

OK.  I still haven't been able to reproduce it, but the place where it
is failing is consistent with my theory, which is:

1. CREATE DATABASE creates a pg_database row for "regression" that is
the last or nearly last row that will fit into block 0 of pg_database.
It then flushes this block to disk to ensure that new backends can see
the row in GetRawDatabaseInfo.

2. pg_regress.sh then does several ALTER DATABASE operations.  These
will mark the original row dead and make a new row.  At the end of this,
I hypothesize that the live copy of the "regression" row is in
pg_database block 1, not block 0.  And it's not been flushed to disk,
because ALTER DATABASE fails to do that.

3. (Here's the hard-to-reproduce part.)  Assume that something causes
block 0, but not block 1, of pg_database to be flushed from shared
buffers to disk.

4. Now, an incoming backend will see the original pg_database row for
"regression" as committed dead, so it'll ignore it.  It can't see the
live row because that's not been flushed to disk; it's only in shared
buffers.  Ergo, GetRawDatabaseInfo fails.

The problem goes away as soon as a checkpoint happens, but it's still
possible for the regression tests to fail this way.

A reasonable theory about step 3 is that the bgwriter chooses to write
out block 0 at just the right time.  This would happen infrequently
enough to explain why we've not seen this reported before.

This theory explains why the failure consistently happens at the same
place in the test sequence, and why that place is machine-architecture
dependent: it can only happen when a certain number of pg_database rows
have been created and deleted, and the magic number depends on the
machine MAXALIGN value because that affects the size of the rows.
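
[Editor's sketch] The MAXALIGN dependence can be illustrated with a little arithmetic. This is a simplified model, not PostgreSQL's actual page layout code: the raw row size (180 bytes) and the page header size (24 bytes) are assumed values chosen only to show the effect, and the function names are hypothetical.

```python
def max_align(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment (a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def tuples_per_block(raw_tuple_size: int, alignment: int,
                     block_size: int = 8192,
                     page_header: int = 24,    # assumed, for illustration
                     line_pointer: int = 4) -> int:
    """How many rows of a given raw size fit in one block in this toy model."""
    per_tuple = max_align(raw_tuple_size, alignment) + line_pointer
    return (block_size - page_header) // per_tuple


RAW_SIZE = 180  # hypothetical unaligned row size in bytes
for alignment in (4, 8):
    print(alignment, max_align(RAW_SIZE, alignment),
          tuples_per_block(RAW_SIZE, alignment))
# 4 180 44
# 8 184 43
```

With these assumed numbers, a machine with 8-byte alignment fills block 0 one row earlier than a machine with 4-byte alignment, so the point in the test sequence at which a new row version spills into block 1 differs between architectures.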

The fix of course is that ALTER DATABASE must flush pg_database to disk,
just as RENAME does.
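
[Editor's sketch] The four steps above can be walked through with a toy model. This is only an illustration of the theory under heavy simplification: `ToyCatalog` and its methods are hypothetical names, rows are reduced to a name plus a live/dead flag, and the block placement is hard-coded to match the hypothesis (old row in block 0, new row in block 1). It is not PostgreSQL code.

```python
class ToyCatalog:
    """Toy pg_database: a shared-buffer copy and an on-disk copy of two blocks."""

    def __init__(self, flush_on_alter: bool):
        self.flush_on_alter = flush_on_alter
        self.buffers = {0: [], 1: []}  # block number -> list of [name, live]
        self.disk = {0: [], 1: []}

    def _write_block(self, blk: int) -> None:
        # Copy the buffered image of one block to "disk".
        self.disk[blk] = [row[:] for row in self.buffers[blk]]

    def checkpoint(self) -> None:
        for blk in self.buffers:
            self._write_block(blk)

    def create_database(self, name: str) -> None:
        # Step 1: the row lands in block 0 and the block is flushed.
        self.buffers[0].append([name, True])
        self._write_block(0)

    def alter_database(self, name: str) -> None:
        # Step 2: old version marked dead, new version placed in block 1.
        for rows in self.buffers.values():
            for row in rows:
                if row[0] == name:
                    row[1] = False
        self.buffers[1].append([name, True])
        if self.flush_on_alter:  # the proposed fix
            self.checkpoint()

    def bgwriter_writes(self, blk: int) -> None:
        # Step 3: something writes out just one block.
        self._write_block(blk)

    def raw_lookup(self, name: str) -> bool:
        # Step 4: a starting backend looks only at the on-disk copy.
        return any(n == name and live
                   for rows in self.disk.values() for n, live in rows)


def run(flush_on_alter: bool):
    cat = ToyCatalog(flush_on_alter)
    cat.create_database("regression")
    cat.alter_database("regression")
    cat.bgwriter_writes(0)
    before_checkpoint = cat.raw_lookup("regression")
    cat.checkpoint()
    return before_checkpoint, cat.raw_lookup("regression")


print(run(flush_on_alter=False))  # (False, True)
print(run(flush_on_alter=True))   # (True, True)
```

Without the flush, the lookup fails until the checkpoint, which corresponds to the "database does not exist" symptom; with the flush in ALTER DATABASE the row is visible throughout.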

            regards, tom lane

Re: Can someone verify CVS tip on Win32?

From
Andrew Dunstan
Date:

Tom Lane wrote:

>The fix of course is that ALTER DATABASE must flush pg_database to disk,
>just as RENAME does.

I can't think of a downside to this, so I vote to do it. If you do it
right now we'd know in about 13 hours from now if it fixed things on
spoonbill, where it seems to happen most reliably.

cheers

andrew

Re: Can someone verify CVS tip on Win32?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> I can't think of a downside to this, so I vote to do it. If you do it
> right now we'd know in about 13 hours from now if it fixed things on
> spoonbill, where it seems to happen most reliably.

Patch committed.

            regards, tom lane

Re: Can someone verify CVS tip on Win32?

From
Reini Urban
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>>Tom Lane wrote:
>>
>>>Hmm ... I have a theory about it, but I'm not sure how to reproduce the
>>>problem.  How many databases have you created in the installation that
>>>the contrib installcheck is running against?
>
>
>>Just what make installcheck / make contrib installcheck  runs.
>
> OK.  I still haven't been able to reproduce it, but the place where it
> is failing is consistent with my theory, which is:
>
> 1. CREATE DATABASE creates a pg_database row for "regression" that is
> the last or nearly last row that will fit into block 0 of pg_database.
> It then flushes this block to disk to ensure that new backends can see
> the row in GetRawDatabaseInfo.
>
> 2. pg_regress.sh then does several ALTER DATABASE operations.  These
> will mark the original row dead and make a new row.  At the end of this,
> I hypothesize that the live copy of the "regression" row is in
> pg_database block 1, not block 0.  And it's not been flushed to disk,
> because ALTER DATABASE fails to do that.
>
> 3. (Here's the hard-to-reproduce part.)  Assume that something causes
> block 0, but not block 1, of pg_database to be flushed from shared
> buffers to disk.
>
> 4. Now, an incoming backend will see the original pg_database row for
> "regression" as committed dead, so it'll ignore it.  It can't see the
> live row because that's not been flushed to disk; it's only in shared
> buffers.  Ergo, GetRawDatabaseInfo fails.
>
> The problem goes away as soon as a checkpoint happens, but it's still
> possible for the regression tests to fail this way.
>
> A reasonable theory about step 3 is that the bgwriter chooses to write
> out block 0 at just the right time.  This would happen infrequently
> enough to explain why we've not seen this reported before.
>
> This theory explains why the failure consistently happens at the same
> place in the test sequence, and why that place is machine-architecture
> dependent: it can only happen when a certain number of pg_database rows
> have been created and deleted, and the magic number depends on the
> machine MAXALIGN value because that affects the size of the rows.
>
> The fix of course is that ALTER DATABASE must flush pg_database to disk,
> just as RENAME does.

This also explains my strange regression problems on cygwin. Thanks for
the change. Everything looks much easier now.

--
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/