Thread: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

[Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

From
Dave Page
Date:
I've been seeing this failure intermittently on Narwhal HEAD, and once
on 8.1. Other branches have been OK, as have other animals running on
the same physical box. Narwhal-HEAD is run more often than any other
builds however.

Anyone have any idea what might be wrong? It seems unlikely to be a
hardware issue given that it's the exact same test failures each time.

Regards, Dave.

-------- Original Message --------
Subject: PGBuildfarm member narwhal Branch HEAD Status changed from OK
to InstallCheck failure
Date: Fri, 20 Apr 2007 13:46:22 -0700 (PDT)
From: PG Build Farm <pgbuildfarm-web@hosting-two.commandprompt.com>
To: pgbuildfarm-status-chngs@pgfoundry.org,
pgbuildfarm-status-green@pgfoundry.org


The PGBuildfarm member narwhal had the following event on branch HEAD:

Status changed from OK to InstallCheck failure

The snapshot timestamp for the build that triggered this notification
is: 2007-04-20 20:00:01

The specs of this machine are:
OS:  Windows Server 2003 R2 / 5.2.3790
Arch: i686
Comp: GCC / 3.4.2 (mingw-special)

For more information, see
http://www.pgbuildfarm.org/cgi-bin/show_history.pl?nm=narwhal&br=HEAD


Dave Page <dpage@postgresql.org> writes:
> I've been seeing this failure intermittently on Narwhal HEAD, and once
> on 8.1. Other branches have been OK, as have other animals running on
> the same physical box. Narwhal-HEAD is run more often than any other
> builds however.

> Anyone have any idea what might be wrong? It seems unlikely to be a
> hardware issue given that it's the exact same test failures each time.

Yeah, I'd been wondering about that too, but have no clue what's up.
It seems particularly odd that all the failures are in installcheck
not check.

If you want to poke at it, I'd suggest changing the ERROR to PANIC
(it's in bufmgr.c) to cause a core dump, run installchecks till you
get a panic, and then look around in the dump to see what you can find.
It'd be particularly interesting to see what the buffer actually
contains.  Also you could look at the corresponding page of the disk
file (which in theory should be the same as the buffer contents,
since this error check is only made just after a read() ...)
        regards, tom lane


Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

From
Dave Page
Date:
Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> I've been seeing this failure intermittently on Narwhal HEAD, and once
>> on 8.1. Other branches have been OK, as have other animals running on
>> the same physical box. Narwhal-HEAD is run more often than any other
>> builds however.
> 
>> Anyone have any idea what might be wrong? It seems unlikely to be a
>> hardware issue given that it's the exact same test failures each time.
> 
> Yeah, I'd been wondering about that too, but have no clue what's up.
> It seems particularly odd that all the failures are in installcheck
> not check.
> 
> If you want to poke at it, I'd suggest changing the ERROR to PANIC
> (it's in bufmgr.c) to cause a core dump, run installchecks till you
> get a panic, and then look around in the dump to see what you can find.
> It'd be particularly interesting to see what the buffer actually
> contains.  Also you could look at the corresponding page of the disk
> file (which in theory should be the same as the buffer contents,
> since this error check is only made just after a read() ...)

Hmm, I'll give it a go when I'm back in the office, but bear in mind
this is a Mingw build on which debugging is nigh-on impossible.

Regards, Dave.


Dave Page <dpage@postgresql.org> writes:
> Tom Lane wrote:
>> If you want to poke at it, I'd suggest changing the ERROR to PANIC
>> (it's in bufmgr.c) to cause a core dump, run installchecks till you
>> get a panic, and then look around in the dump to see what you can find.
>> It'd be particularly interesting to see what the buffer actually
>> contains.  Also you could look at the corresponding page of the disk
>> file (which in theory should be the same as the buffer contents,
>> since this error check is only made just after a read() ...)

> Hmm, I'll give it a go when I'm back in the office, but bear in mind
> this is a Mingw build on which debugging is nigh-on impossible.

I was afraid of that.  Well, at least get a dump of page 104 in that
index so we can see what's on-disk.
        regards, tom lane


Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

From
Dave Page
Date:
Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> Tom Lane wrote:
>>> If you want to poke at it, I'd suggest changing the ERROR to PANIC
>>> (it's in bufmgr.c) to cause a core dump, run installchecks till you
>>> get a panic, and then look around in the dump to see what you can find.
>>> It'd be particularly interesting to see what the buffer actually
>>> contains.  Also you could look at the corresponding page of the disk
>>> file (which in theory should be the same as the buffer contents,
>>> since this error check is only made just after a read() ...)
> 
>> Hmm, I'll give it a go when I'm back in the office, but bear in mind
>> this is a Mingw build on which debugging is nigh-on impossible.
> 
> I was afraid of that.  Well, at least get a dump of page 104 in that
> index so we can see what's on-disk.

Sure - I'll have to try with 8.1/8.2 unless you have a pg_filedump 
that'll work with -HEAD?

/D


Dave Page <dpage@postgresql.org> writes:
> Tom Lane wrote:
>> I was afraid of that.  Well, at least get a dump of page 104 in that
>> index so we can see what's on-disk.

> Sure - I'll have to try with 8.1/8.2 unless you have a pg_filedump 
> that'll work with -HEAD?

No, I don't, but a plain hex/ascii dump is probably the best thing
anyway, since we know the page header is wrong.  So use any old
version of pg_filedump with -d switch.
        regards, tom lane
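
A plain hex/ascii dump of a single page, as suggested above, can also be approximated without pg_filedump. What follows is only a minimal sketch in Python, assuming the default 8192-byte block size and a placeholder path to the index's disk file (the thread gives neither). The names read_block and hex_dump are illustrative and are not part of any PostgreSQL tool.

```python
BLOCK_SIZE = 8192  # assumption: default PostgreSQL block size (BLCKSZ)


def read_block(path, block_no, block_size=BLOCK_SIZE):
    """Return the raw bytes of one block from a relation file."""
    with open(path, "rb") as f:
        f.seek(block_no * block_size)
        data = f.read(block_size)
    if len(data) != block_size:
        raise ValueError(f"block {block_no} is missing or truncated in {path}")
    return data


def hex_dump(data, base_offset=0, width=16):
    """Format bytes as offset, hex and printable-ASCII columns."""
    lines = []
    for i in range(0, len(data), width):
        chunk = data[i:i + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as "." in the ASCII column.
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        offset = base_offset + i
        lines.append(f"{offset:08x}  {hex_part.ljust(width * 3 - 1)}  {ascii_part}")
    return "\n".join(lines)


# Hypothetical usage; the path is a placeholder, not taken from the thread:
# print(hex_dump(read_block("/path/to/relation/file", 104),
#                base_offset=104 * BLOCK_SIZE))
```

The offsets printed are byte positions within the file, so the dump of block 104 can be compared directly against the same region in any other hex viewer.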


Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

From
"Zeugswetter Andreas ADI SD"
Date:
> Hmm, I'll give it a go when I'm back in the office, but bear
> in mind this is a Mingw build on which debugging is nigh-on
> impossible.

I use the Snapshot
http://prdownloads.sf.net/mingw/gdb-6.3-2.exe?download from sf.net.
It has some issues, but it is definitely useable.

Andreas


Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

From
Dave Page
Date:
Zeugswetter Andreas ADI SD wrote:
>> Hmm, I'll give it a go when I'm back in the office, but bear 
>> in mind this is a Mingw build on which debugging is nigh-on 
>> impossible.
> 
> I use the Snapshot
> http://prdownloads.sf.net/mingw/gdb-6.3-2.exe?download from sf.net.
> It has some issues, but it is definitely useable.

I'll give it a go - thanks.

Regards, Dave.



Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

From
Dave Page
Date:
Dave Page wrote:

>> If you want to poke at it, I'd suggest changing the ERROR to PANIC
>> (it's in bufmgr.c) to cause a core dump, run installchecks till you
>> get a panic, and then look around in the dump to see what you can find.

Well, in typical fashion after 25+ runs this morning there's not a 
failure in sight :-(. I'll keep trying this afternoon, but in case that 
doesn't work, I've tweaked my buildfarm config to leave error trees in 
place so maybe we can catch it that way (though that'll be without the 
PANIC of course).

Regards, Dave
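
Repeating the test run by hand until an intermittent failure shows up is tedious, and a small loop can do it unattended. This is a rough sketch only, assuming the test command exits with a non-zero status on failure. The helper name run_until_failure and the command shown in the usage comment are illustrative and not taken from the buildfarm client.

```python
import subprocess
import sys


def run_until_failure(cmd, max_runs=100):
    """Run cmd repeatedly.

    Returns the 1-based number of the first failing run, or None if all
    max_runs runs exit with status 0.
    """
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
        if result.returncode != 0:
            # Show the output of the failing run only, then stop so the
            # data directory is left as it was at the time of the failure.
            sys.stdout.write(result.stdout.decode(errors="replace"))
            return attempt
    return None


# Hypothetical usage from the top of a source tree:
# failed_on = run_until_failure(["make", "installcheck"], max_runs=50)
```

Stopping at the first failure matters here because later runs could overwrite the files one would want to inspect.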


Dave Page <dpage@postgresql.org> writes:
> I've been seeing this failure intermittently on Narwhal HEAD, and once
> on 8.1. Other branches have been OK, as have other animals running on
> the same physical box. Narwhal-HEAD is run more often than any other
> builds however.

Oh, this is interesting:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=baiji&dt=2007-04-26%2022:00:02

Different compiler, different OS, not quite the same block number (109,
whereas IIRC all the previous examples have complained of block 104).
Is this the same physical machine as narwhal?
        regards, tom lane


Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

From
Dave Page
Date:
Tom Lane wrote:
> Dave Page <dpage@postgresql.org> writes:
>> I've been seeing this failure intermittently on Narwhal HEAD, and once
>> on 8.1. Other branches have been OK, as have other animals running on
>> the same physical box. Narwhal-HEAD is run more often than any other
>> builds however.
> 
> Oh, this is interesting:
> 
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=baiji&dt=2007-04-26%2022:00:02
> 
> Different compiler, different OS, not quite the same block number (109,
> whereas IIRC all the previous examples have complained of block 104).
> Is this the same physical machine as narwhal?

Yes, it is. It's an FC6 box running VMWare server, with a Win 2k3r2 VM
and a Vista ultimate VM, both with mingw and msvc animals.

I'm still not convinced it's a hardware problem - aside from the fact
that it's the same error every time (although, I note in this case it
was in check, not installcheck), I would expect at least one of SMART,
FC6, VMware or 2k3/Vista to spot that there was a problem. I have also
recreated the virtual disks of both VMs since this started happening. I
wonder if we're hitting some odd bug in VMware.

Anyhoo, unfortunately Baiji wasn't set to keep error builds - I've
changed that now and will run it a few times again. I'll also run a
sector level check of Narwhal's virtual disk and see if that complains.

Regards, Dave.


Dave Page <dpage@postgresql.org> writes:
> Tom Lane wrote:
>> Is this the same physical machine as narwhal?

> Yes, it is. It's an FC6 box running VMWare server, with a Win 2k3r2 VM
> and a Vista ultimate VM, both with mingw and msvc animals.

> I'm still not convinced it's a hardware problem - aside from the fact
> that it's the same error every time (although, I note in this case it
> was in check, not installcheck), I would expect at least one of SMART,
> FC6, VMware or 2k3/Vista to spot that there was a problem. I have also
> recreated the virtual disks of both VMs since this started happening. I
> wonder if we're hitting some odd bug in VMware.

I concur it's too regular to be a hardware issue.  The VMware idea is
a bit plausible though.  If that's it, we ought to see failures of this
ilk on all four animals sooner or later ...
        regards, tom lane


Re: [Fwd: PGBuildfarm member narwhal Branch HEAD Status changed from OK to InstallCheck failure]

From
Dave Page
Date:
Tom Lane wrote:
> I concur it's too regular to be a hardware issue.  The VMware idea is
> a bit plausible though.  If that's it, we ought to see failures of this
> ilk on all four animals sooner or later ...

I've run full disk scans in both Windows VMs, and forced an fsck of the
host just to be on the safe side and nothing showed up. By chance it
seems that VMWare released an update just yesterday so I've upgraded
everything. Hopefully the problem will go away now, but I'm not holding
my breath!

If not, one other option would be to roll back a couple of versions of
VMware - an older version hosted Bandicoot for some time with no problems.

Regards, Dave