Jo De Haes wrote:
> OK. The saga continues, everything is a little bit more clear, but at
> the same time a lot more confusing.
>
> Today I wanted to reproduce the problem again. And guess what? A
> vacuum of the database went through without any problems.
>
> I dumped the block I was having problems with yesterday. It doesn't
> report an invalid header anymore and it contains other data!!!
>
Inconsistent problems, especially with PostgreSQL, are usually the
result of hardware failure.
> Turns out the data that was returned yesterday belongs to another
> database!
>
> Some more detail about the setup. This server runs 2 instances of
> postgresql. One production instance which is version 8.0.3. And
> another testing instance installed in a different folder which runs
> version 8.1.3. Am I wrong in thinking this setup ought to work?
No. I have done it before too. PostgreSQL instances running on
different ports or addresses, each with its own data directory, are
sufficiently isolated to prevent this from being a problem.
>
> Both instances use completely separated data folders.
>
> So the first dump returned data that actually belongs to an 8.0.3
> database (that runs fine). And today without _any_ intervention that
> same block returns the correct data and the complete database is fine.
>
> Where is the problem?
> The fact that I'm running 2 different instances?
> Cache on raid controller messing up?
> Some strange voodoo?
I would first see what sort of memory testing suite you can run on your
system (memtest86, for example) and go from there. It sounds to me like
some sort of hardware issue. It *could* be bits flipped anywhere, from
the write head on the disk to the main system memory or the CPU. A
random RAM error is less likely if you are using ECC RAM. Otherwise it
could be anything.
That said, when I have seen bits flipped by the CPU, the usual symptoms
were a lot of index problems and shared memory corruption, so I would be
more inclined to think that this is RAM or the RAID cache.
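
If you want something more repeatable than eyeballing block dumps, a
rough Python sketch like the one below could help. It is only an
illustration, not a PostgreSQL tool: it assumes the default 8192-byte
page size, and the function names and any file path you pass in are
hypothetical. It hashes one block of a data file so that digests taken
on different days can be compared, and it rereads the block a few times
to see whether the reads agree.

```python
import hashlib

# Assumption: the server was built with PostgreSQL's default 8 kB page size.
BLOCK_SIZE = 8192


def block_digest(path, block_no, block_size=BLOCK_SIZE):
    """Return the SHA-256 hex digest of one block of a file."""
    with open(path, "rb") as f:
        f.seek(block_no * block_size)
        data = f.read(block_size)
    if len(data) != block_size:
        raise ValueError("block %d is short or past end of file" % block_no)
    return hashlib.sha256(data).hexdigest()


def reads_agree(path, block_no, attempts=3, block_size=BLOCK_SIZE):
    """Read the same block several times; True if every read matched."""
    digests = {block_digest(path, block_no, block_size) for _ in range(attempts)}
    return len(digests) == 1
```

Keep in mind that repeated reads are normally served from the OS cache,
so agreement here does not prove the disk or controller is healthy. It
is best run with the server stopped, and it is mostly useful for
recording a digest today and comparing it with tomorrow's.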
Best Wishes,
Chris Travers
Metatron Technology Consulting