Re: invalid page header

From: Jo De Haes
Subject: Re: invalid page header
Msg-id: e2i0pa$28ng$1@news.hub.org
In reply to: Re: invalid page header (Chris Travers <chris@metatrontech.com>)
List: pgsql-general

Just a little followup on this problem.

We've moved the database to another server where it ran without problems.

HP just released new RAID controller drivers for SUSE and a firmware
update for the controller itself.

So far the problem has not recurred.

Thanks!
Jo.

Chris Travers wrote:
> Jo De Haes wrote:
>
>> OK.  The saga continues, everything is a little bit more clear, but at
>> the same time a lot more confusing.
>>
>> Today I wanted to reproduce the problem again.  And guess what? A
>> vacuum of the database went through without any problems.
>>
>> I dumped the block I was having problems with yesterday.  It doesn't
>> report an invalid header anymore and it contains other data!!!
>>
> Inconsistent problems, especially with PostgreSQL, are usually the
> result of hardware failure.
>
>> Turns out the data that was returned yesterday belongs to another
>> database!
>>
>> Some more detail about the setup.  This server runs 2 instances of
>> PostgreSQL.  One production instance which is version 8.0.3.  And
>> another testing instance installed in a different folder which runs
>> version 8.1.3.  Am I wrong in thinking this setup ought to work?
>
>
> No.  I have done it before too.  PostgreSQL instances running on
> different ports or addresses are sufficiently isolated to prevent this
> from being a problem.
>
>>
>> Both instances use completely separate data folders.
>>
>> So the first dump returned data that actually belongs to an 8.0.3
>> database (that runs fine).  And today without _any_ intervention that
>> same block returns the correct data and the complete database is fine.
>>
>> Where is the problem?
>>     The fact that I'm running 2 different instances?
>>     Cache on RAID controller messing up?
>>     Some strange voodoo?
>
>
> I would see what sort of memory testing suite you can run on your system
> first (memtest86, for example) and go from there.  It sounds to me like
> some sort of a hardware issue.  It *could* be bits flipped anywhere,
> from the write head on the disk to the main system memory or the CPU.
>
> The likelihood that it is a random RAM error is reduced if you are using
> ECC RAM.  Otherwise it could be anything.
>
> This being said, when I have seen bits flipped by the CPU, you usually
> get a lot of index issues and shared memory corruption, so I would be
> more inclined to think that this was RAM or RAID cache.
>
> Best Wishes,
> Chris Travers
> Metatron Technology Consulting
