Re: Explained by known hardware failures, or keep looking?

From Simon Riggs
Subject Re: Explained by known hardware failures, or keep looking?
Msg-id 1182199973.6855.279.camel@silverbirch.site
In reply to Explained by known hardware failures, or keep looking?  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-admin
On Mon, 2007-06-18 at 14:41 -0500, Kevin Grittner wrote:

> [2007-06-14 11:31:05.986 CDT] 6781 LOG:  redo starts at 1D2/6C739064
> [2007-06-14 11:31:46.533 CDT] 6781 WARNING:  invalid page header in block 182566 of relation "1523860"; zeroing out page
> [2007-06-14 11:31:46.533 CDT] 6781 CONTEXT:  xlog redo split_r: rel 1663/16386/1523860; tid 182566/92; oth 182563; rgh115741
> [2007-06-14 11:31:56.228 CDT] 6781 WARNING:  invalid page header in block 182567 of relation "1523860"; zeroing out page
> [2007-06-14 11:31:56.229 CDT] 6781 CONTEXT:  xlog redo split_r: rel 1663/16386/1523860; tid 182567/94; oth 182128; rgh114655
> [2007-06-14 11:32:04.964 CDT] 6781 WARNING:  invalid page header in block 123644 of relation "1524189"; zeroing out page
> [2007-06-14 11:32:04.964 CDT] 6781 CONTEXT:  xlog redo split_r: rel 1663/16386/1524189; tid 123644/101; oth 123634; rgh106665
> [2007-06-14 11:32:11.327 CDT] 6781 WARNING:  invalid page header in block 356562 of relation "1524219"; zeroing out page
> [2007-06-14 11:32:11.327 CDT] 6781 CONTEXT:  xlog redo split_r: rel 1663/16386/1524219; tid 356562/58; oth 356549; rgh34892
> [2007-06-14 11:32:14.795 CDT] 6781 LOG:  record with zero length at 1D2/70C31890
> [2007-06-14 11:32:14.795 CDT] 6781 LOG:  redo done at 1D2/70C31868
> [2007-06-14 11:32:33.833 CDT] 6781 LOG:  database system is ready

I can believe that this was caused by blocks that had been written to
but not yet flushed at checkpoint. That could happen if the blocks were
reasonably heavily used, say as the right-edge pages of the indexes on
two connected tables at the time of the crash. I've got no diagnostics
to back that up, however detailed the logs look. Other explanations
welcome.

> Could all of this be reasonably explained by the controller failure and/or the subsequent abrupt power loss, or should I be looking for another cause?  Personally, as I look at this, I'm suspicious that either the controller didn't persist dirty pages in the June 14th failure or there is some ongoing hardware problem.

Yes. The controller failure means data loss. PostgreSQL doesn't have a
disk check utility because your data is never at risk from us when
running with full transaction guarantees (ref new feature in 8.3), but
the disk failure has meant that stuff you thought was on disk wasn't
really there.

So your DB has holes in it and you need to recover/failover/pull-hair.
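
To get a list of where the holes are, the WARNING lines quoted above can
be summarised per relation. What follows is only a rough sketch, not a
PostgreSQL tool: the regular expression is inferred from the four
WARNING lines in this thread, and zeroed_blocks and SAMPLE are names
made up for the illustration. The quoted relation number looks like the
relfilenode (it matches the last part of rel 1663/16386/... in the
CONTEXT lines), so it should be possible to look it up in
pg_class.relfilenode to find the object's name.

```python
import re
from collections import defaultdict

# Pattern inferred from the WARNING lines quoted in this thread (an assumption,
# not an official log format specification).
ZEROED = re.compile(r'invalid page header in block (\d+) of relation "(\d+)"')


def zeroed_blocks(log_text):
    """Return {relation as logged: sorted list of block numbers that were zeroed}."""
    found = defaultdict(set)
    for match in ZEROED.finditer(log_text):
        block, relation = match.groups()
        found[relation].add(int(block))
    return {relation: sorted(blocks) for relation, blocks in found.items()}


# The four WARNING lines from the recovery log quoted above.
SAMPLE = """\
[2007-06-14 11:31:46.533 CDT] 6781 WARNING:  invalid page header in block 182566 of relation "1523860"; zeroing out page
[2007-06-14 11:31:56.228 CDT] 6781 WARNING:  invalid page header in block 182567 of relation "1523860"; zeroing out page
[2007-06-14 11:32:04.964 CDT] 6781 WARNING:  invalid page header in block 123644 of relation "1524189"; zeroing out page
[2007-06-14 11:32:11.327 CDT] 6781 WARNING:  invalid page header in block 356562 of relation "1524219"; zeroing out page
"""

if __name__ == "__main__":
    for relation, blocks in sorted(zeroed_blocks(SAMPLE).items()):
        print(f"relation {relation}: {len(blocks)} zeroed block(s): {blocks}")
```

The pattern does not care whether a log line has been wrapped after
"zeroing out", so it can be pointed at the raw server log or at a mail
quote of it.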

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com


