> >> Um, Vadim? Still of the opinion that elog(STOP) is a good
> >> idea here? That's two people now for whom that decision has
> >> turned localized corruption into complete database failure.
> >> I don't think it's a good tradeoff.
>
> > One is able to use pg_resetxlog, so I don't see the point in
> > removing elog(STOP) there. What do you think?
>
> Well, pg_resetxlog would get around the symptom, but at the cost of
> possibly losing updates that are further along in the xlog than the
> update for the corrupted page. (I'm assuming that the problem here
> is a page with a corrupt LSN.) I think it's better to treat flush
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
On restart, the entire content of every page modified since the last
checkpoint should be restored from WAL. In Denis' case it looks like the
page newly allocated for the update was somehow corrupted before
heapam.c:2235 (7.1.2 src), so there was no XLOG_HEAP_INIT_PAGE flag in
the WAL record => the page content was not initialized on restart. Denis
reported a system crash - very likely due to a memory problem.
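To illustrate why the missing init flag matters, here is a minimal
sketch in Python. It is a toy model, not PostgreSQL source: Page,
WalRecord and redo are hypothetical names, and the "skip the record if
the page LSN is already at or past it" rule is an assumption about how
redo treats pages that look up to date.

```python
from dataclasses import dataclass


@dataclass
class Page:
    lsn: int    # LSN of the last WAL record the page claims to contain
    data: str   # stand-in for the page contents


@dataclass
class WalRecord:
    lsn: int
    init_page: bool   # models the XLOG_HEAP_INIT_PAGE flag
    data: str


def redo(page: Page, rec: WalRecord) -> bool:
    """Apply one record during restart; return True if the page changed."""
    if rec.init_page:
        # The page is rebuilt from scratch, so whatever was on disk
        # (including a garbage LSN) no longer matters.
        page.lsn, page.data = rec.lsn, rec.data
        return True
    if rec.lsn <= page.lsn:
        # Assumed rule: the page claims to already hold this change.
        return False
    page.lsn, page.data = rec.lsn, page.data + rec.data
    return True


# A corrupted page with a garbage (huge) LSN and no init flag in the
# record: the record is skipped and the page stays corrupt.
bad = Page(lsn=10**9, data="<garbage>")
skipped = not redo(bad, WalRecord(lsn=100, init_page=False, data="tuple"))

# The same page with the init flag set: contents are replaced.
fixed = Page(lsn=10**9, data="<garbage>")
redo(fixed, WalRecord(lsn=100, init_page=True, data="tuple"))
```

In this model the corrupt LSN survives restart only when the record
lacks the init flag, which matches the scenario described above.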
> request past end of log as a DEBUG or NOTICE condition and keep going.
> Sure, it indicates badness somewhere, but we should try to have some
> robustness in the face of that badness. I do not see any reason why
> XLOG has to declare defeat and go home because of this condition.
Ok - what about setting some flag there on restart and aborting the
restart after all records from WAL have been applied? Then the DBA will
have the choice either to run pg_resetxlog after that and try to dump the
data, or to restore from an old backup. I still object to just a NOTICE
there - it's easy to miss. And in normal processing mode I'd leave
elog(STOP) there.
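A rough sketch of that control flow, in Python for brevity. This is
only a model of the proposal, not a patch: XLogState, xlog_flush,
restart, Stop and RestartAborted are all hypothetical names, and LSNs
are reduced to plain integers.

```python
from dataclasses import dataclass, field


class Stop(Exception):
    """Models elog(STOP): the server refuses to continue."""


class RestartAborted(Exception):
    """Raised after replay if any bad flush request was seen."""


@dataclass
class XLogState:
    end_of_log: int
    in_recovery: bool
    bad_flush_lsns: list = field(default_factory=list)


def xlog_flush(state: XLogState, lsn: int) -> None:
    if lsn <= state.end_of_log:
        return  # normal case: the request is within the log
    if state.in_recovery:
        # On restart: remember the problem and keep applying records.
        state.bad_flush_lsns.append(lsn)
        return
    # Normal processing mode: keep the hard stop.
    raise Stop(f"flush request {lsn} is past end of log {state.end_of_log}")


def restart(state: XLogState, flush_lsns) -> int:
    applied = 0
    for lsn in flush_lsns:   # stand-in for redoing each WAL record
        xlog_flush(state, lsn)
        applied += 1
    if state.bad_flush_lsns:
        # All records are applied; now refuse to come up so the DBA
        # cannot miss the problem.
        raise RestartAborted(
            f"{applied} records applied; bad flush requests: "
            f"{state.bad_flush_lsns}"
        )
    state.in_recovery = False
    return applied
```

The point of the design is that the error is deferred, not downgraded:
replay finishes, but the server still does not start silently.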
Vadim
P.S. Further discussion will be on the hackers list, sorry.