> 1. src/backend/access/nbtree/nbtinsert.c, line 867: shouldn't this
> END_CRIT_SECTION be moved up to before the _bt_wrtbuf call? It seems
> to me that an elog during the wrtbuf is not a critical failure. If
> this code is correct, then all the other crit sections are wrong,
> because all of them release the crit section before writing buffers,
> not after.
Ok to move it up.
> 2. src/backend/commands/vacuum.c, line 1907: does this
> START_CRIT_SECTION really have to be here, and not down at line 1935,
> just before PageRepairFragmentation()? I really don't like the idea
> of turning those elogs that are inside the loop into reasons to force a
> system-wide restart.
Agreed.
> 3. src/backend/access/transam/xlog.c, routine CreateCheckPoint:
> does this *entire* routine need to be a critical section? Again,
> I fear a shotgun approach will mean a net decrease in reliability,
> not an improvement. How much of this code really has to be critical?
When postmaster has to create Checkpoint this routine is called from
bootstrap.c:BootstrapMain() - ie without normal initialization, so
I don't know result of elog(ERROR) in this case -:(
Probably I should spend more time in this area in attempt to make
Checkpoints rollback-able, but ...
Anyway it seems that the real source of elog(ERROR) there is
FlushBufferPool(). Maybe we could just initialize Warn_restart in
that point? All other places are mostly related to WAL itself
- they are more critical and less likely subject to elog(ERROR).
> Do you really want a failure in, say, MoveOfflineLogs to take down the
> whole database?
Well, this one could be changed at least (ie STOP --> LOG).
Vadim