Обсуждение: RE: A couple of fishy-looking critical sections

Поиск
Список
Период
Сортировка

RE: A couple of fishy-looking critical sections

От
"Mikheev, Vadim"
Дата:
> 1. src/backend/access/nbtree/nbtinsert.c, line 867: shouldn't this
> END_CRIT_SECTION be moved up to before the _bt_wrtbuf call?  It seems
> to me that an elog during the wrtbuf is not a critical failure.  If
> this code is correct, then all the other crit sections are wrong,
> because all of them release the crit section before writing buffers,
> not after.

Ok to move it up.

> 2. src/backend/commands/vacuum.c, line 1907: does this
> START_CRIT_SECTION really have to be here, and not down at line 1935,
> just before PageRepairFragmentation()? I really don't like the idea
> of turning those elogs that are inside the loop into reasons to force a
> system-wide restart.

Agreed.

> 3. src/backend/access/transam/xlog.c, routine CreateCheckPoint:
> does this *entire* routine need to be a critical section?  Again,
> I fear a shotgun approach will mean a net decrease in reliability,
> not an improvement.  How much of this code really has to be critical?

When postmaster has to create Checkpoint this routine is called from
bootstrap.c:BootstrapMain() - ie without normal initialization, so
I don't know result of elog(ERROR) in this case -:(
Probably I should spend more time in this area in attempt to make
Checkpoints rollback-able, but ...
Anyway it seems that the real source of elog(ERROR) there is
FlushBufferPool(). Maybe we could just initialize Warn_restart in
that point? All other places are mostly related to WAL itself
- they are more critical and less likely subject to elog(ERROR).

> Do you really want a failure in, say, MoveOfflineLogs to take down the
> whole database?

Well, this one could be changed at least (ie STOP --> LOG).

Vadim


Re: A couple of fishy-looking critical sections

От
Tom Lane
Дата:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> 3. src/backend/access/transam/xlog.c, routine CreateCheckPoint:
>> does this *entire* routine need to be a critical section?  Again,
>> I fear a shotgun approach will mean a net decrease in reliability,
>> not an improvement.  How much of this code really has to be critical?

> When postmaster has to create Checkpoint this routine is called from
> bootstrap.c:BootstrapMain() - ie without normal initialization, so
> I don't know result of elog(ERROR) in this case -:(

I believe elog(ERROR) will be treated like FATAL in this case (because
Warn_restart isn't set).  So the checkpoint process will clean up and
exit, but there wouldn't be a system-wide restart were it not for the
critical section.

The question that's bothering me is whether a system-wide restart is
actually going to make things better, rather than worse, if the
checkpoint process has a problem ...
        regards, tom lane