Re: Online enabling of checksums

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Online enabling of checksums
Дата
Msg-id 20180406235126.d4sg4dtgicdpucnj@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Online enabling of checksums  (Daniel Gustafsson <daniel@yesql.se>)
Список pgsql-hackers
On 2018-04-07 01:27:13 +0200, Daniel Gustafsson wrote:
> > On 07 Apr 2018, at 01:13, Andres Freund <andres@anarazel.de> wrote:
> > 
> > On 2018-04-07 01:04:50 +0200, Daniel Gustafsson wrote:
> >>> I'm fairly certain that the bug here is a simple race condition in the
> >>> test (not the main code!):
> >> 
> >> I wonder if it may perhaps be a case of both?
> > 
> > See my other message about the atomic fallback bit.
> 
> Yep, my MUA pulled it down just as I had sent this.  Thanks for confirming my
> suspicion.

But note it fails because the code using it is WRONG. There's a reason
there's "unlocked" in the name. But even leaving that aside, it probably
*still* be wrong if it were locked.

It seems *extremely* dubious that we'll allow to re-enable the checksums
while a worker is still doing stuff for the old cycle in the
background.  Consider what happens if the checksum helper is currently
doing RequestCheckpoint() (something that can certainly take a *LONG*)
while. Another process disables checksums. Pages get written out
without checksums. Yet another process re-enables checksums. Helper
process does SetDataChecksumsOn(). Which succeeds because

    if (ControlFile->data_checksum_version != PG_DATA_CHECKSUM_INPROGRESS_VERSION)
    {
        LWLockRelease(ControlFileLock);
        elog(ERROR, "Checksums not in inprogress mode");
    }

succeeds.  Boom. Cluster with partially set checksums but marked as
valid.


Greetings,

Andres Freund


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Chapman Flack
Дата:
Сообщение: [PATCH] Update README for Resource Owners
Следующее
От: Andres Freund
Дата:
Сообщение: Re: [PATCH] Update README for Resource Owners