Re: FSM corruption leading to errors

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: FSM corruption leading to errors
Дата
Msg-id CAB7nPqTnVw7TfD9n+-fLX7u+1qWEU3pYTOv0TPq1NoRp1SbhAQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: FSM corruption leading to errors  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
On Mon, Oct 17, 2016 at 8:04 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> visibilitymap_truncate is actually also wrong, in a different way. The
> truncation WAL record is written only after the VM (and FSM) are truncated.
> But visibilitymap_truncate() has already modified and dirtied the page. If
> the VM page change is flushed to disk before the WAL record, and you crash,
> you might have a torn VM page and a checksum failure.

Good point! I didn't think about that.

> Simply replacing the MarkBufferDirtyHint() call with MarkBufferDirty() in
> FreeSpaceMapTruncateRel would have the same issue. If you call
> MarkBufferDirty(), you must WAL-log the change, and also set the page's LSN
> to make sure the WAL record is flushed first.

Right.. All the code paths calling FreeSpaceMapTruncateRel or
visibilitymap_truncate do flush the record, but it definitely make
things more robust to set the LSN on the page. So +1 this way.

> I think we need something like the attached.

OK, I had a look at that.

        MarkBufferDirty(mapBuffer);
+       if (lsn != InvalidXLogRecPtr)
+           PageSetLSN(page, lsn);
        UnlockReleaseBuffer(mapBuffer);
Nit: using XLogRecPtrIsInvalid() makes more sense for such checks?

pg_visibility calls as well visibilitymap_truncate, but this patch did
not update it. And it would be better to do the actual truncate after
writing the WAL record I think.

You are also breaking XLogFlush() in RelationTruncate() because vm and
fsm are assigned thanks to a call of smgrexists(), something done
before XLogFlush is called on HEAD, and after with your patch. So your
patch finishes by never calling XLogFlush() on relation truncation
even if there is a VM or a FSM.

Attached is an updated version with those issues fixed. The final
version does not need the test as that's really a corner-case, but I
still included it to help testing for now.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: Parallel Index Scans
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Re: [COMMITTERS] pgsql: Replace PostmasterRandom() with a stronger way of generating ran