Re: POC: Cleaning up orphaned files using undo logs

Поиск
Список
Период
Сортировка
От Thomas Munro
Тема Re: POC: Cleaning up orphaned files using undo logs
Дата
Msg-id CA+hUKG+MgYoOn55gG9ppw8=vvEazGWQLtZ2kkPBZ9gN22ki1wQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: POC: Cleaning up orphaned files using undo logs  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Ответы Re: POC: Cleaning up orphaned files using undo logs  (Kuntal Ghosh <kuntalghosh.2007@gmail.com>)
Список pgsql-hackers
On Mon, Sep 16, 2019 at 5:27 AM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote:
> While testing zheap over undo apis, we've found the following
> issues/scenarios that might need some fixes/discussions:

Thanks!

> 1. In UndoLogAllocateInRecovery, when we find the current log number
> from the list of registered blocks, we don't check whether the
> block->in_use flag is true or not. In XLogResetInsertion, we just
> reset in_use flag without reseting the blocks[]->rnode information.
> So, if we don't check the in_use flag, it's possible that we'll
> consult some block information from the previous WAL record. IMHO,
> just adding an in_use check in UndoLogAllocateInRecovery will solve
> the problem.

Agreed.  I added a line to break out of that loop if !block->in_use.

BTW I am planning to simplify that code considerably, based on a plan
to introduce a new rule: there can be only one undo record and
therefore only one undo allocation per WAL record.

> 2. A transaction, inserts one undo record and generated a WAL record
> for the same, say at WAL location 0/2000A000. Next, the undo record
> gets discarded and WAL is generated to update the meta.discard pointer
> at location 0/2000B000  At the same time, an ongoing checkpoint with
> checkpoint.redo at 0/20000000 flushes the latest meta.discard pointer.
> Now, the system crashes.
> Now, the recovery starts from the location 0/20000000. When the
> recovery of 0/2000A000 happens, it sees the undo record that it's
> about to insert, is already discarded as per meta.discard (flushed by
> checkpoint). In this case, should we just skip inserting the undo
> record?

I see two options:

1.  We make it so that if you're allocating in recovery and discard >
insert, we'll just set discard = insert so you can proceed.  The code
in undofile_get_segment_file() already copes with missing files during
recovery.

2.  We skip the insert as you said.

I think option 1 is probably best, otherwise you have to cope with
failure to insert by skipping, as you said.

> 3. Currently, we create a backup image of the unlogged part of the
> undo log's metadata only when some backend allocates some space from
> the undo log (in UndoLogAllocate). This helps us restore the unlogged
> meta part after a checkpoint.
> When we perform an undo action, we also update the undo action
> progress and emit an WAL record. The same operation can performed by
> the undo worker which doesn't allocate any space from the undo log.
> So, if an undo worker emits an WAL record to update undo action
> progress after a checkpoint, it'll not be able to WAL log the backup
> image of the meta unlogged part. IMHO, this breaks the recovery logic
> of unlogged part of undo meta.

I thought that was OK because those undo data updates don't depend on
the insert pointer.  But I see what you mean: the next modification of
the page that DOES depend on the insert pointer might not log the
meta-data if it's not the first WAL record to touch it after a
checkpoint.  Rats.  I'll have to think about that some more.

-- 
Thomas Munro
https://enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: nilsocket
Дата:
Сообщение:
Следующее
От: nilsocket
Дата:
Сообщение: Re: