Re: Verified fix for Bug 4137

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Verified fix for Bug 4137
Дата
Msg-id 4820C4C7.3020306@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Verified fix for Bug 4137  (Simon Riggs <simon@2ndquadrant.com>)
Ответы Re: Verified fix for Bug 4137  (Simon Riggs <simon@2ndquadrant.com>)
Список pgsql-patches
Simon Riggs wrote:
> On Tue, 2008-05-06 at 17:52 +0100, Heikki Linnakangas wrote:
>> I didn't suggest that alphabetical sorting property is a new assumption;
>> it sure isn't. The new assumption is that you never call ReadRecord()
>> for record 0002 before you call it for record 0001 (before initializing
>> the checkPointCopy field from the checkpoint record, anyway).
>
> I've not done anything with the ordering of calls to ReadRecord(). There
> is no such assumption introduced here.

Hmm, I see that I was inaccurate when I said the patch introduces that
assumption, sorry about the confusion. It's more like the code is
currently completely oblivious of the issue, and the patch makes it
better but doesn't fully fix it.

The code in CVS is broken, as we now know, because it assumes that we
can delete all files < checkPointCopy.redo, which is not true when
checkPointCopy.redo has not yet been initialized from the backup label
file, and points to a location that's ahead of the real safe truncation
point. The patch makes it better, and the assumption is now that it's
safe to truncate everything < min(checkPointCopy.redo, xlog file we're
reading). That still seems too fragile to me, because we assume that
after you've asked for xlog record X, we never need anything earlier
then that.

In fact, what will happen if the checkpoint record's redo pointer points
to an earlier xlog file:

1. The location of the checkpoint record is read by read_backup_label().
Let's say that it's 0005.
2. ReadCheckpointRecord() is called for 0005. The restore command is
called because that xlog file is not present. The safe truncation point
is determined to be 0005, as that's what we're reading. Everything
before that is truncated
3. The redo pointer in the checkpoint record points to 0003. That's
where we should start the recovery. Oops :-(

I haven't tested this, so, am I missing something that makes that not
happen?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

В списке pgsql-patches по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: Verified fix for Bug 4137
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Snapshot management, final