Re: New WAL record to detect the checkpoint redo location

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: New WAL record to detect the checkpoint redo location
Дата	18 сентября 2023 г. 18:57:54
Msg-id	CA+TgmoYy-Vc6G9QKcAKNksCa29cv__czr+N9X_QCxEfQVpp_8w@mail.gmail.com обсуждение исходный текст
Ответ на	Re: New WAL record to detect the checkpoint redo location (Andres Freund <andres@anarazel.de>)
Ответы	Re: New WAL record to detect the checkpoint redo location
Список	pgsql-hackers

Дерево обсуждения

On Fri, Jul 14, 2023 at 11:16 AM Andres Freund <andres@anarazel.de> wrote:
> I suspect we might be able to get rid of the need for exclusive inserts
> here. If we rid of that, we could determine the redoa location based on the
> LSN determined by the XLogInsert().

I've been brainstorming about this today, trying to figure out some
ideas to make it work.

As Michael Paquier correctly noted downthread, we need to make sure
that a backend inserting a WAL record knows whether it needs to
contain an FPI. The comments in the do...while loop in XLogInsert are
pretty helpful here: doPageWrites can't change once XLogInsertRecord
acquires a WAL insertion lock. For that to be true, the redo pointer
can only move when holding all WAL insertion locks. That means that if
we add an XLOG_CHECKPOINT_REDO to mark the location of the redo
pointer, we've got to either (a) insert the record *and* update our
notion of the last redo pointer while holding all the WAL insertion
locks or (b) change the concurrency model in some way.

Let's explore (b) first. Perhaps my imagination is too limited here,
but I'm having trouble seeing a good way of doing this. One idea that
occurred to me was to make either the insertion of the
XLOG_CHECKPOINT_REDO record fail softly if somebody inserts a record
after it that omits FPIs, but that doesn't really work because then
we're left with a hole in the WAL. It's too late to move the later
record earlier. We could convert the intended XLOG_CHECKPOINT_REDO
record into a dummy record but that seems complex and wasteful.
Similarly, you could try to make the insertion of the later record
fail, but that basically has the same problem: there could be an even
later record being inserted after that which it's already too late to
reposition. Basically, it feels like once we get to the point where we
have a range of LSNs and we're copying data into wal_buffers, it's
awfully late to be trying to back out. Other people can already be
depending on us to put the amount of WAL that we promised to insert at
the place where we promised to put it.

The only other approach to (b) that I can think of is to force FPIs on
for all backends from just before to just after we insert the
XLOG_CHECKPOINT_REDO record. However, since we currently require
taking all the WAL insertion locks to start requiring full page
writes, this doesn't seem like it gains much. In theory perhaps we
could have an approach where we flip full page writes to sorta-on,
then wait until we've seen each WAL insertion lock unheld at least
once, and then at that point we know all new WAL insertions will see
them and can deem them fully on. However, when I start to think along
these lines, I feel like maybe I'm losing the plot. Checkpoints are
rare enough that the overhead of taking all the WAL insertion locks at
the same time isn't really a big problem, or at least I don't think it
is. I think the concern here is more about avoiding useless branches
in hot paths that potentially cost something for every single record
whether it has anything to do with this mechanism or not.

OK, so let's suppose we abandon the idea of changing the concurrency
model in any fundamental way and just try to figure out how to both
insert the record and update our notion of the last redo pointer while
holding all the WAL insertion locks i.e. (a) from the two options
above. Dilip's patch approaches this problem by pulling acquisition of
the WAL insertion locks up to the place where we're already setting
the redo pointer. I wonder if we could also consider the other
possible approach of pushing the update to Insert->RedoRecPtr down
into XLogInsertRecord(), which already has a special case for
acquiring all locks when the record being inserted is an XLOG_SWITCH
record. That would have the advantage of holding all of the insertion
locks for a smaller period of time than what Dilip's patch does -- in
particular, it wouldn't need to hold the lock across the
XLOG_CHECKPOINT_REDO's XLogRecordAssemble -- or across the rather
lengthy tail of XLogInsertRecord. But the obvious objection is that it
would put more branches into XLogInsertRecord which nobody wants.

But ... what if it didn't? Suppose we pulled the XLOG_SWITCH case out
of XLogInsertRecord and made a separate function for that case. It
looks to me like that would avoid 5 branches in that function in the
normal case where we're not inserting XLOG_SWITCH. We would want to
move some logic, particularly the WAL_DEBUG stuff and maybe other
things, into reusable subroutines. Then, we could decide to treat
XLOG_CHECKPOINT_REDO either in the normal path -- adding a couple of
those branches back again -- or in the XLOG_SWITCH function and either
way I think the normal path would have fewer branches than it does
today. One idea that I had was to create a new rmgr for "weird
records," initially XLOG_SWITCH and XLOG_CHECKPOINT_REDO. Then the
test as to whether to use the "normal" version of XLogInsertRecord or
the "weird" version could just be based on the rmid, and the "normal"
function wouldn't need to concern itself with anything specific to the
"weird" cases.

A variant on this idea would be to just accept a few extra branches
and hope it's not really that big of a deal. For instance, instead of
this:

    bool        isLogSwitch = (rechdr->xl_rmid == RM_XLOG_ID &&
                               info == XLOG_SWITCH);

We could have this:

bool isAllLocks = (rechdr->xl_rmid == RM_BIZARRE_ID);
bool isLogSwitch = (isAllLocks && info == XLOG_SWITCH);

...and then conditionalize on either isAllLocks or isLogSwitch as
apppropriate. You'd still need an extra branch someplace to update
Insert->RedoRecPtr when isAllLocks && info == XLOG_CHECKPOINT_REDO,
but maybe that's not so bad?

> Alternatively, I think we should split XLogInsertRecord() so that the part
> with the insertion locks held is a separate function, that we could use here.

The difficulty that I see with this is that the function does a lot
more stuff after calling WALInsertLockRelease(). So just pushing the
part that's between acquiring and releasing WAL insertion locks into a
separate function wouldn't actually avoid a lot of code duplication,
if the goal was to do everything else that XLogInsertRecord() does
except for the lock manipulation. To get there, I think we'd need to
move all of the stuff after the lock release into one or more static
functions, too. Which is possibly an OK approach. I haven't checked
how much additional parameter passing we'd end up doing if we went
that way.

--
Robert Haas
EDB: http://www.enterprisedb.com

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: New WAL record to detect the checkpoint redo location