Re: trying again to get incremental backup

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: trying again to get incremental backup
Дата
Msg-id CA+TgmoboTXthvQepDSiOUY=RAfCyOGp3f1HX4qg0WNrG4vQkWQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: trying again to get incremental backup  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
On Tue, Nov 14, 2023 at 8:12 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> 0001 looks OK to push, and since it stands on its own I would get it out
> of the way soon rather than waiting for the rest of the series to be
> further reviewed.

All right, done.

> 0003:
> AmWalSummarizerProcess() is unused.  Remove?

The intent seems to be to have one of these per enum value, whether it
gets used or not. Some of the others aren't used, either.

> MaybeWalSummarizer() is called on each ServerLoop() in postmaster.c?
> This causes a function call to be emitted every time through.  That
> looks odd.  All other process starts have some triggering condition.

I'm not sure how much this matters, really. I would expect that the
function call overhead here wouldn't be very noticeable. Generally I
think that when ServerLoop returns from WaitEventSetWait it's going to
be because we need to fork a process. That's pretty expensive compared
to a function call. If we can iterate through this loop lots of times
without doing any real work then it might matter, but I feel like
that's probably not the case, and probably something we would want to
fix if it were the case.

Now, I could nevertheless move some of the triggering conditions in
MaybeStartWalSummarizer(), but moving, say, just the summarize_wal
condition wouldn't be enough to avoid having MaybeStartWalSummarizer()
called repeatedly when there was no work to do, because summarize_wal
could be true and the summarizer could all be running. Similarly, if I
move just the WalSummarizerPID == 0 condition, the function gets
called repeatedly without doing anything when summarize_wal = off. So
at a minimum you have to move both of those if you care about avoiding
the function call overhead, and then you have to wonder if you care
about the corner cases where the function would be called repeatedly
for no gain even then.

Another approach would be to make the function static inline rather
than just static. Or we could delete the whole function and just
duplicate the logic it contains at both call sites. Personally I'm
inclined to just leave it how it is in the absence of some evidence
that there's a real problem here. It's nice to have all the triggering
conditions in one place with nothing duplicated.

> GetOldestUnsummarizedLSN uses while(true), but WaitForWalSummarization
> and SummarizeWAL use while(1). Maybe settle on one style?

OK.

> 0004:
> in PrepareForIncrementalBackup(), the logic that determines
> earliest_wal_range_tli and latest_wal_range_tli looks pretty weird.  I
> think it works fine if there's a single timeline, but not otherwise.
> Or maybe the trick is that it relies on timelines returned by
> readTimeLineHistory being sorted backwards?  If so, maybe add a comment
> about that somewhere; I don't think other callers of readTimeLineHistory
> make that assumption.

It does indeed rely on that assumption, and the comment at the top of
the for (i = 0; i < num_wal_ranges; ++i) loop explains that. Note also
the comment just below that begins "If we found this TLI in the
server's history". I agree with you that this logic looks strange, and
it's possible that there's some better way to do encode the idea than
what I've done here, but I think it might be just that the particular
calculation we're trying to do here is strange. It's almost easier to
understand the logic if you start by reading the sanity checks
("manifest requires WAL from initial timeline %u starting at %X/%X,
but that timeline begins at %X/%X" et. al.), look at the triggering
conditions for those, and then work upward to see how
earliest/latest_wal_range_tli get set, and then look up from there to
see how saw_earliest/latest_wal_range_tli are used in computing those
values.

We do rely on the ordering assumption elsewhere. For example, in
XLogFileReadAnyTLI, see if (tli < curFileTLI) break. We also use it to
set expectedTLEs, which is documented to have this property. And
AddWALInfoToBackupManifest relies on it too; see the comment "Because
the timeline history file lists newer timelines before older ones" in
AddWALInfoToBackupManifest. We're not entirely consistent about this,
e.g., unlike XLogFileReadAnyTLI, tliInHistory() and
tliOfPointInHistory() don't have an early exit provision, but we do
use it some places.

--
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: pgsql: doc: fix wording describing the checkpoint_flush_after GUC
Следующее
От: Robert Haas
Дата:
Сообщение: Re: trying again to get incremental backup