Re: trying again to get incremental backup

From Robert Haas
Subject Re: trying again to get incremental backup
Date
Msg-id CA+TgmoaF-TnXK9D5m3m1pTj6p5u0DiyzKuPZM1pqPGKzXhQw1w@mail.gmail.com
In reply to Re: trying again to get incremental backup  (Andres Freund <andres@anarazel.de>)
Responses Re: trying again to get incremental backup  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Jun 14, 2023 at 3:47 PM Andres Freund <andres@anarazel.de> wrote:
> I assume this is "solely" required for keeping the incremental backups as
> small as possible, rather than being required for correctness?

I believe so. I want to spend some more time thinking about this to
make sure I'm not missing anything.

> Could we just recompute the WAL summary for the [redo, end of chunk] for the
> relevant summary file?

I'm not understanding how that would help. If we were going to compute
a WAL summary on the fly rather than waiting for one to show up on
disk, what we'd want is [end of last WAL summary that does exist on
disk, redo]. But I'm not sure that's a great approach, because that
LSN gap might be large, and then we'd be duplicating work, most of
which the summarizer has probably already done.
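
To make that range concrete, here is a minimal sketch in Python (not
PostgreSQL's C) of working out which span of WAL would have to be
summarized on the fly. It assumes the summaries on disk can be described
as (start, end) LSN pairs with no holes between them; parse_lsn and
unsummarized_range are hypothetical helper names for illustration and
are not part of the patch.

```python
def parse_lsn(text):
    """Convert an LSN in PostgreSQL's 'hi/lo' hexadecimal text form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def unsummarized_range(summaries, redo_lsn):
    """Return the (start, end) span of WAL not covered by on-disk summaries.

    Returns None when the summaries already reach the redo pointer.
    Assumption: 'summaries' is a list of (start_lsn, end_lsn) integer
    pairs with no holes between them, so only the largest end matters.
    """
    if not summaries:
        raise ValueError("no WAL summaries on disk to extend")
    last_end = max(end for _start, end in summaries)
    if last_end >= redo_lsn:
        return None
    return (last_end, redo_lsn)
```

The wider the returned span, the more WAL would have to be re-read by
the backup itself, which is the duplicated work in question.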

> FWIW, I like the idea of a special WAL record at that point, independent of
> this feature. It wouldn't be a meaningful overhead compared to the cost of a
> checkpoint, and it seems like it'd be quite useful for debugging. But I can
> see uses going beyond that - we occasionally have been discussing associating
> additional data with redo points, and that'd be a lot easier to deal with
> during recovery with such a record.
>
> I don't really see a performance and concurrency angle right now - what are
> you wondering about?

I'm not really sure. I expect Dilip would be happy to post his patch,
and if you'd be willing to have a look at it and express your concerns
or lack thereof, that would be super valuable.

> > Another thing that I'm not too sure about is: what happens if we find
> > a relation file on disk that doesn't appear in the backup_manifest for
> > the previous backup and isn't mentioned in the WAL summaries either?
>
> Wouldn't that commonly happen for unlogged relations at least?
>
> I suspect there's also other ways to end up with such additional files,
> e.g. by crashing during the creation of a new relation.

Yeah, this needs some more careful thought.

> > A few less-serious problems with the patch:
> >
> > - We don't have an incremental JSON parser, so if you have a
> > backup_manifest>1GB, pg_basebackup --incremental is going to fail.
> > That's also true of the existing code in pg_verifybackup, and for the
> > same reason. I talked to Andrew Dunstan at one point about adapting
> > our JSON parser to support incremental parsing, and he had a patch for
> > that, but I think he found some problems with it and I'm not sure what
> > the current status is.
>
> As a stopgap measure, can't we just use the relevant flag to allow larger
> allocations?

I'm not sure that's a good idea, but theoretically, yes. We can also
just choose to accept the limitation that your data directory can't be
too darn big if you want to use this feature. But getting incremental
JSON parsing would be better.
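
For readers unfamiliar with the idea, incremental parsing can be
sketched roughly as follows. This is a Python illustration only, not
PostgreSQL's C parser and not Andrew Dunstan's patch. It assumes a
simplified manifest in which the entries of interest are objects inside
a top-level "Files" array, and that the text "Files" does not occur
earlier in the document; iter_manifest_files is a hypothetical name.

```python
import io
import json


def iter_manifest_files(stream, chunk_size=65536):
    """Yield each object in the "Files" array without loading the whole manifest."""
    decoder = json.JSONDecoder()
    buf = ""
    marker = '"Files"'
    # Phase 1: read until the opening bracket of the "Files" array is buffered.
    while True:
        pos = buf.find(marker)
        if pos != -1:
            bracket = buf.find("[", pos + len(marker))
            if bracket != -1:
                buf = buf[bracket + 1:]
                break
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # no "Files" array found
        buf += chunk
    # Phase 2: decode one array element at a time, refilling as needed.
    while True:
        buf = buf.lstrip(" \t\r\n,")
        if buf.startswith("]"):
            return  # end of the array
        try:
            obj, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            # Most likely an incomplete element: read more and retry.
            chunk = stream.read(chunk_size)
            if not chunk:
                raise  # input ended in the middle of an element
            buf += chunk
            continue
        yield obj
        buf = buf[end:]
```

Memory use is then bounded by the chunk size plus the largest single
entry rather than by the size of the whole manifest.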

Not having the manifest in JSON would be an even better solution, but
regrettably I did not win that argument.

> That seems like a feature for the future...

Sure.

> I don't know the tar format well, but my understanding is that it doesn't have
> a "central metadata" portion. I.e. doing something like this would entail
> scanning the tar file sequentially, skipping file contents?  And wouldn't you
> have to create an entirely new tar file for the modified output? That kind of
> makes it not so incremental ;)
>
> IOW, I'm not sure it's worth bothering about this ever, and certainly doesn't
> seem worth bothering about now. But I might just be missing something.

Oh, yeah, it's just an idle thought. I'll get to it when I get to it,
or else I won't.
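
For reference, the sequential scan Andres describes looks roughly like
this. It is a minimal Python sketch that assumes plain ustar headers
(no pax or GNU extension records, no base-256 size encoding) and a
seekable stream; list_tar_members is a hypothetical name.

```python
import io

BLOCK = 512  # tar headers and file data are padded to 512-byte blocks


def list_tar_members(stream):
    """Walk a tar archive header by header, seeking past file contents."""
    members = []
    while True:
        header = stream.read(BLOCK)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            break  # end-of-archive marker, or a truncated file
        # Bytes 0..99 hold the NUL-terminated member name.
        name = header[0:100].split(b"\0", 1)[0].decode("utf-8")
        # Bytes 124..135 hold the size as an octal string.
        size = int(header[124:136].rstrip(b"\0 ") or b"0", 8)
        members.append((name, size))
        # Skip the contents, rounded up to a whole number of blocks.
        stream.seek((size + BLOCK - 1) // BLOCK * BLOCK, io.SEEK_CUR)
    return members
```

Because there is no central index, every header has to be visited in
order even when only one member is of interest.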

--
Robert Haas
EDB: http://www.enterprisedb.com


