Re: block-level incremental backup

From: Robert Haas
Subject: Re: block-level incremental backup
Date:
Msg-id: CA+TgmoYg9i8TZhyjf8MqCyU8unUVuW+03FeBF1LGDu_-eOONag@mail.gmail.com
In reply to: Re: block-level incremental backup  (Stephen Frost <sfrost@snowman.net>)
Responses: Re: block-level incremental backup  (Stephen Frost <sfrost@snowman.net>)
List: pgsql-hackers
On Mon, Sep 16, 2019 at 10:38 AM Stephen Frost <sfrost@snowman.net> wrote:
> In a number of cases, trying to make sure that on a failover or copy of
> the backup the next 'incremental' is really an 'incremental' is
> dangerous.  A better strategy to address this, and the other issues
> realized on this thread recently, is to:
>
> - Have a manifest of every file in each backup
> - Always back up new files that weren't in the prior backup
> - Keep a checksum of each file
> - Track the timestamp of each file as of when it was backed up
> - Track the file size of each file
> - Track the starting timestamp of each backup
> - Always include files with a modification time after the starting
>   timestamp of the prior backup, or if the file size has changed
> - In the event of any anomalies (which includes things like a timeline
>   switch), use checksum matching (aka 'delta checksum backup') to
>   perform the backup instead of using timestamps (or just always do that
>   if you want to be particularly careful- having an option for it is
>   great)
> - Probably other things I'm not thinking of off-hand, but this is at
>   least a good start.  Make sure to checksum this information too.

I agree with some of these ideas but not all of them.  I think having
a backup manifest is a good idea; it would allow the process of taking
a new incremental backup to work from the manifest rather than from the
data directory, which could be extremely useful, because it might be a lot
faster and the manifest could also be copied to a machine other than
the one where the entire backup is stored. If the backup itself has
been pushed off to S3 or whatever, you can't access it quickly, but
you could keep the manifest around.
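
To make the manifest idea a bit more concrete, here is a rough sketch
in Python of the per-file information such a manifest might record.
This is purely illustrative: the field names, the JSON layout, and the
choice of SHA-256 are assumptions of the example, not anything proposed
on this thread.

    # Illustrative sketch: build a minimal backup manifest for a data directory.
    import hashlib
    import json
    import os
    import time

    def build_manifest(datadir):
        files = []
        for root, _, names in os.walk(datadir):
            for name in names:
                path = os.path.join(root, name)
                st = os.stat(path)
                with open(path, 'rb') as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                files.append({
                    'path': os.path.relpath(path, datadir),
                    'size': st.st_size,
                    'mtime': st.st_mtime,
                    'sha256': digest,
                })
        manifest = {'start_time': time.time(), 'files': files}
        # Checksum the manifest itself so corruption of the manifest
        # can be detected later, per the suggestion above.
        blob = json.dumps(manifest, sort_keys=True).encode()
        return {'manifest': manifest,
                'manifest_sha256': hashlib.sha256(blob).hexdigest()}

A file like this is tiny compared to the backup itself, which is what
makes it practical to keep close at hand even after the backup has been
pushed off to S3.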

I also agree that backing up all files that weren't in the previous
backup is a good strategy.  I proposed that fairly explicitly a few
emails back; besides, the contrary would obviously be nonsense. I also
agree that we should record each file's size, which I proposed as well.
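
Spelled out as another rough sketch, reusing the hypothetical manifest
layout from the previous example, finding the files that were absent
from the prior backup, and noticing size changes along the way, is
cheap once both manifests exist:

    # Illustrative sketch: files absent from the prior manifest must be sent
    # in full; a recorded size that no longer matches is also worth flagging.
    def new_and_resized_files(current_manifest, prior_manifest):
        prior = {e['path']: e for e in prior_manifest['files']}
        new_files, resized = [], []
        for entry in current_manifest['files']:
            old = prior.get(entry['path'])
            if old is None:
                new_files.append(entry['path'])
            elif old['size'] != entry['size']:
                resized.append(entry['path'])
        return new_files, resized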

I don't really agree with your comments about checksums and
timestamps.  I think that, if possible, there should be ONE method of
determining whether a block has changed in some important way, and I
think if we can make LSN work, that would be for the best. If you use
multiple methods of detecting changes without any clearly defined
reason for doing so, then maybe what you're really saying is that you
don't believe any of the methods is reliable, but that if we throw the
kitchen sink at the problem it should come out OK. Any bugs in one
mechanism are likely to be masked by one of the others, but that's not
as good as one method that is known to be altogether reliable.
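
To show what LSN-based detection amounts to: every page stores, in the
first eight bytes of its header, the LSN of the last WAL record that
touched it, so a block needs to be included only if that LSN is newer
than the point the incremental backup is being taken relative to. A
minimal sketch, assuming the default 8 kB block size and a
little-endian machine (the function names are mine, not from any
patch):

    # Illustrative sketch: select blocks whose page LSN is newer than the
    # LSN the incremental backup is being taken relative to.
    import struct

    BLCKSZ = 8192  # default PostgreSQL block size

    def page_lsn(page):
        # pd_lsn is the first 8 bytes of the page header: xlogid, xrecoff.
        xlogid, xrecoff = struct.unpack_from('<II', page, 0)
        return (xlogid << 32) | xrecoff

    def changed_blocks(relfile_path, threshold_lsn):
        with open(relfile_path, 'rb') as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                if page_lsn(page) > threshold_lsn:
                    yield blkno, page
                blkno += 1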

> By having a manifest for each backed up file for each backup, you also
> gain the ability to validate that a backup in the repository hasn't been
> corrupted post-backup, a feature that at least some other database
> backup and restore systems have (referring specifically to the big O in
> this particular case, but I bet others do too).

Agreed. The manifest only lets you validate to a limited extent, but
that's still useful.
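
For what it's worth, that validation step is also easy to sketch,
again using the hypothetical manifest layout from earlier: re-read
every file in the stored backup and compare it against the recorded
size and checksum.

    # Illustrative sketch: verify a stored backup against its manifest.
    import hashlib
    import os

    def verify_backup(backup_dir, manifest):
        problems = []
        for entry in manifest['files']:
            path = os.path.join(backup_dir, entry['path'])
            try:
                with open(path, 'rb') as f:
                    data = f.read()
            except FileNotFoundError:
                problems.append((entry['path'], 'missing'))
                continue
            if len(data) != entry['size']:
                problems.append((entry['path'], 'size mismatch'))
            elif hashlib.sha256(data).hexdigest() != entry['sha256']:
                problems.append((entry['path'], 'checksum mismatch'))
        return problems

Of course this only shows that the backup still matches what was
written at backup time, which is presumably the limited-but-useful
sort of validation meant here.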

> Having a system of keeping track of which backups are full and which are
> differential in an overall system also gives you the ability to do
> things like expiration in a sensible way, including handling WAL
> expiration.

True, but I'm not sure that functionality belongs in core. It
certainly needs to be possible for out-of-core code to do this part of
the work if desired, because people want to integrate with enterprise
backup systems, and we can't come in and say, well, you back up
everything else using Netbackup or Tivoli, but for PostgreSQL you have
to use pg_backrest. I mean, maybe you can win that argument, but I
know I can't.

> I'd like to clarify that while I would like to have an easier way to
> parallelize backups, that's a relatively minor complaint- the much
> bigger issue that I have with this feature is that trying to address
> everything correctly while having only the amount of information that
> could be passed on the command-line about the prior full/incremental is
> going to be extremely difficult, complicated, and likely to lead to
> subtle bugs in the actual code, and probably less than subtle bugs in
> how users end up using it, since they'll have to implement the
> expiration and tracking of information between backups themselves
> (unless something's changed in that part during this discussion- I admit
> that I've not read every email in this thread).

Well, the evidence seems to show that you are right, at least to some
extent. I consider it a positive good if the client needs to give the
server only a limited amount of information. After all, you could
always take an incremental backup by shipping every byte of the
previous backup to the server, having it compare everything to the
current contents, and having it then send you back the stuff that is
new or different. But that would be dumb, because most of the point of
an incremental backup is to save on sending lots of data over the
network unnecessarily. Now, it seems that I took that goal to an
unhealthy extreme, because as we've now realized, sending only an LSN
and nothing else isn't enough to get a correct backup. So we need to
send more, and it doesn't have to be the absolutely most
stripped-down, bare-bones version of what could be sent. But it should
be fairly minimal, I think; that's kinda the point of the feature.
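
Purely to illustrate that tradeoff, and not as a concrete protocol
proposal (none of these field names exist anywhere), the difference is
roughly between a request that carries only a threshold LSN and one
that also carries enough of the prior manifest for the server to
reason about whole files:

    # Illustrative sketch: two hypothetical shapes for an incremental
    # backup request, from most minimal to somewhat richer.
    minimal_request = {
        'since_lsn': '16/B374D848',   # "send whatever changed after this LSN"
    }

    richer_request = {
        'since_lsn': '16/B374D848',
        # Enough of the prior manifest for the server to notice new,
        # removed, or truncated files, not just modified blocks.
        'prior_files': [
            {'path': 'base/16384/16385', 'size': 73728},
        ],
    }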

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


