Re: backup manifests

From: David Steele
Subject: Re: backup manifests
Msg-id: 4b6244ea-4094-9055-080b-7afb274bf270@pgmasters.net
In reply to: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: backup manifests  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi Robert,

On 9/18/19 1:48 PM, Robert Haas wrote:
> That whole approach might still be dead on
> arrival if it's possible to add new blocks with old LSNs to existing
> files, but there seems to be room to hope that there are no such
> cases.

I sure hope there are no such cases, but we should be open to the idea
just in case.

> So, let's suppose we invent a backup manifest. What should it contain?
> I imagine that it would consist of a list of files, and the lengths of
> those files, and a checksum for each file. 

These are essential.

Also consider adding the timestamp.  You have justifiable concerns about
using timestamps for deltas and I get that.  However, there are a number
of methods that can be employed to make it *much* safer.  I won't go
into them here since that is an entire thread in itself.  Suffice it to
say we can detect many anomalies in the timestamps and require a
checksum backup when we see them.  I'm really interested in scanning the
WAL for changed files, but that method is very complex and getting it
right might be harder than ensuring FS checksums are reliable.  Still
worth trying, though, since the benefits are enormous.  We are planning
to use timestamp + size + WAL data to do incrementals if we get there.
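
A rough sketch (Python, with hypothetical names; not pgBackRest's actual
logic) of the kind of anomaly detection I mean, assuming we have each
file's mtime/size from the prior backup's manifest:

    import os

    def delta_decision(path, prior_mtime, prior_size, backup_start):
        """Classify a file for an incremental using timestamp + size,
        falling back to a checksum when the timestamps look suspicious."""
        st = os.stat(path)

        # Anomalies that make the timestamp untrustworthy: an mtime in
        # the future, or an mtime that moved backwards since the prior
        # backup.
        if st.st_mtime >= backup_start or st.st_mtime < prior_mtime:
            return "checksum"        # verify this file the hard way

        if st.st_mtime > prior_mtime or st.st_size != prior_size:
            return "copy"            # clearly changed, back it up

        return "skip"                # timestamp + size say unchanged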

Consider adding a reference to each file that specifies where the file
can be found if it is not in this backup.  As I understand the
pg_basebackup proposal, it would only be implementing differential
backups, i.e. an incremental that is based *only* on the last full
backup, so the reference can be inferred in that case.  However, if each
backup is labeled and the user selects the wrong full backup on restore,
then a differential restore with references against the wrong full
backup results in a hard error rather than corruption.
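
As a sketch (illustrative names only), the restore-time check could be
as simple as:

    def check_reference(file_entry, full_backup_label):
        """Hard error if a file entry references a different full backup
        than the one selected for restore."""
        ref = file_entry.get("reference")
        if ref is not None and ref != full_backup_label:
            raise SystemExit(
                "file %s references backup %s but restore is based on %s"
                % (file_entry["name"], ref, full_backup_label))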

> I think you should have a
> choice of what kind of checksums to use, because algorithms that used
> to seem like good choices (e.g. MD5) no longer do; this trend can
> probably be expected to continue. Even if we initially support only
> one kind of checksum -- presumably SHA-something since we have code
> for that already for SCRAM -- I think that it would also be a good
> idea to allow for future changes. And maybe it's best to just allow a
> choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
> gate, so that we can avoid bikeshedding over which one is secure
> enough. I guess we'll still have to argue about the default. 

Based on my original calculations (which sadly I don't have anymore),
the combination of SHA1, size, and file name is *extremely* unlikely to
generate a collision.  As in, unlikely to happen before the end of the
universe kind of unlikely.  Though, I guess it depends on your
expectations for the lifetime of the universe.
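
I can't reproduce the original numbers, but a back-of-the-envelope
birthday bound shows the scale, even if every file in a billion-file
backup had to be distinguished by the SHA1 alone:

    # Approximate probability of any SHA-1 collision among n files.
    n = 10**9                       # a billion files
    p = n * (n - 1) / 2 / 2**160    # birthday bound for a 160-bit hash
    print(p)                        # roughly 3e-31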

These checksums don't have to be cryptographically secure, in the sense
of making it infeasible to recover the original data from the checksum.
They just need to have a suitably low collision rate.  These days I
would choose something with more bits because the computation time is
similar, though the larger size does require more storage.

> I also
> think that it should be possible to build a manifest with no
> checksums, so that one need not pay the overhead of computing
> checksums if one does not wish. 

Our benchmarks have indicated that checksums only account for about 1%
of total CPU time when gzip -6 compression is used.  Without compression
the percentage may be higher, of course, but in that case we find that
network latency is the primary bottleneck.

For S3 backups we do a SHA1 hash for our manifest, a SHA256 hash for
auth v4, and a good old-fashioned MD5 checksum for each upload part.
This is barely noticeable when compression is enabled.
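
For illustration only (Python's hashlib, with a hypothetical read
helper), all three digests can be computed in a single pass over the
data being uploaded:

    import hashlib

    def hash_upload(read_part):
        """One SHA-1 over the whole file (for the manifest) plus a
        SHA-256 (auth v4 payload hash) and MD5 (Content-MD5) per part."""
        file_sha1 = hashlib.sha1()
        parts = []
        for part in iter(read_part, b""):
            file_sha1.update(part)
            parts.append({
                "sha256": hashlib.sha256(part).hexdigest(),
                "md5": hashlib.md5(part).hexdigest(),
            })
        return file_sha1.hexdigest(), parts

where read_part might be something like
functools.partial(f.read, 5 * 1024 * 1024).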

> Of course, such a manifest is of much
> less utility for checking backup integrity, but you can still check
> that you've got the right files, which is noticeably better than
> nothing.  

Absolutely.  That said, there was a time when we made checksums
optional, but we eventually gave up on that once profiling showed how
low the cost was compared to the benefit.

> The manifest should probably also contain a checksum of its
> own contents so that the integrity of the manifest itself can be
> verified. 

This is a good idea.  Amazingly we've never seen a manifest checksum
error in the field but it's only a matter of time.
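
One simple way to do it (a sketch, not a proposal for the exact format):
checksum the serialized manifest body and store the digest after it, so
the reader can recompute and compare before parsing.

    import hashlib, json

    def write_manifest(path, manifest):
        body = json.dumps(manifest, sort_keys=True).encode()
        digest = hashlib.sha256(body).hexdigest()
        with open(path, "wb") as f:
            f.write(body + b"\n" + digest.encode() + b"\n")

    def read_manifest(path):
        with open(path, "rb") as f:
            body, digest, _ = f.read().rsplit(b"\n", 2)
        if hashlib.sha256(body).hexdigest() != digest.decode():
            raise ValueError("manifest checksum mismatch")
        return json.loads(body)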

> And maybe a few other bits of metadata, but I'm not sure
> exactly what.  Ideas?

A backup label for sure.  You can also use this as the directory/tar
name to save the user coming up with one.  We use YYYYMMDDHH24MMSSF for
full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
incrementals, and we have logic to prevent two backups from having the
same label.  Duplicates are unlikely outside of testing, but the check
is still a good idea.
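
Roughly (illustrative Python, not our actual implementation):

    import time

    def make_label(existing, prior_full=None, backup_type="F"):
        """Generate labels like 20190918134800F or
        20190918134800F_20190918135900D and refuse duplicates."""
        while True:
            stamp = time.strftime("%Y%m%d%H%M%S")
            if backup_type == "F":
                label = stamp + "F"
            else:
                label = "%s_%s%s" % (prior_full, stamp, backup_type)
            if label not in existing:
                existing.add(label)
                return label
            time.sleep(1)          # same-second start; wait and retry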

Knowing the start/stop time of the backup is useful in all kinds of
ways, especially monitoring and time-targeted PITR.  Start/stop LSN is
also good.  I know this is also in backup_label but having it all in one
place is nice.

We include the version/sysid of the cluster to avoid mixups.  It's a
great extra check on top of references to be sure everything is kosher.

A manifest version is good in case we change the format later.  I'd
recommend JSON for the format since it is ubiquitous and handles
escaping, which is a common gotcha in home-grown formats.  We currently
have a format that is a combination of Windows INI and JSON (for
human-readability, in theory) and we have become painfully aware of
escaping issues.  Really, why would you drop files with '=' in their
names into PGDATA?  And yet it happens.
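
To make the idea concrete, a manifest along these lines (all key names
and values are placeholders, not a format proposal) escapes awkward file
names for free:

    import json

    manifest = {
        "manifest-version": 1,
        "backup-label": "20190918134800F",
        "backup-type": "full",
        "start-lsn": "0/2000028", "stop-lsn": "0/2000100",
        "checksum-type": "sha256",
        "files": [
            {"name": "pg_data/PG_VERSION", "size": 3, "checksum": "..."},
            # '=' or quotes in a file name are handled by the serializer,
            # no home-grown quoting rules required.
            {"name": 'pg_data/odd="name', "size": 0, "checksum": "..."},
        ],
    }
    print(json.dumps(manifest, indent=2))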

> Once we invent the concept of a backup manifest, what do we need to do
> with them? I think we'd want three things initially:
> 
> (1) When taking a backup, have the option (perhaps enabled by default)
> to include a backup manifest.

Manifests are cheap to build, so I wouldn't make them optional.

> (2) Given an existing backup that has not got a manifest, construct one.

Might be too late to be trusted and we'd have to write extra code for
it.  I'd leave this for a project down the road, if at all.

> (3) Cross-check a manifest against a backup and complain about extra
> files, missing files, size differences, or checksum mismatches.

Verification is the best part of the manifest.  Plus, you can do
verification pretty cheaply on restore.  We also restore pg_control last
so clusters that have a restore error won't start.
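
The core of the cross-check is small; a minimal sketch (assuming the
JSON layout above and whole-file SHA-256 checksums):

    import hashlib, os

    def verify(backup_dir, manifest):
        """Report missing files, extra files, size differences, and
        checksum mismatches between a backup directory and its manifest."""
        problems = []
        expected = {f["name"]: f for f in manifest["files"]}

        for name, entry in expected.items():
            path = os.path.join(backup_dir, name)
            if not os.path.exists(path):
                problems.append("missing: " + name)
            elif os.path.getsize(path) != entry["size"]:
                problems.append("size mismatch: " + name)
            else:
                with open(path, "rb") as f:
                    if hashlib.sha256(f.read()).hexdigest() != entry["checksum"]:
                        problems.append("checksum mismatch: " + name)

        for root, _, files in os.walk(backup_dir):
            for f in files:
                rel = os.path.relpath(os.path.join(root, f), backup_dir)
                if rel not in expected and rel != "backup.manifest":
                    problems.append("extra file: " + rel)

        return problems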

> One thing I'm not quite sure about is where to store the backup
> manifest. If you take a base backup in tar format, you get base.tar,
> pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
> Does the backup manifest go into base.tar? Get written into a separate
> file outside of any tar archive? Something else? And what about a
> plain-format backup? I suppose then we should just write the manifest
> into the top level of the main data directory, but perhaps someone has
> another idea.

We do:

[backup_label]/
    backup.manifest
    pg_data/
    pg_tblspc/

In general, having the manifest easily accessible is ideal.

Regards,
-- 
-David
david@pgmasters.net


