Re: block-level incremental backup

From: Stephen Frost
Subject: Re: block-level incremental backup
Msg-id: 20190418223946.GK6197@tamriel.snowman.net
In reply to: Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Greetings,

Ok, responding to the rest of this email.

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Apr 17, 2019 at 6:43 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Sadly, I haven't got any great ideas today.  I do know that the WAL-G
> > folks have specifically mentioned issues with the visibility map being
> > large enough across enough of their systems that it kinda sucks to deal
> > with.  Perhaps we could do something like the rsync binary-diff protocol
> > for non-relation files?  This is clearly just hand-waving but maybe
> > there's something reasonable in that idea.
>
> I guess it all comes down to how complicated you're willing to make
> the client-server protocol.  With the very simple protocol that I
> proposed -- client provides a threshold LSN and server sends blocks
> modified since then -- the client need not have access to the old
> incremental backup to take a new one.

Where is the client going to get the threshold LSN from?
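(For what it's worth, the obvious candidate would be the prior backup itself: its backup_label records the start LSN, which is a safe threshold for the next incremental.  A quick Python sketch, just to make that concrete; the label text here is a constructed example:)

```python
import re

def threshold_lsn_from_backup_label(label_text):
    """Pull the START WAL LOCATION LSN out of a backup_label file.

    Blocks changed since the prior backup *began* are exactly what
    the next incremental needs, so that start LSN is a safe threshold.
    """
    m = re.search(r'^START WAL LOCATION: ([0-9A-Fa-f]+/[0-9A-Fa-f]+)',
                  label_text, re.MULTILINE)
    if m is None:
        raise ValueError("no START WAL LOCATION line in backup_label")
    return m.group(1)

label = ("START WAL LOCATION: 0/2000028 "
         "(file 000000010000000000000002)\n"
         "CHECKPOINT LOCATION: 0/2000060\n")
print(threshold_lsn_from_backup_label(label))  # 0/2000028
```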

> Of course, if it happens to
> have access to the old backup then it can delta-compress however it
> likes after-the-fact, but that doesn't help with the amount of network
> transfer.

If it doesn't have access to the old backup, then I'm a bit confused as
to how an incremental backup would be possible?  Isn't that a
requirement here?

> That problem could be solved by doing something like what
> you're talking about (with some probably-negligible false match rate)
> but I have no intention of trying to implement anything that
> complicated, and I don't really think it's necessary, at least not for
> a first version.  What I proposed would already allow, for most users,
> a large reduction in transfer and storage costs; what you are talking
> about here would help more, but also be a lot more work and impose
> some additional requirements on the system.  I don't object to you
> implementing the more complex system, but I'll pass.

I was talking about the rsync binary-diff specifically for the files
that aren't easy to deal with in the WAL stream.  I wouldn't think we'd
use it for other files, and there is definitely a question of whether
there's a way to do better than a binary-diff approach for those files.
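(To ground what a "binary-diff protocol" would mean for something like
the visibility map: rsync's core trick is a cheap weak checksum per
block, confirmed by a strong hash on match.  A toy Python sketch of just
the weak-checksum comparison, ignoring rolling/unaligned matches:)

```python
def weak_checksum(block):
    """rsync-style weak checksum: two 16-bit sums (Adler-32-like)."""
    a = sum(block) % 65536
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % 65536
    return (b << 16) | a

def changed_offsets(old, new, blocksize=4096):
    """Compare fixed-size blocks by weak checksum and return offsets
    that differ.  A real implementation would confirm weak matches
    with a strong hash and support rolling (unaligned) matches."""
    offsets = []
    for off in range(0, max(len(old), len(new)), blocksize):
        if weak_checksum(old[off:off + blocksize]) != \
           weak_checksum(new[off:off + blocksize]):
            offsets.append(off)
    return offsets

old = bytes(8192)
new = bytearray(old)
new[4100] = 1                      # dirty one byte in the second block
print(changed_offsets(old, bytes(new)))  # [4096]
```

Only the changed offsets (and their contents) would then need to cross
the wire, which is the whole appeal for large, frequently-rewritten
non-relation files.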

> > There's something like 6 different backup tools, at least, for
> > PostgreSQL that provide backup management, so I have a really hard time
> > agreeing with this idea that users don't want a PG backup management
> > system.  Maybe that's not what you're suggesting here, but that's what
> > came across to me.
>
> Let me be a little more clear.  Different users want different things.
> Some people want a canned PostgreSQL backup solution, while other
> people just want access to a reasonable set of facilities from which
> they can construct their own solution.  I believe that the proposal I
> am making here could be used either by backup tool authors to enhance
> their offerings, or by individuals who want to build up their own
> solution using facilities provided by core.

The last thing that I think users really want is to build up their own
solution.  There may be some organizations who would like to provide
their own tool, but that's a bit different.  Personally, I'd *really*
like PG to have a good tool in this area and I've been working, as I've
said before, to try to get to a point where we at least have the option
to add in such a tool that meets our various requirements.

Further, I'm concerned that the approach being presented here won't be
interesting to most of the external tools because it's limited and can't
be used in a parallel fashion.

> > Unless maybe I'm misunderstanding and what you're suggesting here is
> > that the "existing solution" is something like the external PG-specific
> > backup tools?  But then the rest doesn't seem to make sense, as only
> > maybe one or two of those tools use pg_basebackup internally.
>
> Well, what I'm really talking about is in two pieces: providing some
> new facilities via the replication protocol, and making pg_basebackup
> able to use those facilities.  Nothing would stop other tools from
> using those facilities directly if they wish.

If those facilities are developed and implemented in the same way as the
protocol used by pg_basebackup works, then I strongly suspect that the
existing backup tools will treat it similarly- which is to say, they'll
largely end up ignoring it.

> > ... but this is exactly the situation we're in already with all of the
> > *other* features around backup (parallel backup, backup management, WAL
> > management, etc).  Users want those features, pg_basebackup/PG core
> > doesn't provide it, and therefore there's a bunch of other tools which
> > have been written that do.  In addition, saying that PG has incremental
> > backup but no built-in management of those full-vs-incremental backups
> > and telling users that they basically have to build that themselves
> > really feels a lot like we're trying to address a check-box requirement
> > rather than making something that our users are going to be happy with.
>
> I disagree.  Yes, parallel backup, like incremental backup, needs to
> go in core.  And pg_basebackup should be able to do a parallel backup.
> I will fight tooth, nail, and claw any suggestion that the server
> should know how to do a parallel backup but pg_basebackup should not
> have an option to exploit that capability.  And similarly for
> incremental.

These aren't independent things though, the way it seems like you're
portraying them.  There are ways we can implement incremental backup
that would support it being parallelized, and ways we can implement it
that wouldn't work with parallelism at all.  All I'm arguing for is
that we add this feature in a way that it can be parallelized (since
that's what most of the external tools do today...), even though
pg_basebackup can't be, but in a way that pg_basebackup can also use it
(albeit in a serial fashion).

> > I don't think that I was very clear in what my specific concern here
> > was.  I'm not asking for pg_basebackup to have parallel backup (at
> > least, not in this part of the discussion), I'm asking for the
> > incremental block-based protocol that's going to be built-in to core to
> > be able to be used in a parallel fashion.
> >
> > The existing protocol that pg_basebackup uses is basically, connect to
> > the server and then say "please give me a tarball of the data directory"
> > and that is then streamed on that connection, making that protocol
> > impossible to use for parallel backup.  That's fine as far as it goes
> > because only pg_basebackup actually uses that protocol (note that nearly
> > all of the other tools for doing backups of PostgreSQL don't...).  If
> > we're expecting the external tools to use the block-level incremental
> > protocol then that protocol really needs to have a way to be
> > parallelized, otherwise we're just going to end up with all of the
> > individual tools doing their own thing for block-level incremental
> > (though perhaps they'd reimplement whatever is done in core but in a way
> > that they could parallelize it...), if possible (which I add just in
> > case there's some idea that we end up in a situation where the
> > block-level incremental backup has to coordinate with the backend in
> > some fashion to work...  which would mean that *everyone* has to use the
> > protocol even if it isn't parallel and that would be really bad, imv).
>
> The obvious way of extending this system to parallel backup is to have
> N connections each streaming a separate tarfile such that when you
> combine them all you recreate the original data directory.  That would
> be perfectly compatible with what I'm proposing for incremental
> backup.  Maybe you have another idea in mind, but I don't know what it
> is exactly.

So, while that's an obvious approach, it isn't the most sensible- and
we know that from experience in actually implementing parallel backup of
PG files.  I'm happy to discuss the approach we use in pgBackRest if
you'd like to discuss this further, but it seems a bit far afield from
the topic of discussion here and it seems like you're not interested or
offering to work on supporting parallel backup in core.

I'm not saying that what you're proposing here wouldn't, technically,
work for the various external tools.  What I'm saying is that they
aren't going to actually use it, which means that you're really
implementing it *only* for pg_basebackup's benefit... and only for as
long as pg_basebackup is serial in nature.

> > > Wait, you want to make it maximally easy for users to start the server
> > > in a state that is 100% certain to result in a corrupted and unusable
> > > database?  Why?? I'd l like to make that a tiny bit difficult.  If
> > > they really want a corrupted database, they can remove the file.
> >
> > No, I don't want it to be easy for users to start the server in a state
> > that's going to result in a corrupted cluster.  That's basically the
> > complete opposite of what I was going for- having a file that can be
> > trivially removed to start up the cluster is *going* to result in people
> > having corrupted clusters, no matter how much we tell them "don't do
> > that".  This is exactly the problem with have with backup_label today.
> > I'd really rather not double-down on that.
>
> Well, OK, but short of scanning the entire directory tree on startup,
> I don't see how to achieve that.

Ok, so, this is a bit of spit-balling, just to be clear, but we
currently track things like "where we know the heap files are
consistent" by storing a checkpoint LSN in the control file, and
then we have a backup_label file to say where we need to get to in order
to be consistent from a backup.  Perhaps there's a way to use those to
cross-validate while we are updating a data directory to be consistent?
Maybe we update those files as we go, and add a cross-check flag between
them, so that we know from two places that we're restoring from a backup
(incremental or full), and then also know where we need to start from
and where we need to get to, in order to be consistent.

Of course, users can still get past this by hacking these files around
and maybe we can provide a tool along the lines of pg_resetwal which
lets them force the files to agree, but then we can at least throw big
glaring warnings and tell users "this is really bad, type YES to
continue".
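(Just to illustrate the spit-balling: the check on startup would be
something like the following Python sketch.  The "restoring_backup" flag
and "backup_id" fields are invented for illustration- nothing like them
exists in pg_control or backup_label today:)

```python
def restore_state_consistent(control, label):
    """Cross-check hypothetical restore-in-progress markers.

    'control' stands in for pg_control and 'label' for backup_label;
    both would carry the same marker, so a missing or mismatched side
    means someone hand-edited or deleted one of the files.
    """
    if control.get("restoring_backup") != label.get("restoring_backup"):
        return False              # one side lost its restore marker
    if control.get("backup_id") != label.get("backup_id"):
        return False              # the files come from different backups
    return True

good = {"restoring_backup": True, "backup_id": "20190418-1"}
print(restore_state_consistent(good, dict(good)))                  # True
print(restore_state_consistent(good, {"restoring_backup": True,
                                      "backup_id": "other"}))      # False
```

On a mismatch the server would refuse to start, and only the
pg_resetwal-style tool mentioned above would force the two back into
agreement.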

> > There's really two things here- the first is that I agree with the
> > concern about potentially destroying the existing backup if the
> > pg_basebackup doesn't complete, but there's some ways to address that
> > (such as filesystem snapshotting), so I'm not sure that the idea is
> > quite that bad, but it would need to be more than just what
> > pg_basebackup does in this case in order to be trustworthy (at least,
> > for most).
>
> Well, I did mention in my original email that there could be a
> combine-backups-destructively option.  I guess this is just taking
> that to the next level: merge a backup being taken into an existing
> backup on-the-fly.  Given you remarks above, it is worth noting that
> this GREATLY increases the chances of people accidentally causing
> corruption in ways that are almost undetectable.  All they have to do
> is kill -9 the backup tool half way through and then start postgres on
> the resulting directory.

Right, we need to come up with a way to detect if that happens and
complain loudly, and not continue to move forward unless and until the
user explicitly insists that it's the right thing to do.

> > The other part here is the idea of endless incrementals where the blocks
> > which don't appear to have changed are never re-validated against what's
> > in the backup.  Unfortunately, latent corruption happens and you really
> > want to have a way to check for that.  In past discussions that I've had
> > with David, there's been some idea to check some percentage of the
> > blocks that didn't appear to change for each backup against what's in
> > the backup.
>
> Sure, I'm not trying to block anybody from developing something like
> that, and I acknowledge that there is risk in a system like this,
> but...
>
> > I share this just to point out that there's some risk to that approach,
> > not to say that we shouldn't do it or that we should discourage the
> > development of such a feature.
>
> ...it seems we are viewing this, at least, from the same perspective.

Great, but I feel like the question here is if we're comfortable putting
out this capability *without* some mechanism to verify that the existing
blocks are clean/not corrupted/changed, or if we feel like this risk is
enough that we want to include a check of the existing blocks, in some
fashion, as part of the incremental backup feature.

Personally, and in discussion with David, we've generally felt like we
don't want this feature until we have a way to verify the blocks that
aren't being backed up every time and that we are assuming are
clean/correct (at least some portion of them on each backup, with a way
to make sure we eventually check them all), because we are concerned
that users will get bit by latent corruption and then be quite unhappy
with us for not picking up on it.
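(The sampling idea David and I have discussed is simple enough to
sketch in Python- re-hash a random fraction of the "unchanged" blocks
against the digests recorded with the prior backup, rotating the sample
each time so every block eventually gets covered.  The block/digest
structures here are invented for illustration:)

```python
import hashlib
import random

def sample_verify(unchanged_blocks, stored_digests, fraction=0.10, seed=None):
    """Re-check a random fraction of 'unchanged' blocks against the
    digests recorded at the time of the prior backup; return offsets
    that no longer match (latent-corruption candidates).  Varying the
    seed per backup lets successive backups cover every block."""
    rng = random.Random(seed)
    offsets = sorted(unchanged_blocks)
    n = max(1, int(len(offsets) * fraction))
    mismatches = []
    for off in rng.sample(offsets, n):
        if hashlib.sha256(unchanged_blocks[off]).hexdigest() != stored_digests[off]:
            mismatches.append(off)
    return sorted(mismatches)

blocks = {0: b"a" * 8192, 8192: b"b" * 8192}
digests = {off: hashlib.sha256(data).hexdigest() for off, data in blocks.items()}
blocks[8192] = b"x" * 8192            # simulate latent corruption on disk
print(sample_verify(blocks, digests, fraction=1.0))  # [8192]
```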

> > Wow.  I have to admit that I feel completely opposite of that- I'd
> > *love* to have an independent tool (which ideally uses the same code
> > through the common library, or similar) that can be run to apply WAL.
> >
> > In other words, I don't agree that it's the server's problem at all to
> > solve that, or, at least, I don't believe that it needs to be.
>
> I mean, I guess I'd love to have that if I could get it by waving a
> magic wand, but I wouldn't love it if I had to write the code or
> maintain it.  The routines for applying WAL currently all assume that
> you have a whole bunch of server infrastructure present; that code
> wouldn't run in a frontend environment, I think.  I wouldn't want to
> have a second copy of every WAL apply routine that might have its own
> set of bugs.

I agree that we don't want to have multiple implementations or copies of
the WAL apply routines.  On the other hand, while I agree that there's
some server infrastructure they depend on today, I feel like a lot of
that infrastructure is things that we'd actually like to have in at
least some of the client tools (and likely pg_basebackup specifically).
I understand that it's not trivial to implement, of course, or to pull
out into a common library.  We are already seeing some efforts to
consolidate common routines in the client libraries (Peter E's recent
work around the error messaging being a good example) and I feel like
that's something we should encourage and expect to see happening more in
the future as we add more sophisticated client utilities.

> > I've tried to outline how the incremental backup capability and backup
> > management are really very closely related and having those be
> > implemented by independent tools is not a good interface for our users
> > to have to live with.
>
> I disagree.  I think the "existing backup tools don't use
> pg_basebackup" argument isn't very compelling, because the reason
> those tools don't use pg_basebackup is because it can't do what they
> need.  If it did, they'd probably use it.  People don't write a whole
> separate engine for running backups just because it's fun to not reuse
> code -- they do it because there's no other way to get what they want.

I understand that you disagree but I don't clearly understand the
subsequent justification for why you disagree.  As I understand it, you
disagree that an incremental backup capability and backup management are
closely related, but that's because the existing tools don't leverage
pg_basebackup (or the backup protocol), but aren't those pretty
distinct things?  I accept that perhaps it's my fault for implying that
these topics were related in the emails I've sent while replying to
various parts of the discussion, which has traveled across a number of
topics, some related and some not.  I see incremental backups and backup
management as related because, in part, of expiration- if you expire out
a 'full' backup then you must expire out any incremental or differential
backups based on it.  Just generally that association of which
incremental depends on which full (or prior differential, or prior
incremental) is extremely important and necessary to avoid corrupt
systems (consider that you might apply an incremental to a full backup,
but the incremental taken was actually based on another incremental and
not based on the full, or variations of that...).

In short, I don't think I could confidently trust any incremental backup
that's taken without having a clear link to the backup it's based on,
and having it be expired when the backup it depends on is expired.
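(The dependency rule is mechanical- expiring a backup must expire
everything that descends from it.  A minimal Python sketch of the
walk, with a made-up backup naming scheme:)

```python
def expire(backup_id, depends_on):
    """Given a backup to expire and a mapping of backup -> parent
    backup (None for a full backup), return every backup that must be
    expired along with it: the backup itself plus all descendants,
    since an incremental is unusable without its base."""
    doomed = {backup_id}
    changed = True
    while changed:
        changed = False
        for child, parent in depends_on.items():
            if parent in doomed and child not in doomed:
                doomed.add(child)
                changed = True
    return doomed

chain = {"full1": None, "incr1": "full1", "incr2": "incr1", "full2": None}
print(sorted(expire("full1", chain)))  # ['full1', 'incr1', 'incr2']
```

Note that expiring "full2" would take nothing else with it, which is
exactly the asymmetry that makes tracking the links mandatory.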

> > Most of the external tools don't use pg_basebackup, nor the base backup
> > protocol (or, if they do, it's only as an option among others).  In my
> > opinion, that's pretty clear indication that pg_basebackup and the base
> > backup protocol aren't sufficient to cover any but the simplest of
> > use-cases (though those simple use-cases are handled rather well).
> > We're talking about adding on a capability that's much more complicated
> > and is one that a lot of tools have already taken a stab at, let's try
> > to do it in a way that those tools can leverage it and avoid having to
> > implement it themselves.
>
> I mean, again, if it were part of pg_basebackup and available via the
> replication protocol, they could do exactly that, through either
> method.  I don't get it.

No, they can't.  Today there exists *exactly* this situation:
pg_basebackup uses the base backup protocol for doing backups, and the
external tools don't use it.

Why?

Because it can't be used in a parallel manner, making it largely
uninteresting as a mechanism for doing backups of systems at any scale.

Yes, sure, they *could* technically use it, but from a *practical*
standpoint they don't because it *sucks*.  Let's not do that for
incremental backups.

> You seem to be arguing that we shouldn't add
> the necessary capabilities to the replication protocol or
> pg_basebackup, but at the same time arguing that pg_basebackup is
> inadequate because it's missing important capabilities.  This confuses
> me.

I'm sorry for not being clear.  I'm not arguing that we *shouldn't* add
such capabilities.  I *want* these capabilities to be added, but I want
them added in a way that's actually useful to the external tools and not
something that only works for pg_basebackup (which is currently
single-threaded).

I hope that's the kind of feedback you've been looking for on this
thread.

> > It's an interesting idea to add in everything to pg_basebackup that
> > users doing backups would like to see, but that's quite a list:
> >
> > - full backups
> > - differential backups
> > - incremental backups / block-level backups
> > - (server-side) compression
> > - (server-side) encryption
> > - page-level checksum validation
> > - calculating checksums (on the whole file)
> > - External object storage (S3, et al)
> > - more things...
> >
> > I'm really not convinced that I agree with the division of labor as
> > you've outlined it, where all of the above is done by pg_basebackup,
> > where just archiving and backup retention are handled by some external
> > tool (except that we already have pg_receivewal, so archiving isn't
> > really an externally handled thing either, unless you want features like
> > parallel archive-push or parallel archive-get...).
>
> Yeah, if it were up to me, I'd choose put most of that in the server
> and make it available via the replication protocol, and then give
> pg_basebackup able to use that functionality.

I'm all about that.  I don't know that the client-side tool would still
be called 'pg_basebackup' at that point, but I definitely want to get to
a point where we have all of these capabilities available in core.

> And external tools
> could use that functionality via pg_basebackup or by using the
> replication protocol directly.  I actually don't really understand
> what the alternative is.  If you want server-side compression, for
> example, that really has to be done on the server.  And how would the
> server expose that, except through the replication protocol?  Sure, we
> could design a new protocol for it. Call it... say... the
> shmeplication protocol.  And then you could use the replication
> protocol for what it does today and the shmeplication protocol for all
> the cool bits.  But why would that be better?

The replication protocol (or base backup protocol, really..) is what we
make it, in the end.  Of course server-side compression needs to be done
on the server and we need a way to tell the server "please compress this
for us before sending it".  I'm not suggesting there's some alternative
to that.  What I'm suggesting is that when we go to implement the
incremental backup protocol that we have a way for that to be
parallelized (at least...  maybe other things too) because that's what
the external tools would really like.

Even pg_dump works this way: it connects, builds a list of things to
run, and then farms that out to the parallel processes, so we have an
example of how this is done in core today.
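(That pg_dump-style division of labor- one connection builds the work
list, N connections drain it- is easy to sketch.  In Python, with a
stand-in fetch function instead of a real replication connection:)

```python
import queue
import threading

def parallel_fetch(files, fetch_one, workers=4):
    """pg_dump-style parallelism: enumerate the work up front, then
    let N workers each pull items off a shared queue and fetch them
    over their own connection.  'fetch_one' stands in for fetching one
    file's (incremental) contents on a single connection."""
    work = queue.Queue()
    for f in files:
        work.put(f)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                f = work.get_nowait()
            except queue.Empty:
                return                   # queue drained, worker exits
            data = fetch_one(f)          # one file per request
            with lock:
                results[f] = data

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

res = parallel_fetch(["base/1/1259", "base/1/2619"], str.upper, workers=2)
print(sorted(res))  # ['base/1/1259', 'base/1/2619']
```

The protocol requirement this implies is only that the server can be
asked for one file (or one file's changed blocks) at a time, rather
than a single monolithic tarball per connection.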

> > What would really help me, at least, understand the idea here would be
> > to understand exactly what the existing tools do that the subset of
> > users you're thinking about doesn't like/want, but which pg_basebackup,
> > today, does.  Is the issue that there's a repository instead of just a
> > plain PG directory or set of tar files, like what pg_basebackup produces
> > today?  But how would we do things like have compression, or encryption,
> > or block-based incremental backups without some kind of repository or
> > directory that doesn't actually look exactly like a PG data directory?
>
> I guess we're still wallowing in the same confusion here.
> pg_basebackup, for me, is just a convenient place to stick this
> functionality.  If the server has the ability to construct and send an
> incremental backup by some means, then it needs a client on the other
> end to receive and store that backup, and since pg_basebackup already
> knows how to do that for full backups, extending it to incremental
> backups (and/or parallel, encrypted, compressed, and validated
> backups) seems very natural to me.  Otherwise I add server-side
> functionality to allow $X and then have to  write an entirely new
> client to interact with that instead of just using the client I've
> already got.  That's more work, and I'm lazy.

I'm not suggesting that we don't add this functionality to
pg_basebackup, I'm just saying that we should be thinking about how the
external tools will want to leverage this new capability because it's
materially different from the basic minimum that pg_basebackup requires.
Yes, it'd be a bit more work and a somewhat more complicated protocol
than the simple approach needed by pg_basebackup, but that's what those
other tools will want.  If we don't care about them, ok, I get that, but
I thought the idea here was to build something that's useful to both the
external tools and pg_basebackup.  We won't get that if we focus on just
implementing a protocol for pg_basebackup to use.

> Now it's true that if we wanted to build something like the rsync
> protocol into PostgreSQL, jamming that into pg_basebackup might well
> be a bridge too far.  That would involve taking backups via a method
> so different from what we're currently doing that it would probably
> make sense to at least consider creating a whole new tool for that
> purpose.  But that wasn't my proposal...

The idea around the rsync binary-diff protocol was *specifically* for
things that we can't do through block-level updates with WAL scanning,
just to be clear.  I wasn't thinking that would be good for the relation
files since we have more information for those in the LSN, et al.

Thanks!

Stephen

