Re: block-level incremental backup

From Robert Haas
Subject Re: block-level incremental backup
Date
Msg-id CA+TgmobkKWn71=Q8n3=gXSS_FKT+PRPi+0CDft3V9+qyv8RE+w@mail.gmail.com
In response to Re: block-level incremental backup  (Stephen Frost <sfrost@snowman.net>)
Responses Re: block-level incremental backup  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Wed, Apr 24, 2019 at 9:28 AM Stephen Frost <sfrost@snowman.net> wrote:
> Looking at it from where I'm sitting, I brought up two ways that we
> could extend the protocol to "request from the server only those blocks
> where LSN >= threshold_value" with one being the modification to
> BASE_BACKUP and the other being a new set of commands that could be
> parallelized.  If I had assumed that you'd be thinking the same way I am
> about extending the backup protocol, I wouldn't have said anything now
> and then would have complained after you wrote a patch that just
> extended the BASE_BACKUP command, at which point I likely would have
> been told that it's now been done and that I should have mentioned it
> earlier.

Fair enough.

> At least in part then it seems like we're viewing the level of effort
> around what I'm talking about quite differently, and I feel like that's
> largely because every time I mention parallel anything there's this
> assumption that I'm asking you to parallelize pg_basebackup or write a
> whole bunch more code to provide a fully optimized server-side parallel
> implementation for backups.  That really wasn't what I was going for.  I
> was thinking it would be a modest amount of additional work to add
> incremental backup via a few new commands, instead of through the
> BASE_BACKUP protocol command, that would make parallelization possible.

I'm not sure about that.  It doesn't seem crazy difficult, but there
are a few wrinkles.  One is that if the client is requesting files one
at a time, it's got to have a list of all the files that it needs to
request, and that means that it has to ask the server to make a
preparatory pass over the whole PGDATA directory to get a list of all
the files that exist.  That overhead is not otherwise needed.  Another
is that the list of files might be really large, and that means that
the client would either use a lot of memory to hold that great big
list, or need to deal with spilling the list to a spool file
someplace, or else have a server protocol that lets the list be
fetched incrementally in chunks.  A third is that, as you mention
further on, it means that the client has to care a lot more about
exactly how the server is figuring out which blocks have been
modified.  If it just says BASE_BACKUP ..., the server can be
internally reading each block and checking the LSN, or using
WAL-scanning or ptrack or whatever and the client doesn't need to know
or care.  But if the client is asking for a list of modified files or
blocks, then that presumes the information is available, and not too
expensively, without actually reading the files.  Fourth, MAX_RATE
probably won't actually limit to the correct rate overall if the limit
is applied separately to each file.
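Some of these wrinkles are concrete enough to sketch.  The fourth one,
for instance, only goes away if the parallel workers feed one shared
limiter instead of each file being throttled independently; here is a
minimal token-bucket sketch (Python purely for illustration, and every
name here is hypothetical, not anything that exists in PostgreSQL):

```python
import threading
import time

class SharedRateLimiter:
    """Token bucket shared by all transfer workers, so MAX_RATE caps the
    aggregate transfer rate rather than each file's rate (hypothetical)."""

    def __init__(self, bytes_per_sec):
        self.rate = float(bytes_per_sec)
        self.tokens = 0.0
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def throttle(self, nbytes):
        """Account for nbytes just sent; sleep if we are over budget."""
        with self.lock:
            now = time.monotonic()
            # Refill tokens for the elapsed time, capped at one second's worth.
            self.tokens = min(self.rate,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            self.tokens -= nbytes
            wait = -self.tokens / self.rate if self.tokens < 0 else 0.0
        if wait > 0:
            time.sleep(wait)
```

Each worker would call throttle() after every chunk; because the bucket
is shared, N workers together stay near the configured rate, which is
exactly the property that naive per-file limiting loses.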

I'd be afraid that a patch that tried to handle all that as part of
this project would get rejected on the grounds that it was trying to
solve too many unrelated problems.  Also, though not everybody has to
agree on what constitutes a "modest amount of additional work," I
would not describe solving all of those problems as a modest effort,
but rather a pretty substantial one.

> There's a tangent on all of this that's pretty key though, which is the
> question around just how the blocks are identified.  If the WAL scanning
> is done to figure out the blocks, then that's quite a bit different from
> the other idea of "open this relation and scan it, but only give me the
> blocks after this LSN".  It's the latter case that I've been mostly
> thinking about in this thread, which is part of why I was thinking it'd
> be a modest amount of work to have protocol commands that accepted a
> file (or perhaps a relation..) to scan and return blocks from instead of
> baking this into BASE_BACKUP which by definition just serially scans the
> data directory and returns things as it finds them.  For the case where
> we have WAL scanning happening and modfiles which are being read and
> used to figure out the blocks to send, it seems like it might be more
> complicated and therefore potentially quite a bit more work to have a
> parallel version of that.

Yeah.  I don't entirely agree that the first one is simple, as per the
above, but I definitely agree that the second one is more complicated
than the first one.

> > Well, one thing you might want to do is have a tool that connects to
> > the server, enters backup mode, requests information on what blocks
> > have changed, copies those blocks via direct filesystem access, and
> > then exits backup mode.  Such a tool would really benefit from a
> > START_BACKUP / SEND_FILE_LIST / SEND_FILE_CONTENTS / STOP_BACKUP
> > command language, because it would just skip ever issuing the
> > SEND_FILE_CONTENTS command in favor of doing that part of the work via
> > other means.  On the other hand, a START_PARALLEL_BACKUP LSN '1/234'
> > command is useless to such a tool.
>
> That's true, but I hardly ever hear people talking about how wonderful
> it is that pgBackRest uses SSH to grab the data.  What I hear, often, is
> that people would really like backups to be done over the PG protocol on
> the same port that replication is done on.  A possible compromise is
> having a dedicated port for the backup agent to use, but it's definitely
> not the preference.

If you happen to be on the same system where the backup is running,
reading straight from the data directory might be a lot faster.
Otherwise, I tend to agree with you that using libpq is probably best.
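To make the shape of that tool concrete: under the command names
proposed above (none of which exist today), its control flow might look
roughly like this, with a stub connection standing in for libpq:

```python
class FakeConn:
    """Stand-in for a libpq replication connection; just records the
    commands a client would send (purely illustrative)."""

    def __init__(self):
        self.sent = []

    def execute(self, command):
        self.sent.append(command)
        # A real server would return the changed-file list here.
        return [] if command.startswith("SEND_FILE_LIST") else None

def filesystem_incremental_backup(conn, threshold_lsn):
    """The hypothetical tool: enter backup mode, fetch the list of files
    with blocks changed since threshold_lsn, copy those blocks via
    direct filesystem access, and exit backup mode.  Note that
    SEND_FILE_CONTENTS is never issued."""
    conn.execute("START_BACKUP")
    changed = conn.execute("SEND_FILE_LIST LSN '%s'" % threshold_lsn) or []
    for path in changed:
        pass  # copy blocks straight from the data directory, not via libpq
    conn.execute("STOP_BACKUP")
```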

> I agree that each has some pros and cons.  Certainly one of the big
> 'cons' here is that it'd be a lot more backend work to implement the
> 'maximally-efficient parallel backup', while the fine-grained commands
> wouldn't require nearly as much but would still allow a great deal of
> the benefit for both in-core and out-of-core tools, potentially.

I agree.

> The comments that Anastasia had around the issues with being able to
> identify the full backup that goes with a given incremental backup, et
> al, certainly echoed some my concerns regarding this part of the
> discussion.
>
> As for the concerns about trying to avoid corruption from starting up an
> invalid cluster, I didn't see much discussion about the idea of some
> kind of cross-check between pg_control and backup_label.  That was all
> very hand-wavy, so I'm not too surprised, but I don't think it's
> completely impossible to have something better than "well, if you just
> remove this one file, then you get a non-obviously corrupt cluster that
> you can happily start up".  I'll certainly accept that it requires more
> thought though and if we're willing to continue a discussion around
> that, great.

I think there are three different issues here that need to be
considered separately.

Issue #1: If you manually add files to your backup, remove files from
your backup, or change files in your backup, bad things will happen.
There is fundamentally nothing we can do to prevent this completely,
but it may be possible to make the system more resilient against
ham-handed modifications, at least to the extent of detecting them.
That's maybe a topic for another thread, but it's an interesting one:
Andres and I were brainstorming about it at some point.

Issue #2: You can only restore an LSN-based incremental backup
correctly if you have a base backup whose start-of-backup LSN is
greater than or equal to the threshold LSN used to take the
incremental backup.  If #1 is not in play, this is just a simple
cross-check at restoration time: retrieve the 'START WAL LOCATION'
from the prior backup's backup_label file and the threshold LSN for
the incremental backup from wherever you decide to store it and
compare them; if they do not have the right relationship, ERROR.  As
to whether #1 might end up in play here, anything's possible, but
wouldn't manually editing LSNs in backup metadata files be pretty
obviously a bad idea?  (Then again, I didn't really think the whole
backup_label thing was that confusing either, and obviously I was
wrong about that.  Still, editing a file requires a little more work
than removing it... you have to not only lie to the system, you have
to decide which lie to tell!)
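That cross-check is mechanical enough to write down.  A sketch,
assuming LSNs in the usual textual X/Y form; the function names are
invented for illustration:

```python
def parse_lsn(text):
    """Convert an LSN in textual 'X/Y' form into a comparable integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def check_incremental_chain(prior_start_lsn, threshold_lsn):
    """Restoration-time cross-check: the prior backup's 'START WAL
    LOCATION' must be >= the incremental's threshold LSN, or blocks
    modified between the two could be silently missing."""
    if parse_lsn(prior_start_lsn) < parse_lsn(threshold_lsn):
        raise ValueError(
            "incremental backup does not apply to this base backup")
```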

Issue #3: Even if you clearly understand the rule articulated in #2,
you might find it hard to follow in practice.  If you take a full
backup on Sunday and an incremental against Sunday's backup or against
the previous day's backup on each subsequent day, it's not really that
hard to understand.  But in more complex scenarios it could be hard to
get right.  For example, if you've been removing your backups when they
are a month old and then you start doing the same thing once you
add incrementals to the picture you might easily remove a full backup
upon which a newer incremental depends.  I see the need for good tools
to manage this kind of complexity, but have no plan as part of this
project to provide them.  I think that just requires too many
assumptions about where those backups are being stored and how they
are being catalogued and managed; I don't believe I currently am
knowledgeable enough to design something that would be good enough to
meet core standards for inclusion, and I don't want to waste energy
trying.  If someone else wants to try, that's OK with me, but I think
it's probably better to let this be a thing that people experiment
with outside of core for a while until we see what ends up being a
winner.  I realize that this is a debatable position, but as I'm sure
you realize by now, I have a strong desire to limit the scope of this
project in such a way that I can get it done, 'cuz a bird in the hand
is worth two in the bush.
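For what it's worth, the dependency check such external tooling would
need is simple once the catalog records which backup each incremental
was taken against; a toy sketch, with an entirely hypothetical catalog
schema:

```python
def removable(catalog, name):
    """True if no other backup in the catalog was taken against 'name'.
    The 'parent' field is an invented stand-in for whatever metadata a
    real backup manager would keep."""
    return not any(b.get("parent") == name for b in catalog)

# Sunday's full backup plus a Monday incremental taken against it:
catalog = [
    {"name": "full-sun", "parent": None},
    {"name": "incr-mon", "parent": "full-sun"},
]
```

A retention policy that consulted such a check before pruning would
refuse to remove full-sun while incr-mon still exists, which is the
failure mode described above.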

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


