Re: block-level incremental backup

From: Stephen Frost
Subject: Re: block-level incremental backup
Msg-id: 20190415130111.GE6197@tamriel.snowman.net
In reply to: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: block-level incremental backup  (Bruce Momjian <bruce@momjian.us>)
           Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> Several companies, including EnterpriseDB, NTT, and Postgres Pro, have
> developed technology that permits a block-level incremental backup to
> be taken from a PostgreSQL server.  I believe the idea in all of those
> cases is that non-relation files should be backed up in their
> entirety, but for relation files, only those blocks that have been
> changed need to be backed up.

I love the general idea of having additional facilities in core to
support block-level incremental backups.  I've long been unhappy that
any such approach ends up being limited to a subset of the files which
need to be included in the backup, meaning the rest of the files have to
be backed up in their entirety.  I don't think we have to solve for that
as part of this, but I'd like to see a discussion for how to deal with
the other files which are being backed up to avoid needing to just
wholesale copy them.

> I would like to propose that we should
> have a solution for this problem in core, rather than leaving it to
> each individual PostgreSQL company to develop and maintain their own
> solution.

I'm certainly a fan of improving our in-core backup solutions.

I'm quite concerned that trying to graft this on to pg_basebackup
(which, as you note later, is missing an awful lot of what users expect
from a real backup solution already: retention handling, parallel
capabilities, WAL archive management, and many more... but also is just
not nearly as developed a tool as the external solutions) is going to
make things unnecessarily difficult, when what we really want here is
better support from core for block-level incremental backup for the
existing external tools to leverage.

Perhaps there's something here which can be done with pg_basebackup to
have it work with the block-level approach, but I certainly don't see
it as a natural next step for it, and it really does seem like limiting
the way this is implemented to something that pg_basebackup can easily
digest might make it less useful for the more developed tools.

As an example, I believe all of the other tools mentioned (at least,
I'm pretty sure all of the open source ones do) support parallel
backup, and therefore having a way to get the block-level changes in a
parallel fashion would be a pretty big thing that those tools will
want.  pg_basebackup is single-threaded today, and this proposal
doesn't seem to be contemplating changing that, implying that a
serial-based block-level protocol would be fine; that'd be a pretty
awful restriction for the other tools.

> Generally my idea is:
>
> 1. There should be a way to tell pg_basebackup to request from the
> server only those blocks where LSN >= threshold_value.  There are
> several possible ways for the server to implement this, the simplest
> of which is to just scan all the blocks and send only the ones that
> satisfy that criterion.  That might sound dumb, but it does still save
> network bandwidth, and it works even without any prior setup. It will
> probably be more efficient in many cases to instead scan all the WAL
> generated since that LSN and extract block references from it, but
> that is only possible if the server has all of that WAL available or
> can somehow get it from the archive.  We could also, as several people
> have proposed previously, have some kind of additional relation fork
> that stores either a single is-modified bit -- which only helps if the
> reference LSN for the is-modified bit is older than the requested LSN
> but not too much older -- or the highest LSN for each range of K
> blocks, or something like that.  I am at the moment not too concerned
> with the exact strategy we use here. I believe we may want to
> eventually support more than one, since they have different
> trade-offs.
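
Just to make the simplest of those strategies concrete, "scan all the
blocks and send only the ones that satisfy that criterion" amounts to
roughly the sketch below.  This is purely illustrative Python (standard
8kB blocks, little-endian host, pd_lsn as the first 8 bytes of the page
header), not how the server side would actually be written:

    import struct

    BLCKSZ = 8192

    def parse_lsn(text):
        """Turn an LSN like '16/B374D848' into a 64-bit integer."""
        hi, lo = text.split('/')
        return (int(hi, 16) << 32) | int(lo, 16)

    def changed_blocks(relation_file, threshold_lsn):
        """Yield (blkno, page) for pages modified at or after threshold_lsn."""
        with open(relation_file, 'rb') as f:
            blkno = 0
            while True:
                page = f.read(BLCKSZ)
                if len(page) < BLCKSZ:
                    break
                # pd_lsn is stored as two 32-bit halves at the start of the page.
                xlogid, xrecoff = struct.unpack_from('<II', page, 0)
                page_lsn = (xlogid << 32) | xrecoff
                # Never-initialized (all-zero) pages have LSN 0 and are skipped.
                if page_lsn != 0 and page_lsn >= threshold_lsn:
                    yield blkno, page
                blkno += 1

    if __name__ == '__main__':
        import sys
        for blkno, _ in changed_blocks(sys.argv[1], parse_lsn(sys.argv[2])):
            print(blkno)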

This part of the discussion is another example of how we're limiting
ourselves in this implementation to the "pg_basebackup can work with
this" case, by only considering the options of "scan all the files" or
"use the WAL, if the request is for WAL we still have available on the
server."  The other backup solutions mentioned in your initial email,
and others that weren't, have a WAL archive which includes a lot more
WAL than just what the primary currently has.  When I've thought about
how WAL could be used to build a differential or incremental backup,
the question of "do we have all the WAL we need" hasn't ever been a
consideration, because the backup tool manages the WAL archive and has
WAL going back across, most likely, weeks or even months.  Having a
tool which can essentially "compress" WAL would be fantastic and could
be leveraged by all of the different backup solutions.
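
To sketch what that kind of WAL "compression" could look like from a
tool which already manages its own WAL archive, the first step is just
boiling an LSN range down to the set of blocks it touched.  The
illustration below shells out to pg_waldump and scrapes its block
references; pg_waldump's text output isn't a stable interface and the
archived segments are assumed to be uncompressed and normally named, so
treat this as a sketch of the idea rather than a workable tool:

    import re
    import subprocess

    BLKREF = re.compile(
        r'blkref #\d+: rel (\d+)/(\d+)/(\d+)(?: fork (\w+))? blk (\d+)')

    def changed_block_set(archive_dir, start_lsn, end_lsn):
        """Collect (spcoid, dboid, relfilenode, fork, blkno) tuples touched
        between start_lsn and end_lsn, according to the archived WAL."""
        out = subprocess.run(
            ['pg_waldump', '--path', archive_dir,
             '--start', start_lsn, '--end', end_lsn],
            capture_output=True, text=True, check=True)
        blocks = set()
        for line in out.stdout.splitlines():
            for spc, db, rel, fork, blk in BLKREF.findall(line):
                blocks.add((int(spc), int(db), int(rel),
                            fork or 'main', int(blk)))
        return blocks

    if __name__ == '__main__':
        import sys
        for ref in sorted(changed_block_set(sys.argv[1], sys.argv[2],
                                            sys.argv[3])):
            print(ref)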

> 2. When you use pg_basebackup in this way, each relation file that is
> not sent in its entirety is replaced by a file with a different name.
> For example, instead of base/16384/16417, you might get
> base/16384/partial.16417 or however we decide to name them.  Each such
> file will store near the beginning of the file a list of all the
> blocks contained in that file, and the blocks themselves will follow
> at offsets that can be predicted from the metadata at the beginning of
> the file.  The idea is that you shouldn't have to read the whole file
> to figure out which blocks it contains, and if you know specifically
> what blocks you want, you should be able to reasonably efficiently
> read just those blocks.  A backup taken in this manner should also
> probably create some kind of metadata file in the root directory that
> stops the server from starting and lists other salient details of the
> backup.  In particular, you need the threshold LSN for the backup
> (i.e. contains blocks newer than this) and the start LSN for the
> backup (i.e. the LSN that would have been returned from
> pg_start_backup).
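
To make the partial-file shape concrete, one hypothetical layout (the
magic, field widths and function names below are invented for
illustration, not a proposed format) is a small header holding a count
and a sorted list of block numbers, followed by the 8kB blocks in that
same order, so a reader can seek straight to any single block:

    import struct

    BLCKSZ = 8192
    MAGIC = b'PGPART01'    # invented marker, not a real format

    def write_partial(path, blocks):
        """blocks: list of (blkno, 8kB page) pairs, sorted by blkno."""
        with open(path, 'wb') as f:
            f.write(MAGIC)
            f.write(struct.pack('<I', len(blocks)))
            for blkno, _ in blocks:
                f.write(struct.pack('<I', blkno))
            for _, page in blocks:
                f.write(page)

    def read_block(path, want_blkno):
        """Fetch one block from a partial file without reading all of it."""
        with open(path, 'rb') as f:
            if f.read(8) != MAGIC:
                raise ValueError('not a partial file')
            (count,) = struct.unpack('<I', f.read(4))
            blknos = struct.unpack('<%dI' % count, f.read(4 * count))
            data_start = 8 + 4 + 4 * count
            try:
                idx = blknos.index(want_blkno)
            except ValueError:
                return None    # block not present in this incremental
            f.seek(data_start + idx * BLCKSZ)
            return f.read(BLCKSZ)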

Two things here: having some file that "stops the server from starting"
is just going to cause a lot of pain, in my experience.  Users do a lot
of really rather... curious things, and then come asking questions
about them, and removing the file that stopped the server from starting
is going to quickly become one of those questions on Stack Overflow
where people just follow the highest-ranked answer, even though
everyone who follows this list will know that doing so results in
corruption of the database.

An alternative approach in developing this feature would be to have
pg_basebackup have an option to run against an *existing* backup, with
the entire point being that the existing backup is updated with these
incremental changes, instead of having some independent tool which takes
the result of multiple pg_basebackup runs and then combines them.

An alternative tool might be one which simply reads the WAL and keeps
track of the FPIs and the updates and then eliminates any duplication
which exists in the set of WAL provided (that is, multiple FPIs for the
same page would be merged into one, and only the delta changes to that
page are preserved, across the entire set of WAL being combined).  Of
course, that's complicated by having to deal with the other files in the
database, so it wouldn't really work on its own.
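
The heart of such a tool, leaving aside the genuinely hard part of
decoding WAL, would be something like the sketch below: walk the
decoded records in LSN order and, per block, keep only the newest
full-page image plus the delta records that follow it.  The 'records'
iterable of (lsn, blockref, is_fpi, payload) tuples is assumed to come
from some WAL-reading layer that is entirely hand-waved away here:

    def compress_wal(records):
        """records: iterable of (lsn, blockref, is_fpi, payload) in LSN order.
        Returns {blockref: (fpi_payload_or_None, [deltas after that FPI])}."""
        kept = {}
        for _lsn, blockref, is_fpi, payload in records:
            if is_fpi:
                # A newer FPI supersedes the older one and any deltas before it.
                kept[blockref] = (payload, [])
            else:
                kept.setdefault(blockref, (None, []))[1].append(payload)
        return kept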

> 3. There should be a new tool that knows how to merge a full backup
> with any number of incremental backups and produce a complete data
> directory with no remaining partial files.  The tool should check that
> the threshold LSN for each incremental backup is less than or equal to
> the start LSN of the previous backup; if not, there may be changes
> that happened in between which would be lost, so combining the backups
> is unsafe.  Running this tool can be thought of either as restoring
> the backup or as producing a new synthetic backup from any number of
> incremental backups.  This would allow for a strategy of unending
> incremental backups.  For instance, on day 1, you take a full backup.
> On every subsequent day, you take an incremental backup.  On day 9,
> you run pg_combinebackup day1 day2 -o full; rm -rf day1 day2; mv full
> day2.  On each subsequent day you do something similar.  Now you can
> always roll back to any of the last seven days by combining the oldest
> backup you have (which is always a synthetic full backup) with as many
> newer incrementals as you want, up to the point where you want to
> stop.
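
Per relation file, the combine step described above comes down to
something like the sketch below, reusing the invented partial-file
layout from earlier: check the LSN chain, copy the file from the older
backup, and overwrite just the blocks the incremental carries.  A real
tool would also have to walk the directory trees, cope with relations
that shrank or were dropped, and deal with all the non-relation files:

    import shutil
    import struct

    BLCKSZ = 8192
    MAGIC = b'PGPART01'    # same invented marker as the earlier sketch

    def read_partial(path):
        """Return {blkno: page} for a partial file in the invented layout."""
        with open(path, 'rb') as f:
            if f.read(8) != MAGIC:
                raise ValueError('not a partial file')
            (count,) = struct.unpack('<I', f.read(4))
            blknos = struct.unpack('<%dI' % count, f.read(4 * count))
            return {blkno: f.read(BLCKSZ) for blkno in blknos}

    def check_chain(backups):
        """backups: list of (threshold_lsn, start_lsn), oldest first; the
        full backup's threshold is 0.  Raise if combining would be unsafe."""
        for prev, cur in zip(backups, backups[1:]):
            if cur[0] > prev[1]:
                raise ValueError('gap between backups; changes could be lost')

    def combine_relation_file(base_file, partial_file, output_file):
        """Overlay an incremental's blocks onto a copy of the base file."""
        shutil.copyfile(base_file, output_file)
        with open(output_file, 'r+b') as out:
            for blkno, page in read_partial(partial_file).items():
                out.seek(blkno * BLCKSZ)
                out.write(page)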

I'd really prefer that we avoid adding in another low-level tool like
the one described here.  Users, imv anyway, don't want to deal with
*more* tools for handling this aspect of backup/recovery.  If we had a
tool in core today which managed multiple backups, kept track of them,
and all of the WAL during and between them, then we could add options to
that tool to do what's being described here in a way that makes sense
and provides a good interface to users.  I don't know that we're going
to be able to do that with pg_basebackup when, really, the goal here
isn't actually to make pg_basebackup into an enterprise backup tool,
it's to make things easier for the external tools to do block-level
backups.

Thanks!

Stephen
