Re: block-level incremental backup
From: Stephen Frost
Subject: Re: block-level incremental backup
Date:
Msg-id: 20190415130111.GE6197@tamriel.snowman.net
In response to: block-level incremental backup (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: block-level incremental backup (Bruce Momjian <bruce@momjian.us>)
           Re: block-level incremental backup (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> Several companies, including EnterpriseDB, NTT, and Postgres Pro, have
> developed technology that permits a block-level incremental backup to
> be taken from a PostgreSQL server. I believe the idea in all of those
> cases is that non-relation files should be backed up in their
> entirety, but for relation files, only those blocks that have been
> changed need to be backed up.

I love the general idea of having additional facilities in core to support block-level incremental backups. I've long been unhappy that any such approach ends up being limited to a subset of the files which need to be included in the backup, meaning the rest of the files have to be backed up in their entirety. I don't think we have to solve that as part of this, but I'd like to see a discussion of how to deal with the other files being backed up, to avoid needing to just wholesale copy them.

> I would like to propose that we should
> have a solution for this problem in core, rather than leaving it to
> each individual PostgreSQL company to develop and maintain their own
> solution.

I'm certainly a fan of improving our in-core backup solutions. I'm quite concerned that trying to graft this on to pg_basebackup (which, as you note later, is missing an awful lot of what users expect from a real backup solution already: retention handling, parallel capabilities, WAL archive management, and more, but also is just not nearly as developed a tool as the external solutions) is going to make things unnecessarily difficult, when what we really want here is better support from core for block-level incremental backup that the existing external tools can leverage.
Perhaps there's something here which can be done with pg_basebackup to have it work with the block-level approach, but I certainly don't see it as a natural next step for it, and limiting the implementation to something that pg_basebackup can easily digest really does seem likely to make it less useful for the more developed tools. As an example, I believe all of the other tools mentioned (at least, I'm pretty sure all of the open source ones do) support parallel backup, so having a way to get the block-level changes in a parallel fashion would be a pretty big thing those tools will want. pg_basebackup is single-threaded today, and this proposal doesn't seem to contemplate changing that, implying a serial-based block-level protocol; that would be a pretty awful restriction for the other tools.

> Generally my idea is:
>
> 1. There should be a way to tell pg_basebackup to request from the
> server only those blocks where LSN >= threshold_value. There are
> several possible ways for the server to implement this, the simplest
> of which is to just scan all the blocks and send only the ones that
> satisfy that criterion. That might sound dumb, but it does still save
> network bandwidth, and it works even without any prior setup. It will
> probably be more efficient in many cases to instead scan all the WAL
> generated since that LSN and extract block references from it, but
> that is only possible if the server has all of that WAL available or
> can somehow get it from the archive. We could also, as several people
> have proposed previously, have some kind of additional relation fork
> that stores either a single is-modified bit -- which only helps if the
> reference LSN for the is-modified bit is older than the requested LSN
> but not too much older -- or the highest LSN for each range of K
> blocks, or something like that. I am at the moment not too concerned
> with the exact strategy we use here.
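The "just scan all the blocks" strategy quoted above is easy to picture in code. The sketch below (hypothetical names, not part of the proposal) assumes PostgreSQL's default 8kB block size and the on-disk little-endian layout of pd_lsn, the first field of the page header, stored as two 32-bit halves:

```python
import struct

BLCKSZ = 8192  # PostgreSQL's default block size


def modified_blocks(relfile_path, threshold_lsn):
    """Yield (block_number, page) for pages whose pd_lsn >= threshold_lsn.

    pd_lsn occupies the first 8 bytes of PageHeaderData as two uint32
    halves (xlogid, xrecoff); reading them little-endian is an assumption
    that matches common platforms, not a portable definition.
    """
    with open(relfile_path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break
            xlogid, xrecoff = struct.unpack_from("<II", page, 0)
            pd_lsn = (xlogid << 32) | xrecoff
            if pd_lsn >= threshold_lsn:
                yield blkno, page
            blkno += 1
```

Even this naive version saves network bandwidth, as the quoted text notes, since unchanged blocks are filtered server-side before anything is sent.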
> I believe we may want to
> eventually support more than one, since they have different
> trade-offs.

This part of the discussion is another example of how we're limiting ourselves in this implementation to the "pg_basebackup can work with this" case, by only considering the options of "scan all the files" or "use the WAL, if the request is for WAL we have available on the server." The other backup solutions mentioned in your initial email, and others that weren't, have a WAL archive which includes a lot more WAL than just what the primary currently has. When I've thought about how WAL could be used to build a differential or incremental backup, the question of "do we have all the WAL we need" has never been a consideration, because the backup tool manages the WAL archive and has WAL going back across, most likely, weeks or even months. Having a tool which can essentially "compress" WAL would be fantastic and could be leveraged by all of the different backup solutions.

> 2. When you use pg_basebackup in this way, each relation file that is
> not sent in its entirety is replaced by a file with a different name.
> For example, instead of base/16384/16417, you might get
> base/16384/partial.16417 or however we decide to name them. Each such
> file will store near the beginning of the file a list of all the
> blocks contained in that file, and the blocks themselves will follow
> at offsets that can be predicted from the metadata at the beginning of
> the file. The idea is that you shouldn't have to read the whole file
> to figure out which blocks it contains, and if you know specifically
> what blocks you want, you should be able to reasonably efficiently
> read just those blocks. A backup taken in this manner should also
> probably create some kind of metadata file in the root directory that
> stops the server from starting and lists other salient details of the
> backup. In particular, you need the threshold LSN for the backup
> (i.e.
> contains blocks newer than this) and the start LSN for the
> backup (i.e. the LSN that would have been returned from
> pg_start_backup).

Two things here. First, having some file that "stops the server from starting" is just going to cause a lot of pain, in my experience. Users do a lot of really rather... curious things, and then come asking questions about them, and removing the file that stopped the server from starting is going to quickly become one of those questions on Stack Overflow where people just follow the highest-ranked answer, even though everyone who follows this list will know that doing so results in corruption of the database.

An alternative approach in developing this feature would be to have pg_basebackup take an option to run against an *existing* backup, the entire point being that the existing backup is updated with the incremental changes, instead of having some independent tool which takes the result of multiple pg_basebackup runs and then combines them. Another alternative would be a tool which simply reads the WAL, keeps track of the FPIs and the updates, and then eliminates any duplication which exists in the set of WAL provided (that is, multiple FPIs for the same page would be merged into one, and only the delta changes to that page preserved, across the entire set of WAL being combined). Of course, that's complicated by having to deal with the other files in the database, so it wouldn't really work on its own.

> 3. There should be a new tool that knows how to merge a full backup
> with any number of incremental backups and produce a complete data
> directory with no remaining partial files. The tool should check that
> the threshold LSN for each incremental backup is less than or equal to
> the start LSN of the previous backup; if not, there may be changes
> that happened in between which would be lost, so combining the backups
> is unsafe.
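The safety check just described is simple to state precisely. A sketch, with a hypothetical representation of each backup as a (threshold_lsn, start_lsn) pair, ordered from the full backup (threshold 0) through each newer incremental:

```python
def check_chain(backups):
    """Verify a full-plus-incrementals chain is safe to combine.

    backups: list of (threshold_lsn, start_lsn) tuples, oldest first.
    Each incremental must contain every block changed at or after the
    start of the backup it builds on, i.e. its threshold LSN must be
    <= the previous backup's start LSN; otherwise changes in the gap
    would be silently lost.
    """
    for (_, prev_start), (threshold, _) in zip(backups, backups[1:]):
        if threshold > prev_start:
            raise ValueError(
                "gap between backups (threshold %X > previous start %X); "
                "combining would be unsafe" % (threshold, prev_start))
```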
> Running this tool can be thought of either as restoring
> the backup or as producing a new synthetic backup from any number of
> incremental backups. This would allow for a strategy of unending
> incremental backups. For instance, on day 1, you take a full backup.
> On every subsequent day, you take an incremental backup. On day 9,
> you run pg_combinebackup day1 day2 -o full; rm -rf day1 day2; mv full
> day2. On each subsequent day you do something similar. Now you can
> always roll back to any of the last seven days by combining the oldest
> backup you have (which is always a synthetic full backup) with as many
> newer incrementals as you want, up to the point where you want to
> stop.

I'd really prefer that we avoid adding another low-level tool like the one described here. Users, imv anyway, don't want to deal with *more* tools for handling this aspect of backup/recovery. If we had a tool in core today which managed multiple backups, kept track of them, and of all the WAL during and between them, then we could add options to that tool to do what's being described here in a way that makes sense and provides a good interface to users. I don't know that we're going to be able to do that with pg_basebackup when, really, the goal here isn't actually to make pg_basebackup into an enterprise backup tool; it's to make things easier for the external tools to do block-level backups.

Thanks!

Stephen
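[Editor's sketch: the combine/rotation semantics quoted above reduce to overlaying block maps, newest last. Plain dicts mapping block numbers to page bytes stand in for the proposed partial-file format, which is an assumption here, not the actual on-disk layout.]

```python
def combine_backups(full_blocks, incrementals):
    """Produce a synthetic full backup from a full backup plus incrementals.

    full_blocks: {block_number: page_bytes} for the (synthetic) full backup.
    incrementals: iterable of {block_number: page_bytes}, oldest first.
    A block present in a later backup supersedes any earlier copy, so the
    result reflects the state as of the newest incremental.
    """
    merged = dict(full_blocks)
    for inc in incrementals:
        merged.update(inc)
    return merged
```

Under this model the day-9 rotation step is just combine_backups(day1, [day2]) written back out as the new oldest synthetic full backup.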