Re: block-level incremental backup
From | Anastasia Lubennikova |
---|---|
Subject | Re: block-level incremental backup |
Date | |
Msg-id | 166c3ed1-7f36-55cb-b639-1081708e8600@postgrespro.ru |
In reply to | Re: block-level incremental backup (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: block-level incremental backup
(Adam Brusselback <adambrusselback@gmail.com>)
Re: block-level incremental backup (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
List | pgsql-hackers |
22.04.2019 2:02, Robert Haas wrote:
> I think we're getting closer to a meeting of the minds here, but I
> don't think it's intrinsically necessary to rewrite the whole method
> of operation of pg_basebackup to implement incremental backup in a
> sensible way. One could instead just do a straightforward extension
> to the existing BASE_BACKUP command to enable incremental backup.
> Then, to enable parallel full backup and all sorts of out-of-core
> hacking, one could expand the command language to allow tools to
> access individual steps: START_BACKUP, SEND_FILE_LIST,
> SEND_FILE_CONTENTS, STOP_BACKUP, or whatever. The second thing makes
> for an appealing project, but I do not think there is a technical
> reason why it has to be done first. Or for that matter why it has to
> be done second. As I keep saying, incremental backup and full backup
> are separate projects and I believe it's completely reasonable for
> whoever is doing the work to decide on the order in which they would
> like to do the work.
>
> Having said that, I'm curious what people other than Stephen (and
> other pgbackrest hackers) think about the relative value of parallel
> backup vs. incremental backup. Stephen appears quite convinced that
> parallel backup is full of win and incremental backup is a bit of a
> yawn by comparison, and while I certainly would not want to discount
> the value of his experience in this area, it sometimes happens on this
> mailing list that [ drum roll please ] not everybody agrees about
> everything. So, what do other people think?
>

Personally, I believe that incremental backups are more useful to implement first since they benefit both backup speed and the space taken by a backup. Frankly speaking, I'm a bit surprised that the discussion of parallel backups took so much of this thread.
Of course, we must keep parallelism in mind while designing the API, to avoid introducing any architectural obstacles, but any further discussion of it belongs in a separate thread.

I understand Stephen's concerns about the difficulties of incremental backup management. Even assuming that the user is ready to manage backup chains, retention, and so on, we must consider a format of backup metadata that will allow us to perform some primitive commands:

1) Tell whether this backup is full or incremental.

2) Tell which backup is the parent of this incremental backup. Probably, we can limit it to just returning "start_lsn", which can later be compared to the "stop_lsn" of the parent backup.

3) Take an incremental backup based on this backup. Here we must help a backup manager retrieve the LSN to pass to pg_basebackup.

4) Restore an incremental backup into a directory (on top of an already restored full backup). One may use it to perform a "merge" or "restore" of the incremental backup, depending on the destination directory. I wonder whether it is possible to integrate it into any existing tool, or whether we will end up with something like pg_basebackup/pg_baserestore, as in the case of pg_dump/pg_restore.

Have you designed these? I can only recall "pg_combinebackup" from the very first message in this thread, which looks more like a sketch to explain the idea than a thought-out feature design. I also found a page https://wiki.postgresql.org/wiki/Incremental_backup that raises the same questions.

I'm volunteering to write a draft patch or, more likely, a set of patches, which will allow us to discuss the subject in more detail. To do that, I would like us to agree on the API and data format (at least broadly). Looking forward to hearing your thoughts.

As I see it, ideally the backup management tools should concentrate more on managing multiple backups, while all the logic of taking a single backup (of any kind) should be integrated into the core.
It means that an out-of-core client won't have to walk the PGDATA directory or carry all the Postgres-specific knowledge of data files consisting of blocks with headers, LSNs, and so on. It simply requests data and gets it. Understandably, this won't be implemented in one take and, more likely, it will never be fully achievable. Still, it would be great to do our best to provide such tools (both existing and future) with conveniently formatted data and an API to get it.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
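
To make the four metadata primitives listed in the message more concrete, here is a minimal Python sketch of what a backup manager might do with such metadata. Everything in it is an assumption for illustration: the `BackupMeta` record, its field names, and the helper functions are hypothetical and do not correspond to any existing PostgreSQL tool or file format. The parent lookup follows the rule suggested in point 2 (an incremental backup's "start_lsn" is compared to the "stop_lsn" of a candidate parent); a real design might choose a different matching rule.

```python
from dataclasses import dataclass
from typing import List, Optional


def parse_lsn(text: str) -> int:
    """Convert the textual LSN form 'X/Y' (two hex numbers) to one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass(frozen=True)
class BackupMeta:
    # Hypothetical metadata record; the field names are illustrative only.
    label: str
    incremental: bool   # primitive 1: full or incremental
    start_lsn: str      # for an incremental backup: the LSN it was based on
    stop_lsn: str       # LSN at which this backup finished


def find_parent(child: BackupMeta, catalog: List[BackupMeta]) -> Optional[BackupMeta]:
    """Primitive 2: the parent is the backup whose stop_lsn equals child.start_lsn."""
    if not child.incremental:
        return None
    wanted = parse_lsn(child.start_lsn)
    for candidate in catalog:
        if candidate is not child and parse_lsn(candidate.stop_lsn) == wanted:
            return candidate
    return None


def next_threshold(base: BackupMeta) -> str:
    """Primitive 3: the LSN a manager would pass when taking the next incremental."""
    return base.stop_lsn


def restore_chain(target: BackupMeta, catalog: List[BackupMeta]) -> List[BackupMeta]:
    """Primitive 4 (ordering only): full backup first, then each incremental on top."""
    chain = [target]
    while chain[-1].incremental:
        parent = find_parent(chain[-1], catalog)
        if parent is None:
            raise ValueError(f"broken chain: no parent found for {chain[-1].label}")
        chain.append(parent)
    chain.reverse()
    return chain


# Example catalog with made-up LSN values.
full = BackupMeta("full", False, "0/2000028", "0/2000130")
inc1 = BackupMeta("inc1", True, "0/2000130", "0/3000060")
inc2 = BackupMeta("inc2", True, "0/3000060", "0/4000100")
catalog = [inc2, full, inc1]
order = [b.label for b in restore_chain(inc2, catalog)]
```

The sketch covers only the bookkeeping side. Actually applying the changed blocks during a restore or merge is the part the message argues should live in core, so it is deliberately left out here.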