Re: block-level incremental backup

From Anastasia Lubennikova
Subject Re: block-level incremental backup
Date
Msg-id 166c3ed1-7f36-55cb-b639-1081708e8600@postgrespro.ru
In response to Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: block-level incremental backup  (Adam Brusselback <adambrusselback@gmail.com>)
Re: block-level incremental backup  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
22.04.2019 2:02, Robert Haas wrote:
> I think we're getting closer to a meeting of the minds here, but I
> don't think it's intrinsically necessary to rewrite the whole method
> of operation of pg_basebackup to implement incremental backup in a
> sensible way.  One could instead just do a straightforward extension
> to the existing BASE_BACKUP command to enable incremental backup.
> Then, to enable parallel full backup and all sorts of out-of-core
> hacking, one could expand the command language to allow tools to
> access individual steps: START_BACKUP, SEND_FILE_LIST,
> SEND_FILE_CONTENTS, STOP_BACKUP, or whatever.  The second thing makes
> for an appealing project, but I do not think there is a technical
> reason why it has to be done first.  Or for that matter why it has to
> be done second.  As I keep saying, incremental backup and full backup
> are separate projects and I believe it's completely reasonable for
> whoever is doing the work to decide on the order in which they would
> like to do the work.
>
> Having said that, I'm curious what people other than Stephen (and
> other pgbackrest hackers) think about the relative value of parallel
> backup vs. incremental backup.  Stephen appears quite convinced that
> parallel backup is full of win and incremental backup is a bit of a
> yawn by comparison, and while I certainly would not want to discount
> the value of his experience in this area, it sometimes happens on this
> mailing list that [ drum roll please ] not everybody agrees about
> everything.  So, what do other people think?
>
Personally, I believe that incremental backups are more useful to implement
first, since they improve both backup speed and the space a backup takes.
Frankly speaking, I'm a bit surprised that the discussion of parallel backups
took up so much of this thread.
Of course, we must keep parallelism in mind while designing the API, to avoid
introducing any architectural obstacles, but any further discussion of it
belongs in a separate thread.


I understand Stephen's concerns about the difficulties of incremental backup
management.
Even assuming that the user is ready to manage backup chains, retention,
and other stuff, we must consider a format of backup metadata that will allow
us to perform some primitive commands:

1) Tell whether this backup is full or incremental.

2) Tell which backup is the parent of this incremental backup.
Probably, we can limit it to just returning "start_lsn", which can later be
compared to the "stop_lsn" of the parent backup.

3) Take an incremental backup based on this backup.
Here we must help a backup manager retrieve the LSN to pass to
pg_basebackup.

4) Restore an incremental backup into a directory (on top of an already
restored full backup).
One may use it to perform a "merge" or a "restore" of the incremental backup,
depending on the destination directory.
I wonder whether it is possible to integrate this into an existing tool, or
whether we will end up with something like pg_basebackup/pg_baserestore, as
in the case of pg_dump/pg_restore.
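
To make commands 1) to 3) a bit more concrete, here is a minimal sketch of
what a backup manager could do with such metadata. Everything in it is an
assumption made for illustration: the BackupMeta record, its field names, and
the rule that a parent is the backup whose "stop_lsn" equals the child's
"start_lsn" are hypothetical and not a proposed format. Only the textual LSN
form (two hex numbers separated by a slash) is existing PostgreSQL behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


def parse_lsn(text: str) -> int:
    """Convert the textual LSN form 'X/Y' (two hex numbers) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


@dataclass
class BackupMeta:
    # Hypothetical metadata record; the field names are assumptions.
    label: str
    start_lsn: Optional[str]  # None for a full backup in this sketch
    stop_lsn: str             # LSN at which this backup ended

    def is_incremental(self) -> bool:
        # Command 1: full or incremental?
        return self.start_lsn is not None

    def lsn_for_next_incremental(self) -> str:
        # Command 3: the LSN a backup manager would pass on when taking
        # an incremental backup based on this one.
        return self.stop_lsn


def find_parent(child: BackupMeta,
                candidates: Iterable[BackupMeta]) -> Optional[BackupMeta]:
    # Command 2: assumed rule, the parent's stop_lsn equals the child's start_lsn.
    if not child.is_incremental():
        return None
    wanted = parse_lsn(child.start_lsn)
    for backup in candidates:
        if parse_lsn(backup.stop_lsn) == wanted:
            return backup
    return None
```

With records like these, walking a chain back to its full backup is just
repeated calls to find_parent() until is_incremental() returns False.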

Have you designed these? I can only recall "pg_combinebackup" from the very
first message in this thread, which looks more like a sketch to explain the
idea than a thought-out feature design. I also found the page
https://wiki.postgresql.org/wiki/Incremental_backup, which raises the same
questions.
I'm volunteering to write a draft patch or, more likely, a set of patches,
which will allow us to discuss the subject in more detail.
To do that, I would like us to agree on the API and data format (at least
broadly).
Looking forward to hearing your thoughts.


As I see it, ideally the backup management tools should concentrate more on
managing multiple backups, while all the logic of taking a single backup (of
any kind) should be integrated into the core. This means that an out-of-core
client won't have to walk the PGDATA directory or care about all the
postgres-specific knowledge of data files consisting of blocks with headers
and LSNs and so on. It simply requests data and gets it.
Understandably, this won't be implemented in one take and, more likely, is
not fully achievable.
Still, it would be great to do our best to provide such tools (both existing
and future) with conveniently formatted data and an API to get it.
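
As an illustration of the kind of postgres-specific knowledge an out-of-core
client needs today, here is a rough sketch of picking changed blocks out of a
relation file and applying them on top of an older copy. It relies on the
page LSN (pd_lsn) being stored in the first 8 bytes of each page header as
two 32-bit integers in the server's native byte order. The 8192-byte block
size is only the default, and the function names are made up for this
example. The sketch ignores checksums, new all-zero pages, truncation, and
files that do not use the page format, so it is not a working backup tool.

```python
import struct

BLOCK_SIZE = 8192  # default block size; an assumption, it is set at build time


def page_lsn(page: bytes) -> int:
    """Read pd_lsn (xlogid, xrecoff) from the start of a page header."""
    xlogid, xrecoff = struct.unpack_from("=II", page, 0)
    return (xlogid << 32) | xrecoff


def changed_blocks(data: bytes, since_lsn: int) -> dict:
    """Map block number -> page for every full page with LSN >= since_lsn."""
    changed = {}
    for offset in range(0, len(data) - BLOCK_SIZE + 1, BLOCK_SIZE):
        page = data[offset:offset + BLOCK_SIZE]
        if page_lsn(page) >= since_lsn:
            changed[offset // BLOCK_SIZE] = page
    return changed


def apply_blocks(base: bytes, changed: dict) -> bytes:
    """Overwrite (or append) the changed pages on top of an older file image."""
    out = bytearray(base)
    for blkno, page in sorted(changed.items()):
        start = blkno * BLOCK_SIZE
        if len(out) < start + BLOCK_SIZE:
            # The file grew since the older copy was taken: pad with zeros.
            out.extend(bytes(start + BLOCK_SIZE - len(out)))
        out[start:start + BLOCK_SIZE] = page
    return bytes(out)
```

If this logic lived in the core, a client would only receive the changed
pages and would never need to know the header layout at all.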

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



