Re: block-level incremental backup

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: block-level incremental backup
Дата
Msg-id 20190409163500.p4rgwvgwzqz2csfj@alap3.anarazel.de
обсуждение исходный текст
Ответ на block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: block-level incremental backup  (Gary M <garym@oedata.com>)
Re: block-level incremental backup  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
Hi,

On 2019-04-09 11:48:38 -0400, Robert Haas wrote:
> 2. When you use pg_basebackup in this way, each relation file that is
> not sent in its entirety is replaced by a file with a different name.
> For example, instead of base/16384/16417, you might get
> base/16384/partial.16417 or however we decide to name them.

Hm. But that means that files that are shipped nearly in their entirety,
need to be fully rewritten. Wonder if it's better to ship them as files
with holes, and have the metadata in a separate file. That'd then allow
to just fill in the holes with data from the older version.  I'd assume
that there's a lot of workloads where some significantly sized relations
will get updated in nearly their entirety between backups.


> Each such file will store near the beginning of the file a list of all the
> blocks contained in that file, and the blocks themselves will follow
> at offsets that can be predicted from the metadata at the beginning of
> the file.  The idea is that you shouldn't have to read the whole file
> to figure out which blocks it contains, and if you know specifically
> what blocks you want, you should be able to reasonably efficiently
> read just those blocks.  A backup taken in this manner should also
> probably create some kind of metadata file in the root directory that
> stops the server from starting and lists other salient details of the
> backup.  In particular, you need the threshold LSN for the backup
> (i.e. contains blocks newer than this) and the start LSN for the
> backup (i.e. the LSN that would have been returned from
> pg_start_backup).

I wonder if we shouldn't just integrate that into pg_control or such. So
that:

> 3. There should be a new tool that knows how to merge a full backup
> with any number of incremental backups and produce a complete data
> directory with no remaining partial files.

Could just be part of server startup?


> - I imagine that the server would offer this functionality through a
> new replication command or a syntax extension to an existing command,
> so it could also be used by tools other than pg_basebackup if they
> wished.

Would this logic somehow be usable from tools that don't want to copy
the data directory via pg_basebackup (e.g. for parallelism, to directly
send to some backup service / SAN / whatnot)?


> - It would also be nice if pg_basebackup could write backups to places
> other than the local disk, like an object store, a tape drive, etc.
> But that also sounds like a separate effort.

Indeed seems separate. But worthwhile.

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Arthur Zakirov
Дата:
Сообщение: Re: block-level incremental backup
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: jsonpath