Re: Extending BASE_BACKUP in replication protocol: incremental backup and backup format

From Magnus Hagander
Subject Re: Extending BASE_BACKUP in replication protocol: incremental backup and backup format
Msg-id CABUevExZ-2NH6jxB5sjs_dsS7qbmoF0NOYpEEyayBKbUfKPbqw@mail.gmail.com
In response to Extending BASE_BACKUP in replication protocol: incremental backup and backup format  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Extending BASE_BACKUP in replication protocol: incremental backup and backup format  (Andres Freund <andres@2ndquadrant.com>)
Re: Extending BASE_BACKUP in replication protocol: incremental backup and backup format  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Tue, Jan 14, 2014 at 1:47 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
Hi all,

As of today, the replication protocol has a command called BASE_BACKUP
that allows a client connected over the replication protocol to
retrieve a full backup from the server through a connection stream.
Its current options are described here:
http://www.postgresql.org/docs/9.3/static/protocol-replication.html

This command puts the server into backup mode by calling
do_pg_start_backup, then sends the backup, and finalizes it with
do_pg_stop_backup. Thanks to that, it is also possible to take
backups from standby nodes, as the stream contains the backup_label
file necessary for recovery. The full backup is sent in tar format for
obvious performance reasons, to limit the amount of data sent through
the stream, and the server contains the code needed to send the data
in the correct format. This also forces the client to perform some
decoding if the received base backup needs to be analyzed on the fly,
by doing something similar to what pg_basebackup now does when the
backup format is plain.

I would like to propose the following things to extend BASE_BACKUP to
retrieve a backup from a stream:
- Addition of an option FORMAT to control the output format of the
backup, with possible values 'plain' and 'tar'. The default is tar for
backward compatibility. The purpose of this option is to make it
easier for backup tools working with postgres to retrieve a backup
and analyze it on the fly, filtering and analyzing the data while it
is being received without all the necessary tar decoding, which would
otherwise consist of copying portions of the pg_basebackup code, more
or less.

How would this be different/better than the tar format? pg_basebackup already does this analysis, for example, when it comes to recovery.conf. 
The tar format is really easy to analyze as a stream, that's one of the reasons we picked it...
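
A minimal sketch of analyzing a tar stream on the fly, in Python. It uses the standard library's tarfile module in streaming mode ("r|"), which reads the archive strictly forward without seeking, and picks out backup_label as the members go by. The names scan_backup_stream and make_demo_tar are hypothetical, and the demo archive is built locally; a real client would pass in the bytes received from BASE_BACKUP, which this sketch does not attempt.

```python
import io
import tarfile


def scan_backup_stream(stream, wanted=("backup_label",)):
    """Read a tar stream sequentially and return the contents of selected members."""
    found = {}
    # Mode "r|" treats the input as a non-seekable stream of tar blocks.
    with tarfile.open(fileobj=stream, mode="r|") as tar:
        for member in tar:
            if member.isfile() and member.name in wanted:
                # In stream mode a member must be read before advancing.
                found[member.name] = tar.extractfile(member).read()
    return found


def make_demo_tar():
    """Build a tiny in-memory archive standing in for a backup stream (demo only)."""
    files = (
        ("backup_label", b"START WAL LOCATION: 0/2000028\n"),
        ("base/1/1234", b"\x00" * 8192),
    )
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(scan_backup_stream(make_demo_tar()))
```

Because the members are visited in order and never revisited, the same loop could filter or rewrite files while the backup is still arriving.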

 
- Addition of an option called INCREMENTAL to send an incremental
backup to the client. This option takes an LSN as input and sends back
to the client the relation pages (in the shape of reduced relation
files) that are newer than the specified LSN, determined by looking at
pd_lsn of PageHeaderData. In this case the LSN needs to be determined
by the client based on the latest full backup taken. This option is
particularly interesting for reducing the amount of data taken between
two backups, even if it increases the restore time, as the client
needs to reconstitute a base backup depending on the recovery target
and the pages modified. The client would be in charge of rebuilding
pages from the incremental backup by scanning all the blocks that need
to be updated relative to the full backup, as the LSN from which the
incremental backup is taken is known. But this is not really something
the server cares about... Such things are actually done by pg_rman as
well.

This sounds a lot like DIFFERENTIAL in other databases? Or I guess it's the same underlying technology, depending only on whether you go back to the full base backup or to the last incremental one.

But if you look at the terms otherwise, I think incremental often refers to what we call WAL.

Either way - if we can do this in a safe way, it sounds like a good idea. It would be sort of like rsync, except relying on the fact that we can look at the LSN and don't have to compare the actual files, right?
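
A minimal sketch of the LSN comparison discussed above, in Python. It assumes the default 8 kB block size, a little-endian platform, and that pd_lsn occupies the first 8 bytes of the page header as two 32-bit halves (high, then low). The helper names parse_lsn, page_lsn, changed_blocks and make_page are hypothetical, and the sketch ignores cases such as new all-zero pages or torn reads that a safe implementation would have to handle.

```python
import struct

BLCKSZ = 8192  # assumption: server built with the default block size


def parse_lsn(text):
    """Convert an LSN in 'hi/lo' hexadecimal form into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def page_lsn(page):
    """Read pd_lsn from the start of a page (assumes little-endian layout)."""
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def changed_blocks(data, since_lsn):
    """Yield (block_number, page_bytes) for pages with pd_lsn newer than since_lsn."""
    for blkno in range(len(data) // BLCKSZ):
        page = data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if page_lsn(page) > since_lsn:
            yield blkno, page


def make_page(lsn):
    """Build a fake page carrying only a pd_lsn value (demo only)."""
    return struct.pack("<II", lsn >> 32, lsn & 0xFFFFFFFF) + bytes(BLCKSZ - 8)


if __name__ == "__main__":
    rel = b"".join(make_page(parse_lsn(s)) for s in ("0/1000000", "0/3000000", "1/0"))
    print([blkno for blkno, _ in changed_blocks(rel, parse_lsn("0/2000000"))])
```

Unlike rsync, nothing here compares file contents: the decision for each block depends only on the 8-byte LSN in its header.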


As a next step, I would imagine that pg_basebackup could be extended
to take incremental backups as well. Having another tool able to
rebuild backups based on a full backup with incremental information
would be nice as well.

I would say those are requirements, not just a next step and "nice as well" :)
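
A minimal sketch of what such a rebuild tool might do for a single relation file, in Python. The incremental data is modelled as (block number, page) pairs, which is an assumption for illustration and not a format defined anywhere in this thread; apply_incremental is a hypothetical name. Truncated or dropped relations are not handled.

```python
BLCKSZ = 8192  # assumption: default block size


def apply_incremental(base, changed):
    """Overlay changed pages onto a relation file taken from the full backup.

    base    -- bytes of the relation file from the full backup
    changed -- iterable of (block_number, page_bytes) from the incremental backup
    """
    out = bytearray(base)
    for blkno, page in changed:
        if len(page) != BLCKSZ:
            raise ValueError("page %d has unexpected size %d" % (blkno, len(page)))
        end = (blkno + 1) * BLCKSZ
        if end > len(out):
            # The relation grew after the full backup: pad with zero pages.
            out.extend(bytes(end - len(out)))
        out[blkno * BLCKSZ:end] = page
    return bytes(out)


if __name__ == "__main__":
    base = b"A" * BLCKSZ + b"B" * BLCKSZ
    rebuilt = apply_incremental(base, [(1, b"C" * BLCKSZ)])
    print(len(rebuilt) // BLCKSZ, rebuilt[BLCKSZ:BLCKSZ + 1])
```

Applying several incremental backups in LSN order on top of the same full backup would follow the same pattern, one overlay per backup.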


--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
