Re: [RFC] Incremental backup v3: incremental PoC

Поиск
Список
Период
Сортировка
От Gabriele Bartolini
Тема Re: [RFC] Incremental backup v3: incremental PoC
Дата
Msg-id CAHNtfO4LVNNn+=K409UA29Y=-h+5vw8Luv3g947Z99tCbHbhvQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [RFC] Incremental backup v3: incremental PoC  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
Ответы Re: [RFC] Incremental backup v3: incremental PoC  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
File based Incremental backup v7  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
Список pgsql-hackers
Hi Marco,

  thank you for sending an updated patch. I am writing down a report of this initial (and partial) review.

IMPORTANT: This patch is not complete, as stated by Marco. See the "Conclusions" section for my proposed TODO list.

== Patch application

I have been able to successfully apply your patch and compile it.
Regression tests passed.

== Initial run

I have created a fresh new instance of PostgreSQL and activated streaming replication to be used by pg_basebackup. I have done a pgbench run with scale 100.

I have taken a full consistent backup with pg_basebackup (in plain format):
pg_basebackup -v -F p -D $BACKUPDIR/backup-$(date '+%s') -x

I have been able to verify that the backup_profile is correctly placed in the destination PGDATA directory. Here is an excerpt:
POSTGRESQL BACKUP PROFILE 1
START WAL LOCATION: 0/3000058 (file 000000010000000000000003)
CHECKPOINT LOCATION: 0/300008C
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2015-01-14 10:07:07 CET
LABEL: pg_basebackup base backup
FILE LIST
\N      \N      t       1421226427      206     backup_label
\N      \N      t       1421225508      88      postgresql.auto.conf
...

As suggested by Marco, I have manually taken the LSN from this file (next version must do this automatically).
I have then executed pg_basebackup and activated the incremental feature by using the LSN from the previous backup, as follows:

LSN=$(awk '/^START WAL/{print $4}' backup_profile)
pg_basebackup -v -F p -D $BACKUPDIR/backup-$(date '+%s') -I $LSN -x

The time taken by this operation has been much lower than the previous one and the size is much lower (I have not done any operation in the meantime):
du -hs backup-1421226*
1,5G    backup-1421226427
17M     backup-1421226427

I have done some checks on the file system and then used the prototype of recovery script in Python written by Marco.

./recover.py backup-1421226427 backup-1421226427 new-data

The cluster started successfully. I have then run a pg_dump of the pgbench database and were able to reload it on the initial cluster.

== Conclusions

The first run of this patch seems promising.

While the discussion on the LSN map continues (which is mainly an optimisation of this patch), I would really like to see this patch progress as it would be a killer feature in several contexts (not in every context).

Just in this period we are releasing file based incremental backup for Barman and customers using the alpha version are experiencing on average a deduplication ratio between 50% to 70%. This is for example an excerpt of "barman show-backup" from one of our customers (a daily saving of 550GB is not bad):

 Base backup information:
    Disk usage           : 1.1 TiB (1.1 TiB with WALs)
    Incremental size     : 564.6 GiB (-50.60%)
    ...

My opinion, Marco, is that for version 5 of this patch, you:

1) update the information on the wiki (it is outdated - I know you have been busy with LSN map optimisation)
2) modify pg_basebackup in order to accept a directory (or tar file) and automatically detect the LSN from the backup profile
3) add the documentation regarding the backup profile and pg_basebackup

Once we have all of this, we can continue trying the patch. Some unexplored paths are:

* tablespace usage
* tar format
* performance impact (in both "read-only" and heavily updated contexts)
* consistency checks

I would then leave for version 6 the pg_restorebackup utility (unless you want to do everything at once).

One limitation of the current recovery script is that it cannot accept multiple incremental backups (it just accepts three parameters: base backup, incremental backup and merge destination). Maybe you can change the syntax as follows:

./recover.py DESTINATION BACKUP_1 BACKUP_2 [BACKUP_3, ...]

Thanks a lot for working on this.

I am looking forward to continuing the review.

Ciao,
Gabriele
--
 Gabriele Bartolini - 2ndQuadrant Italia - Managing Director
 PostgreSQL Training, Services and Support
 gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it

2015-01-13 17:21 GMT+01:00 Marco Nenciarini <marco.nenciarini@2ndquadrant.it>:
Il 13/01/15 12:53, Gabriele Bartolini ha scritto:
> Hi Marco,
>
>   could you please send an updated version the patch against the current
> HEAD in order to facilitate reviewers?
>

Here is the updated patch for incremental file based backup.

It is based on the current HEAD.

I'm now working to the client tool to rebuild a full backup starting
from a file based incremental backup.

Regards,
Marco

--
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Merging postgresql.conf and postgresql.auto.conf
Следующее
От: Sawada Masahiko
Дата:
Сообщение: Re: Merging postgresql.conf and postgresql.auto.conf