Re: Proposal: Incremental Backup

From: Claudio Freire
Subject: Re: Proposal: Incremental Backup
Date:
Msg-id: CAGTBQpZue5-n7fXz+zh8vPxt4tjPff_xCtVopGYEVU+QZ=MKYw@mail.gmail.com
In response to: Proposal: Incremental Backup  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
Responses: Re: Proposal: Incremental Backup  (Robert Haas <robertmhaas@gmail.com>)
Re: Proposal: Incremental Backup  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
List: pgsql-hackers
On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
> 1. Proposal
> =================================
> Our proposal is to introduce the concept of a backup profile. The backup
> profile consists of a file with one line per file detailing tablespace,
> path, modification time, size and checksum.
> Using that file the BASE_BACKUP command can decide which file needs to
> be sent again and which is not changed. The algorithm should be very
> similar to rsync, but since our files are never bigger than 1 GB per
> file that is probably granular enough not to worry about copying parts
> of files, just whole files.

That wouldn't be nearly as useful as the LSN-based approach mentioned before.

I've had my share of rsyncing live databases (when resizing
filesystems, not for backup, but the anecdotal evidence applies
anyhow) and with moderately write-heavy databases, even if you only
modify a tiny portion of the records, you end up modifying a huge
portion of the segments, because the free space choice is random.
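
A back-of-the-envelope sketch of why this happens, assuming the default
8 kB block size and 1 GB segments, and assuming (as a simplification of
real free-space behaviour) that each modified page lands in a uniformly
random segment. The function name and the example numbers are
illustrative, not measurements:

```python
PAGE_SIZE = 8 * 1024                            # assumed default block size
SEGMENT_SIZE = 1024 ** 3                        # assumed default 1 GB segment
PAGES_PER_SEGMENT = SEGMENT_SIZE // PAGE_SIZE   # 131072 pages per segment


def fraction_of_segments_touched(modified_pages: int, total_segments: int) -> float:
    """Expected fraction of segments holding at least one modified page.

    Simplified model: every modified page picks its segment uniformly at
    random, independently of the others.
    """
    if total_segments <= 0:
        raise ValueError("total_segments must be positive")
    # Probability that one particular segment is missed by every modified page.
    untouched = (1.0 - 1.0 / total_segments) ** modified_pages
    return 1.0 - untouched


if __name__ == "__main__":
    segments = 1024                              # roughly a 1 TB database
    total_pages = segments * PAGES_PER_SEGMENT
    modified = int(total_pages * 0.0001)         # 0.01% of all pages
    touched = fraction_of_segments_touched(modified, segments)
    print(f"{modified} modified pages touch {touched:.4%} of segments")
```

Under this model, changing 0.01% of the pages of a 1 TB database still
leaves well above 99% of the segments modified, so a whole-file
comparison would resend almost everything.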

There have been patches going around to change the random nature of
that choice, but none are very likely to make a huge difference for
this application. In essence, file-level comparisons get you only a
mild speed-up, and are not worth the effort.

I'd go for the hybrid file+LSN method, or nothing. The hybrid avoids
the I/O of inspecting the LSN of every page in unchanged segments (a
necessary optimization for huge multi-TB databases) and backs up only
the modified portions when segments do contain changes, so it's the
best of both worlds. Any partial implementation would either require
lots of I/O (LSN only) or save very little (file only) unless it's an
almost read-only database.
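
A minimal sketch of the hybrid idea, under stated assumptions: the
file-level check compares a hypothetical (mtime, size) pair per file,
the block size is the default 8 kB, and the page LSN is read from the
first 8 bytes of each page header in native byte order. The names
`plan_incremental`, `changed_blocks`, and `read_segment` are invented
for illustration and are not part of any proposed patch:

```python
import struct
from typing import Callable, Dict, List, Tuple

BLOCK_SIZE = 8192                 # assumed default block size
FileStat = Tuple[float, int]      # hypothetical profile entry: (mtime, size)


def page_lsn(buf: bytes, offset: int = 0) -> int:
    """Read the page LSN: two native-endian uint32 values (high, low)."""
    high, low = struct.unpack_from("=II", buf, offset)
    return (high << 32) | low


def changed_blocks(segment: bytes, since_lsn: int) -> List[int]:
    """Block numbers whose page LSN is newer than the previous backup."""
    last_start = len(segment) - BLOCK_SIZE
    return [
        off // BLOCK_SIZE
        for off in range(0, last_start + 1, BLOCK_SIZE)
        if page_lsn(segment, off) > since_lsn
    ]


def plan_incremental(
    profile: Dict[str, FileStat],
    current: Dict[str, FileStat],
    read_segment: Callable[[str], bytes],
    since_lsn: int,
) -> Dict[str, List[int]]:
    """Map each changed file to the blocks that need to be sent."""
    plan: Dict[str, List[int]] = {}
    for path, stat in current.items():
        # File-level step: an unchanged file is skipped without being read.
        if profile.get(path) == stat:
            continue
        # LSN step: only for files that look changed, scan the page headers.
        # Note: never-initialized (all-zero) pages have LSN 0 and are not
        # selected here; a real implementation would need to handle them.
        blocks = changed_blocks(read_segment(path), since_lsn)
        if blocks:
            plan[path] = blocks
    return plan
```

The file-level comparison only decides which segments get read at all;
the per-page LSN comparison then decides what is actually sent. How
reliable a cheap file-level check can be made is left open here.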


