Re: Implementing incremental backup

From: Claudio Freire
Subject: Re: Implementing incremental backup
Date:
Msg-id: CAGTBQpbk7c2CbJd9PJ1z_ObtDTL9=6_7=1pf3i904iumL31nVA@mail.gmail.com
In reply to: Re: Implementing incremental backup  (Jim Nasby <jim@nasby.net>)
List: pgsql-hackers
On Wed, Jun 19, 2013 at 3:54 PM, Jim Nasby <jim@nasby.net> wrote:
> On 6/19/13 11:02 AM, Claudio Freire wrote:
>>
>> On Wed, Jun 19, 2013 at 7:13 AM, Tatsuo Ishii <ishii@postgresql.org>
>> wrote:
>>>
>>>
>>> For now, my idea is pretty vague.
>>>
>>> - Record info about modified blocks. We don't need to remember the
>>>    whole history of a block if the block was modified multiple times.
>>>    We just remember that the block was modified since the last
>>>    incremental backup was taken.
>>>
>>> - The info could be obtained by trapping calls to mdwrite() etc. We need
>>>    to be careful to avoid such blocks used in xlogs and temporary
>>>    tables to not waste resource.
>>>
>>> - If many blocks were modified in a file, we may be able to condense
>>>    the info as "the whole file was modified" to reduce the amount of
>>>    info.
>>>
>>> - How to take a consistent incremental backup is an issue. I can't
>>>    think of a clean way other than "locking whole cluster", which is
>>>    obviously unacceptable. Maybe we should give up "hot backup"?
>>
>>
>>
>> I don't see how this is better than snapshotting at the filesystem
>> level. I have no experience with TB scale databases (I've been limited
>> to only hundreds of GB), but from my limited mid-size db experience,
>> filesystem snapshotting is pretty much the same thing you propose
>> there (xfs_freeze), and it works pretty well. There's even automated
>> tools to do that, like bacula, and they can handle incremental
>> snapshots.
>
>
> A snapshot is not the same as an incremental backup; it presents itself as a
> full copy of the filesystem. Actually, since it's on the same underlying
> storage a snapshot isn't really a good backup at all.

Read up on bacula[0] (it is huge, so this info may be hard to
find): you can take that snapshot, which will be on the same filesystem
of course, and *then* back it up. So you get a consistent snapshot on
your backup, which means a correct backup, and the backup certainly
doesn't have to be on the same filesystem. It even works for ext3 if
you install the right kernel modules.

Yes, it's a snapshot of the entire filesystem. So it's not the same as
a database-only backup. But it does have a huge overlap, don't you
think?

When WAL archiving can get you PITR, and bacula-like tools can get you
incremental and consistent full-FS-snapshot backups, what does the
proposed feature add? I don't think you can get PITR with the proposed
feature, as it takes a snapshot only when told to, and it can't take
multiple snapshots. The only way to get PITR AFAIK is with WAL
archiving, so whether it's viable or not for TB-sized databases is
moot, if it's the only option.

And it will add an overhead. A considerable overhead. Even if you only
have to flip a bit on some page map, it amplifies writes twofold
(unless writes can be coalesced, of which there is no guarantee).
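To put rough numbers on that, here is a toy model, not a measurement.
It assumes one "modified" bit per data block, a fixed number of bits
per map page, and that bit flips hitting the same map page within one
flush batch coalesce into a single map-page write. The function name
and all three parameters are made up for illustration.

```python
def total_page_writes(dirtied_blocks, blocks_per_map_page, batch_size):
    """Count data-page plus map-page writes for a sequence of dirtied blocks.

    Toy model (assumption): the map is flushed once per batch of
    `batch_size` data-page writes, and each distinct map page touched
    in that batch costs exactly one extra write.
    """
    data_writes = len(dirtied_blocks)
    map_writes = 0
    for start in range(0, len(dirtied_blocks), batch_size):
        batch = dirtied_blocks[start:start + batch_size]
        # Bit flips landing on the same map page within a batch coalesce.
        map_writes += len({block // blocks_per_map_page for block in batch})
    return data_writes + map_writes
```

With no coalescing (batch_size=1) every data-page write drags a
map-page write along, which is the twofold case. Batching only helps
when the dirtied blocks happen to share map pages; scattered writes
stay at twofold however large the batch is.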

In the end, it may be preferable to just alter PG's behavior slightly
to make bacula, rsync or whichever tool's job easier. Like trying hard
not to write to cold segments, so entire segments can be skipped by
quick mtime checks.
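As a minimal sketch of the kind of mtime check I mean (this is not how
bacula or rsync do it internally; the function name and arguments are
made up for illustration), a backup tool could walk the data directory
and keep only the files touched since the previous backup started:

```python
import os


def segments_modified_since(data_dir, last_backup_time):
    """Return files under data_dir whose mtime is at or after last_backup_time.

    last_backup_time is a Unix timestamp in seconds (assumption: the
    start time of the previous backup run, recorded by the caller).
    """
    changed = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            # ">=" rather than ">": when in doubt, copy the file again.
            if os.stat(path).st_mtime >= last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Every file not in the returned list can be skipped without reading it.
That only pays off if cold segments really stay untouched, which is
why PG trying hard not to write to them would matter.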

[0] http://www.bacula.org/en/


