Re: Proposal: Incremental Backup

From Robert Haas
Subject Re: Proposal: Incremental Backup
Date
Msg-id CA+TgmoaOCjx0T_DN+-ybscO-pbS00DJxLk3nndug1CSi=KD+jg@mail.gmail.com
In reply to Re: Proposal: Incremental Backup  (Marco Nenciarini <marco.nenciarini@2ndquadrant.it>)
Responses Re: Proposal: Incremental Backup  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Jul 29, 2014 at 12:35 PM, Marco Nenciarini
<marco.nenciarini@2ndquadrant.it> wrote:
>> I agree with much of that.  However, I'd question whether we can
>> really seriously expect to rely on file modification times for
>> critical data-integrity operations.  I wouldn't like it if somebody
>> ran ntpdate to fix the time while the base backup was running, and it
>> set the time backward, and the next differential backup consequently
>> omitted some blocks that had been modified during the base backup.
>
> Our proposal doesn't rely on file modification times for data integrity.

Good.

> We are using the file mtime only as a fast indication that the file has
> changed, and transfer it again without performing the checksum.
> If timestamp and size match we rely on *checksums* to decide if it has
> to be sent.

So an incremental backup reads every block in the database and
transfers only those that have changed?  (BTW, I'm just asking.
That's OK with me for a first version; we can improve it, shall
we say, incrementally.)
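
To make the question concrete, here is a minimal sketch of what "read every
block, transfer only the changed ones" could look like. It is not taken from
the proposal: the per-block granularity, the use of SHA-256, and the names
block_digests and changed_blocks are all assumptions made for illustration.
The 8192-byte block size is PostgreSQL's default page size.

```python
import hashlib

BLOCK_SIZE = 8192  # PostgreSQL's default page size; assumed for this sketch


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one digest per block of a relation file's contents."""
    return [
        hashlib.sha256(data[off:off + block_size]).digest()
        for off in range(0, len(data), block_size)
    ]


def changed_blocks(old_digests: list[bytes], data: bytes,
                   block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the block numbers whose digest differs from the previous backup.

    Blocks beyond the end of the old digest list (the file grew) are
    always reported as changed.
    """
    new_digests = block_digests(data, block_size)
    return [
        blkno
        for blkno, digest in enumerate(new_digests)
        if blkno >= len(old_digests) or old_digests[blkno] != digest
    ]
```

Note that every block still has to be read and hashed on each run; only the
amount of data transferred shrinks.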

Why checksums (which have an arbitrarily-small chance of indicating a
match that doesn't really exist) rather than LSNs (which have no
chance of making that mistake)?
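
As a sketch of the LSN alternative: each PostgreSQL page begins with pd_lsn,
stored as two 32-bit halves (xlogid, xrecoff), so a page can be classified by
comparing that value with an LSN recorded for the previous backup. The
function names below are hypothetical, little-endian byte order is an
assumption about the host, the input is assumed to consist of whole pages,
and the strict ">" comparison is an assumption about how the threshold LSN
is chosen.

```python
import struct

BLOCK_SIZE = 8192  # PostgreSQL's default page size; assumed for this sketch


def page_lsn(page: bytes) -> int:
    """Extract pd_lsn from the first 8 bytes of a page.

    Assumes a little-endian host; the on-disk value uses native byte order.
    """
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def parse_lsn(text: str) -> int:
    """Convert the textual 'hi/lo' hexadecimal LSN form to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def blocks_modified_since(data: bytes, threshold_lsn: int,
                          block_size: int = BLOCK_SIZE) -> list[int]:
    """Return block numbers whose page LSN is newer than threshold_lsn."""
    return [
        off // block_size
        for off in range(0, len(data), block_size)
        if page_lsn(data[off:off + block_size]) > threshold_lsn
    ]
```

Unlike a digest comparison, this test has no collision case: a page whose
LSN is at or below the threshold was not stamped by any later WAL record.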

> In "SMART MODE" we would use the file mtime to skip the checksum check
> in some cases, but it wouldn't be the default operation mode and it will
> have all the necessary warnings attached. However the "SMART MODE" isn't
> a core part of our proposal, and can be delayed until we agree on the
> safest way to bring it to the end user.

That's not a mode I'd feel comfortable calling "smart".  More like
"roulette mode".
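
The risk can be shown with plain arithmetic. The sketch below assumes a
hypothetical skip rule (a file is skipped when its mtime is older than the
start of the previous backup); the actual "SMART MODE" rule is not spelled
out in this thread. All timestamps are made up.

```python
def skipped_by_mtime(file_mtime: float, prev_backup_start: float) -> bool:
    """Hypothetical rule: treat a file as unchanged if its mtime predates
    the start of the previous backup."""
    return file_mtime < prev_backup_start


prev_backup_start = 1_000_000.0   # wall-clock time when the base backup began
clock_step = -60.0                # clock set back 60 s while the backup runs
true_write_time = prev_backup_start + 10.0       # write happens mid-backup
recorded_mtime = true_write_time + clock_step    # but is stamped 50 s "early"

# The file was modified during the base backup, yet the rule skips it.
missed = skipped_by_mtime(recorded_mtime, prev_backup_start)
```

Here missed is True: the modification is invisible to any test that trusts
the recorded timestamp.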

IMV, the way to eventually make this efficient is to have a background
process that reads the WAL and figures out which data blocks have been
modified, and tracks that someplace.  Then we can send a precisely
accurate backup without relying on either modification times or
reading the full database.  If Heikki's patch to standardize the way
this kind of information is represented in WAL gets committed, this
should get a lot easier to implement.
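
A rough sketch of the bookkeeping such a process might do is below. It
assumes the WAL has already been decoded into (LSN, relation file, block
number) references; BlockRef, BlockTracker, and the file names used are
hypothetical and do not correspond to any existing PostgreSQL interface.

```python
from typing import Iterable, NamedTuple


class BlockRef(NamedTuple):
    """One block touched by a WAL record (hypothetical decoded form)."""
    lsn: int
    relfile: str
    block: int


class BlockTracker:
    """Remember, per relation file, the latest LSN that touched each block."""

    def __init__(self) -> None:
        self._latest: dict[str, dict[int, int]] = {}
        self.last_lsn = 0  # highest LSN consumed so far

    def consume(self, refs: Iterable[BlockRef]) -> None:
        """Fold a batch of decoded block references into the map."""
        for ref in refs:
            blocks = self._latest.setdefault(ref.relfile, {})
            if ref.lsn > blocks.get(ref.block, 0):
                blocks[ref.block] = ref.lsn
            self.last_lsn = max(self.last_lsn, ref.lsn)

    def blocks_since(self, relfile: str, since_lsn: int) -> list[int]:
        """Blocks of relfile modified by WAL records newer than since_lsn."""
        blocks = self._latest.get(relfile, {})
        return sorted(b for b, lsn in blocks.items() if lsn > since_lsn)
```

An incremental backup could then ask blocks_since() for each file, passing
the LSN of the previous backup, and read only the blocks returned.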

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


