Re: [PERFORM] Backup taking long time !!!

From Stephen Frost
Subject Re: [PERFORM] Backup taking long time !!!
Msg-id 20170123174737.GK18360@tamriel.snowman.net
In reply to Re: [PERFORM] Backup taking long time !!!  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
* Jeff Janes (jeff.janes@gmail.com) wrote:
> On Mon, Jan 23, 2017 at 7:28 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> > On 1/22/17 11:32 AM, Stephen Frost wrote:
> >> The 1-second window concern is regarding the validity of a subsequent
> >> incremental backup.
> >
> > BTW, there's a simpler scenario here:
> >
> > Postgres touches file.
> > rsync notices file has different timestamp, starts copying.
> > Postgres touches file again.
> >
> > If those 3 steps happen in the same second, you now have an invalid
> > backup. There's probably other scenarios as well.

Ah, yeah, I think the outline I had was why we decided that even a file
with the same timestamp as the start of the backup couldn't be trusted.
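
A minimal sketch of the three-step race Jim describes, assuming the
backup tool decides what to skip by comparing file size and
whole-second modification time. The function names, file names and the
timestamp value are hypothetical, and shutil.copy2 merely stands in for
a timestamp-preserving copy; this is not rsync's actual code.

```python
import os
import shutil
import tempfile


def quick_check_unchanged(src, dst):
    """Return True if size and whole-second mtime match (file would be skipped)."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def simulate_same_second_race(workdir):
    src = os.path.join(workdir, "relfile_src")  # hypothetical data file
    dst = os.path.join(workdir, "relfile_dst")  # hypothetical backup copy
    stamp = 1485193657  # arbitrary whole-second timestamp (assumption)

    # 1. Postgres touches the file.
    with open(src, "wb") as f:
        f.write(b"page-v1")
    os.utime(src, (stamp, stamp))

    # 2. The backup tool copies it, preserving the timestamp.
    shutil.copy2(src, dst)

    # 3. Postgres touches the file again within the same second.
    with open(src, "wb") as f:
        f.write(b"page-v2")
    os.utime(src, (stamp, stamp))

    with open(src, "rb") as a, open(dst, "rb") as b:
        contents_differ = a.read() != b.read()
    return quick_check_unchanged(src, dst), contents_differ


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        skipped, differ = simulate_same_second_race(tmp)
        print("next incremental run would skip the file:", skipped)
        print("contents actually differ:", differ)
```

The stale copy is only a problem for the *next* incremental run, which
sees matching size and timestamp and leaves the old contents in place.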

> To be clear, you don't have an invalid backup *now*, as replay of the WAL
> will fix it up.  You will have an invalid backup next time you take a
> backup, using a copy of the backup you just took now as the rsync
> destination of that future backup.

Correct.

> If you were to actually fire up a copy of the backup and go through
> recovery, then shut it down, and then use that post-recovery copy as the
> destination of the rsync, would that eliminate the risk (barring clock skew
> between systems)?

I believe it would *change* things, but not eliminate the risk. Consider
this: what's the timestamp going to be on the files that were modified
through WAL recovery?  It would be *after* the backup was done.  I
believe (but am not sure) that rsync will still copy the file if there's
any difference in timestamp, but it's technically possible that you
could get really unlucky and have the post-recovery timestamp on the
restored copy match the timestamp the file ends up having on the primary
when the following backup is done, meaning that the file isn't copied
even though its contents are no longer the same (the primary server's
copy has whatever was written to that file in the same second that the
restored server was writing the WAL replay into the file).

Admittedly, that's pretty unlikely, but it's not impossible and that's
where you can get into *serious* trouble because it becomes darn near
impossible to figure out what the heck went wrong, and that's just not
cool with backups.

Do it properly, or use something that does.  This isn't where you want
to be playing fast-and-loose.

> > In short, if you're using rsync, it's *critical* that you give it the
> > --checksum option, which tells rsync to ignore file size and timestamp.
>
> Which unfortunately obliterates much of the point of using rsync for many
> people.  You can still save on bandwidth, but not on local IO on each end.

No, it means that rsync is *not* a good tool for doing incremental
backups of PG.  Would be great if we could get more people to understand
that.
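
To illustrate the trade-off Jeff mentions, here is a rough sketch of a
content-based comparison, assuming two files whose size and timestamp
already match. SHA-256 is used only as a stand-in digest and the
function names are hypothetical; the point is that every byte on both
sides has to be read, which is the local IO cost.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=1 << 20):
    """Hash the whole file; this reads every byte from disk."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def differs_by_checksum(src, dst):
    """Compare by content only, ignoring size and timestamp."""
    return file_digest(src) != file_digest(dst)


def demo(workdir):
    src = os.path.join(workdir, "src")
    dst = os.path.join(workdir, "dst")
    stamp = 1485193657  # arbitrary timestamp shared by both files (assumption)
    for path, data in ((src, b"page-v2"), (dst, b"page-v1")):
        with open(path, "wb") as f:
            f.write(data)
        os.utime(path, (stamp, stamp))
    s, d = os.stat(src), os.stat(dst)
    same_metadata = (s.st_size == d.st_size
                     and int(s.st_mtime) == int(d.st_mtime))
    return same_metadata, differs_by_checksum(src, dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        same_metadata, differs = demo(tmp)
        print("size and timestamp match:", same_metadata)
        print("checksum detects a difference:", differs)
```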

'cp' is an equally inappropriate and bad tool for doing WAL archiving,
btw.  Would be great if our docs were clear on that.
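
The message does not say why 'cp' is a poor fit. One commonly raised
concern, assumed for this sketch, is that a plain copy neither flushes
the file to stable storage nor guards against partially written or
overwritten segments. The function below is a hypothetical illustration
of those extra steps on a POSIX system, not a recommended replacement
for a proper archiving tool.

```python
import os
import tempfile


def archive_wal_segment(src_path, archive_dir):
    """Copy one segment durably, refusing to overwrite an existing one."""
    dest = os.path.join(archive_dir, os.path.basename(src_path))
    if os.path.exists(dest):
        raise FileExistsError(dest)

    # Write under a temporary name so a crash never leaves a partial
    # file under the final name.
    tmp = dest + ".tmp"
    with open(src_path, "rb") as src, open(tmp, "wb") as out:
        while True:
            chunk = src.read(1 << 20)
            if not chunk:
                break
            out.write(chunk)
        out.flush()
        os.fsync(out.fileno())  # force file contents to stable storage

    os.rename(tmp, dest)

    # Flush the directory entry as well (POSIX only).
    dir_fd = os.open(archive_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = os.path.join(tmp, "archive")
        os.mkdir(archive)
        segment = os.path.join(tmp, "000000010000000000000001")
        with open(segment, "wb") as f:
            f.write(b"\0" * 1024)  # placeholder bytes, not real WAL
        print("archived to", archive_wal_segment(segment, archive))
```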

Thanks!

Stephen
