Re: Updated backup APIs for non-exclusive backups

From: Stephen Frost
Subject: Re: Updated backup APIs for non-exclusive backups
Msg-id: CAOuzzgpYa3THUAB8SmhoMB6=ca--WG7PNt161ON=COvxr6go8w@mail.gmail.com
In reply to: Re: Updated backup APIs for non-exclusive backups  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses: Re: Updated backup APIs for non-exclusive backups  (Laurenz Albe <laurenz.albe@cybertec.at>)
List: pgsql-hackers

Greetings,

On Sun, Nov 25, 2018 at 14:17 Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sun, 2018-11-25 at 13:50 -0500, Stephen Frost wrote:
> I don't see any compelling argument for trying to do something half-way
> any more today than I did two years ago when this was being discussed.

That may well be so.  It may be better to make users unhappy than to
make them very unhappy...

But I find the following points unconvincing:

> > I would say the typical use case for the exclusive backup method is
> > the following (and I have seen it often):
> >
> > You have some kind of backup software that does file system backups
> > and is able to run a "pre-backup" and "post-backup" script.
> > The backup is triggered by the backup software.
>
> Seeing it often doesn't make it a good solution.  Running just
> pre-backup and post-backup scripts and copying the filesystem isn't
> enough to perform an online PostgreSQL backup- the WAL needs to be
> collected as well, and you need to make sure that you have all of the
> WAL before the backup can be considered complete.

Yes, that's why "pg_stop_backup" has the "wait_for_archive" parameter.
So this is not a problem.

That doesn’t actually make sure you have all of the WAL reliably saved across the backup; it only cares about what archive_command returns, which is sadly often a bad thing to depend on.  I certainly wouldn’t rely on only that for any system I cared about.
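
One way to check independently of archive_command's return code is to confirm that every WAL segment between the backup's start and stop LSN is actually present in the archive. The following is a minimal sketch, not a complete verification tool. It assumes the default 16MB WAL segment size, a single timeline, and an archive directory that stores segments under their plain file names (no compression suffix); the function names are hypothetical.

```python
import os

# Assumption: default segment size; a cluster can be initialized with another size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn):
    """Convert an LSN such as '0/3000028' into a byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_name(timeline, segno, seg_size=WAL_SEGMENT_SIZE):
    """Build the 24-character WAL file name for a segment number."""
    segs_per_id = 0x100000000 // seg_size
    return "%08X%08X%08X" % (timeline, segno // segs_per_id, segno % segs_per_id)


def required_segments(start_lsn, stop_lsn, timeline, seg_size=WAL_SEGMENT_SIZE):
    """List the segments covering the range from start_lsn to stop_lsn."""
    first = parse_lsn(start_lsn) // seg_size
    # The stop LSN points just past the last needed byte, so a stop LSN that
    # sits exactly on a segment boundary belongs to the previous segment.
    last = (parse_lsn(stop_lsn) - 1) // seg_size
    return [segment_name(timeline, n, seg_size) for n in range(first, last + 1)]


def missing_segments(archive_dir, start_lsn, stop_lsn, timeline):
    """Return the required segments that are not present in archive_dir."""
    present = set(os.listdir(archive_dir))
    return [s for s in required_segments(start_lsn, stop_lsn, timeline)
            if s not in present]
```

A backup would only be treated as complete once missing_segments() returns an empty list. A real tool would also want to verify the contents of each segment, not just its presence.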

> On restore, you're
> going to need to create a recovery.conf (at least in released versions)
> which provides a restore command (needed even in HEAD today) to get the
> old WAL, so having to also create the backup_label file shouldn't be
> that difficult.

You write "recovery.conf" upon recovery, when you have the restored
backup, so you have it on a file system.  No problem adding a file then.

This is entirely different from adding a "backup_label" file to
a backup that has been taken by a backup software in some arbitrary
format in some arbitrary location (think snapshot).

There isn’t any need to write the backup label before you restore the database; you can write it at restore time, just as you write recovery.conf then.

> Lastly, if you really want, you can extract out the data from
> pg_stop_backup in whatever your post-backup script is.

Come on, now.
You usually use backup techniques like that because you can't get
your large database backed up in the available time window otherwise.

I’m not following what you’re trying to get at here. Why can’t you extract the data for the backup label from pg_stop_backup?  Certainly other tools do, even ones that do extremely fast parallel backups; the two are completely independent.

Did you think I meant pg_basebackup?  I certainly didn’t.
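
To illustrate extracting the label data in a post-backup step, here is a minimal sketch. It assumes a Python DB-API cursor (for example from psycopg2) on the same session that called pg_start_backup() in non-exclusive mode, since that session has to stay connected until the backup is stopped. The function name finish_backup, the JSON sidecar file, and its key names are hypothetical choices for illustration; only the pg_stop_backup(false) call and its lsn, labelfile and spcmapfile columns come from PostgreSQL itself.

```python
import json
import os


def finish_backup(cursor, sidecar_path):
    """Stop a non-exclusive backup and save what pg_stop_backup returns.

    cursor must belong to the session that started the backup.
    """
    cursor.execute(
        "SELECT lsn, labelfile, spcmapfile FROM pg_stop_backup(false)"
    )
    lsn, labelfile, spcmapfile = cursor.fetchone()
    record = {
        "stop_lsn": str(lsn),
        "backup_label": labelfile,
        "tablespace_map": spcmapfile or "",
    }
    # Write to a temporary file and rename it, so that a crash never leaves
    # a half-written sidecar file next to the backup.
    tmp_path = sidecar_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, sidecar_path)
    return record
```

The sidecar file can live wherever the backup software keeps its own metadata, or alongside the archived WAL; it does not have to be placed inside the file system copy.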

> > Another thing that is problematic with non-exclusive backups is that
> > you have to write the backup_label file into the backup after the
> > backup has been taken.  With a technique like the above, you cannot
> > easily do that.
>
> ... why not?  You need to create the recovery.conf anyway, and you need
> to be archiving WAL somewhere, so it certainly seems like you could put
> the backup_label there too.

As I said above, you don't add "recovery.conf" to the backup right away,
so these two cases don't compare.

There’s no requirement that you add the backup label contents immediately either; you just need to keep track of them and restore them when you restore the database and create the recovery.conf file.
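
As a rough sketch of that restore-time step: assume the label text returned by pg_stop_backup was saved at backup time in a JSON file with the keys "backup_label" and "tablespace_map" (a hypothetical format chosen for this example). The helper below, also hypothetical, writes those contents unmodified into the restored data directory before the server is started.

```python
import json
import os


def write_backup_label(sidecar_path, data_dir):
    """Recreate backup_label (and tablespace_map, if any) in a restored data directory."""
    with open(sidecar_path, encoding="utf-8") as f:
        record = json.load(f)
    files = [
        ("backup_label", record["backup_label"]),
        ("tablespace_map", record.get("tablespace_map", "")),
    ]
    written = []
    for name, contents in files:
        if not contents:
            # tablespace_map is only needed when pg_stop_backup returned one.
            continue
        path = os.path.join(data_dir, name)
        # newline="" keeps the text exactly as returned, with no translation.
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(contents)
        written.append(name)
    return written
```

This would run at the same point where recovery.conf is created, after the file system copy has been restored and before PostgreSQL is started.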

> > I expect that that will make a lot of users unhappy.
>
> If it means that they implement a better backup strategy, then it's
> probably a good thing, which is the goal of this.

I thought our goal is to provide convenient backup methods...

Correctness comes first, and having a broken system because of a crash during a backup isn’t correct.

Ignoring "backup_label" on restart, as I suggested in my previous message,
probably isn't such a hot idea.

Agreed. 

But what's wrong with retaining the exclusive backup method and just
sticking a big "Warning: this may cause a restart to fail after a crash"
on it?  That sure wouldn't be unsafe.

I haven’t seen anyone pushing for it to be removed immediately, but users should not use it, and newcomers would be much better served by using the non-exclusive API.  There is a reason it was deprecated: it simply isn’t a good API. Coming along a couple of years later and saying that it’s a good API while ignoring the issues that it has doesn’t change that.

Thanks!

Stephen
