Re: Add notes to pg_combinebackup docs

From: Magnus Hagander
Subject: Re: Add notes to pg_combinebackup docs
Date:
Msg-id: CABUevEzJC8VTFuJg5=4Cdcf4gH_OAZH5ozOoa=AKgrc7qwP=-Q@mail.gmail.com
In reply to: Re: Add notes to pg_combinebackup docs  (David Steele <david@pgmasters.net>)
Responses: Re: Add notes to pg_combinebackup docs  (David Steele <david@pgmasters.net>)
List: pgsql-hackers
On Fri, Apr 12, 2024 at 12:14 AM David Steele <david@pgmasters.net> wrote:


On 4/11/24 20:51, Tomas Vondra wrote:
> On 4/11/24 02:01, David Steele wrote:
>>
>> I have a hard time seeing this feature as being very useful, especially
>> for large databases, until pg_combinebackup works on tar (and compressed
>> tar). Right now restoring an incremental requires at least twice the
>> space of the original cluster, which is going to take a lot of users by
>> surprise.
>
> I do agree it'd be nice if pg_combinebackup worked with .tar directly,
> without having to extract the directories first. No argument there, but
> as I said in the other thread, I believe that's something we can add
> later. That's simply how incremental development works.

OK, sure, but if the plan is to make it practical later, doesn't that
make the feature something to be avoided now?

That could be said for any feature. When we shipped streaming replication, the plan was to support synchronous in the future. Should we not have shipped it, or told people to avoid it? 

Sure, the current state limits its uses in some cases. But it still leaves a bunch of other cases where it works just fine.
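
To make the space point concrete, the directory-based flow under discussion is roughly this (just a sketch, with made-up paths, and assuming summarize_wal is enabled on the server):

    pg_basebackup -D /backup/full -Fp
    pg_basebackup -D /backup/incr -Fp --incremental=/backup/full/backup_manifest
    pg_combinebackup /backup/full /backup/incr -o /restore/data

Without CoW help, /restore/data ends up as a complete second copy of the cluster, which is where the "at least twice the space" concern comes from.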


 
>> I know you have made some improvements here for COW filesystems, but my
>> experience is that Postgres is generally not run on such filesystems,
>> though that is changing a bit.
>
> I'd say XFS is a pretty common choice, for example. And it's one of the
> filesystems that work great with pg_combinebackup.

XFS has certainly advanced more than I was aware.

And it happens to be the default on at least one of our most common platforms.


> However, who says this has to be the filesystem the Postgres instance
> runs on? Who in their right mind puts backups on the same volume as the
> instance anyway? At which point it can be a different filesystem, even
> if it's not ideal for running the database.

My experience is these days backups are generally placed in object
stores. Sure, people are still using NFS but admins rarely have much
control over those volumes. They may or may not be COW filesystems.

If it's mounted through NFS I assume pg_combinebackup won't actually be able to use the COW features? Or does that actually work through NFS?

Mounted LUNs on a SAN I find more common today though, and there it would do a fine job.
 

> FWIW I think it's fine to tell users that to minimize the disk space
> requirements, they should use a CoW filesystem and --copy-file-range.
> The docs don't say that currently, that's true.

That would probably be a good addition to the docs.

+1, that would be a good improvement.
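
Something along these lines, perhaps (a sketch with made-up paths, assuming the backups and the output directory live on a reflink-capable filesystem such as XFS or btrfs):

    pg_combinebackup --copy-file-range /backup/full /backup/incr -o /restore/data

On such filesystems the unchanged data doesn't have to be physically duplicated, which is what keeps the extra disk space requirement down.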


> All of this also depends on how people do the restore. With the CoW
> stuff they can do a quick (and small) copy on the backup server, and
> then copy the result to the actual instance. Or they can do restore on
> the target directly (e.g. by mounting a r/o volume with backups), in
> which case the CoW won't really help.

And again, this all requires a significant amount of setup and tooling.
Obviously I believe good backup requires effort but doing this right
gets very complicated due to the limitations of the tool.

It clearly needs to be documented that there are space requirements. But temporarily getting space for something like that is not very complicated in most environments; you do have to be aware of it, though.

Generally speaking it's already the case that the "restore experience" with pg_basebackup is far from great. We don't have a "pg_baserestore". You still have to deal with archive_command and restore_command, which we all know can be easy to get wrong. I don't see how this is fundamentally worse than that.
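
For comparison, this is the kind of thing people already have to get right for PITR today (the simplified examples from the docs):

    # postgresql.conf on the primary
    archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'

    # postgresql.conf on the instance being restored
    restore_command = 'cp /mnt/server/archivedir/%f %p'

Simple to write down, and just as simple to get subtly wrong in practice.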

Personally, I tend to recommend that "if you want PITR and thus need to mess with archive_command etc, you should use a backup tool like pg_backrest. If you're fine with just daily backups or whatnot, use pg_basebackup". The incremental backup story fits somewhere in between, but I'd still say this is (today) primarily a tool directed at those that don't need full PITR. 


> But yeah, having to keep the backups as expanded directories is not
> great, I'd love to have .tar. Not necessarily because of the disk space
> (in my experience the compression in filesystems works quite well for
> this purpose), but mostly because it's more compact and allows working
> with backups as a single piece of data (e.g. it's much clearer what the
> checksum of a single .tar is, compared to a directory).

But again, object stores are commonly used for backup these days and
billing is based on data stored rather than any compression that can be
done on the data. Of course, you'd want to store the compressed tars in
the object store, but that does mean storing an expanded copy somewhere
to do pg_combinebackup.

Object stores are definitely getting more common. I wish they were getting a lot more common than they actually are, because they simplify a lot.  But they're in my experience still very far from being a majority.
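
For the object store case David describes, the flow would be roughly this (a sketch; the bucket name and client are made up, and any S3-compatible tooling would do the same job):

    mkdir -p /restore/full /restore/incr
    aws s3 cp s3://my-backups/full.tar.gz - | tar -xzf - -C /restore/full
    aws s3 cp s3://my-backups/incr.tar.gz - | tar -xzf - -C /restore/incr
    pg_combinebackup /restore/full /restore/incr -o /restore/data

That is, everything has to be expanded locally before pg_combinebackup can run, which is exactly the extra copy being discussed.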


But if the argument is that all this can/will be fixed in the future, I
guess the smart thing for users to do is wait a few releases for
incremental backups to become a practical feature.

There's always going to be another set of goalposts further ahead. I think it can still be practical for quite a few people.

I'm more worried about the issue you raised in the other thread about missing files, for example...
 
--
