Re: pg_combinebackup fails on file named INCREMENTAL.*

Поиск
Список
Период
Сортировка
От David Steele
Тема Re: pg_combinebackup fails on file named INCREMENTAL.*
Дата
Msg-id 44b65bfe-847e-4710-847f-10585cc74eba@pgmasters.net
обсуждение исходный текст
Ответ на Re: pg_combinebackup fails on file named INCREMENTAL.*  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: pg_combinebackup fails on file named INCREMENTAL.*  (Stefan Fercot <stefan.fercot@protonmail.com>)
Re: pg_combinebackup fails on file named INCREMENTAL.*  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 4/16/24 06:33, Robert Haas wrote:
> On Sun, Apr 14, 2024 at 11:53 PM David Steele <david@pgmasters.net> wrote:
>> Since incremental backup is using INCREMENTAL as a keyword (rather than
>> storing incremental info in the manifest) it is vulnerable to any file
>> in PGDATA with the pattern INCREMENTAL.*.
> 
> Yeah, that's true. I'm not greatly concerned about this, because I
> think anyone who creates a file called INCREMENTAL.CONFIG is just
> being perverse. 

Well, it's INCREMENTAL.* and you might be surprised (though I doubt it) 
at what users will do. We've been caught out by wacky file names (and 
database names) a few times.

> However, we could ignore those files just in subtrees
> where they're not expected to occur i.e. only pay attention to them
> under base, global, and pg_tblspc. I didn't do this because I thought
> someone might eventually want to do something like incremental backup
> of SLRU files, and then it would be annoying if there were a bunch of
> random pathname restrictions. But if we're really concerned about
> this, I think it would be a reasonable fix; you're not really supposed
> to decide to store your configuration files in directories that exist
> for the purpose of storing relation data files.

I think it would be reasonable to restrict what can be put in base/ and 
global/ but users generally feel free to create whatever they want in 
the root of PGDATA, despite being strongly encouraged not to.

Anyway, I think it should be fixed or documented as a caveat since it 
causes a hard failure on restore.

>> We could fix the issue by forbidding this file pattern in PGDATA, i.e.
>> error when it is detected during pg_basebackup, but in my view it would
>> be better (at least eventually) to add incremental info to the manifest.
>> That would also allow us to skip storing zero-length files and
>> incremental stubs (with no changes) as files.
> 
> I did think about this, and I do like the idea of being able to elide
> incremental stubs. If you have a tremendous number of relations and
> very few of them have changed at all, the incremental stubs might take
> up a significant percentage of the space consumed by the incremental
> backup, which kind of sucks, even if the incremental backup is still
> quite small compared to the original database. And, like you, I had
> the idea of trying to use the backup_manifest to do it.
> 
> But ... I didn't really end up feeling very comfortable with it. Right
> now, the backup manifest is something we only use to verify the
> integrity of the backup. If we were to do this, it would become a
> critical part of the backup. 

For my 2c the manifest is absolutely a critical part of the backup. I'm 
having a hard time even seeing why we have the --no-manifest option for 
pg_combinebackup at all. If the manifest is missing who knows what else 
might be missing as well? If present, why wouldn't we use it?

I know Tomas added some optimizations that work best with --no-manifest 
but if we can eventually read compressed tars (which I expect to be the 
general case) then those optimizations are not very useful.

> I don't think I like the idea that
> removing the backup_manifest should be allowed to, in effect, corrupt
> the backup. But I think I dislike even more the idea that the data
> that is used to verify the backup gets mushed together with the backup
> data itself. Maybe in practice it's fine, but it doesn't seem very
> conceptually clean.

I don't think this is a problem. The manifest can do more than store 
verification info, IMO.

> There are also some practical considerations that made me not want to
> go in this direction: we'd need to make sure that the relevant
> information from the backup manifest was available to the
> reconstruction process at the right part of the code. When iterating
> over a directory, it would need to consider all of the files actually
> present in that directory plus any "hallucinated" incremental stubs
> from the manifest. I didn't feel confident of my ability to implement
> that in the available time without messing anything up.

I think it would be better to iterate over the manifest than files in a 
directory. In any case we still need to fix [1], which presents more or 
less the same problem.

> I think we should consider other possible designs here. For example,
> instead of including this in the manifest, we could ship one
> INCREMENTAL.STUBS file per directory that contains a list of all of
> the incremental stubs that should be created in that directory. My
> guess is that something like this will turn out to be way simpler to
> code.

So we'd store a mini manifest per directory rather than just put the 
info in the main manifest? Sounds like unnecessary complexity to me.

Regards,
-David

[1] 
https://www.postgresql.org/message-id/flat/9badd24d-5bd9-4c35-ba85-4c38a2feb73e%40pgmasters.net



В списке pgsql-hackers по дате отправления:

Предыдущее
От: David Rowley
Дата:
Сообщение: Re: Stability of queryid in minor versions
Следующее
От: "Hayato Kuroda (Fujitsu)"
Дата:
Сообщение: RE: Slow catchup of 2PC (twophase) transactions on replica in LR