Re: pg_combinebackup fails on file named INCREMENTAL.*
От | David Steele |
---|---|
Тема | Re: pg_combinebackup fails on file named INCREMENTAL.* |
Дата | |
Msg-id | 44b65bfe-847e-4710-847f-10585cc74eba@pgmasters.net обсуждение исходный текст |
Ответ на | Re: pg_combinebackup fails on file named INCREMENTAL.* (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: pg_combinebackup fails on file named INCREMENTAL.*
(Stefan Fercot <stefan.fercot@protonmail.com>)
Re: pg_combinebackup fails on file named INCREMENTAL.* (Robert Haas <robertmhaas@gmail.com>) |
Список | pgsql-hackers |
On 4/16/24 06:33, Robert Haas wrote: > On Sun, Apr 14, 2024 at 11:53 PM David Steele <david@pgmasters.net> wrote: >> Since incremental backup is using INCREMENTAL as a keyword (rather than >> storing incremental info in the manifest) it is vulnerable to any file >> in PGDATA with the pattern INCREMENTAL.*. > > Yeah, that's true. I'm not greatly concerned about this, because I > think anyone who creates a file called INCREMENTAL.CONFIG is just > being perverse. Well, it's INCREMENTAL.* and you might be surprised (though I doubt it) at what users will do. We've been caught out by wacky file names (and database names) a few times. > However, we could ignore those files just in subtrees > where they're not expected to occur i.e. only pay attention to them > under base, global, and pg_tblspc. I didn't do this because I thought > someone might eventually want to do something like incremental backup > of SLRU files, and then it would be annoying if there were a bunch of > random pathname restrictions. But if we're really concerned about > this, I think it would be a reasonable fix; you're not really supposed > to decide to store your configuration files in directories that exist > for the purpose of storing relation data files. I think it would be reasonable to restrict what can be put in base/ and global/ but users generally feel free to create whatever they want in the root of PGDATA, despite being strongly encouraged not to. Anyway, I think it should be fixed or documented as a caveat since it causes a hard failure on restore. >> We could fix the issue by forbidding this file pattern in PGDATA, i.e. >> error when it is detected during pg_basebackup, but in my view it would >> be better (at least eventually) to add incremental info to the manifest. >> That would also allow us to skip storing zero-length files and >> incremental stubs (with no changes) as files. > > I did think about this, and I do like the idea of being able to elide > incremental stubs. If you have a tremendous number of relations and > very few of them have changed at all, the incremental stubs might take > up a significant percentage of the space consumed by the incremental > backup, which kind of sucks, even if the incremental backup is still > quite small compared to the original database. And, like you, I had > the idea of trying to use the backup_manifest to do it. > > But ... I didn't really end up feeling very comfortable with it. Right > now, the backup manifest is something we only use to verify the > integrity of the backup. If we were to do this, it would become a > critical part of the backup. For my 2c the manifest is absolutely a critical part of the backup. I'm having a hard time even seeing why we have the --no-manifest option for pg_combinebackup at all. If the manifest is missing who knows what else might be missing as well? If present, why wouldn't we use it? I know Tomas added some optimizations that work best with --no-manifest but if we can eventually read compressed tars (which I expect to be the general case) then those optimizations are not very useful. > I don't think I like the idea that > removing the backup_manifest should be allowed to, in effect, corrupt > the backup. But I think I dislike even more the idea that the data > that is used to verify the backup gets mushed together with the backup > data itself. Maybe in practice it's fine, but it doesn't seem very > conceptually clean. I don't think this is a problem. The manifest can do more than store verification info, IMO. > There are also some practical considerations that made me not want to > go in this direction: we'd need to make sure that the relevant > information from the backup manifest was available to the > reconstruction process at the right part of the code. When iterating > over a directory, it would need to consider all of the files actually > present in that directory plus any "hallucinated" incremental stubs > from the manifest. I didn't feel confident of my ability to implement > that in the available time without messing anything up. I think it would be better to iterate over the manifest than files in a directory. In any case we still need to fix [1], which presents more or less the same problem. > I think we should consider other possible designs here. For example, > instead of including this in the manifest, we could ship one > INCREMENTAL.STUBS file per directory that contains a list of all of > the incremental stubs that should be created in that directory. My > guess is that something like this will turn out to be way simpler to > code. So we'd store a mini manifest per directory rather than just put the info in the main manifest? Sounds like unnecessary complexity to me. Regards, -David [1] https://www.postgresql.org/message-id/flat/9badd24d-5bd9-4c35-ba85-4c38a2feb73e%40pgmasters.net
В списке pgsql-hackers по дате отправления:
Следующее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: RE: Slow catchup of 2PC (twophase) transactions on replica in LR