Re: block-level incremental backup
From | Robert Haas |
---|---|
Subject | Re: block-level incremental backup |
Date | |
Msg-id | CA+TgmobCumfTmpoiy-cVzEcabEhPinhJ6KpOAg-MfP4d73b+TQ@mail.gmail.com |
In reply to | Re: block-level incremental backup (Stephen Frost <sfrost@snowman.net>) |
Replies | Re: block-level incremental backup (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-hackers |
On Mon, Sep 16, 2019 at 3:38 PM Stephen Frost <sfrost@snowman.net> wrote:
> As discussed nearby, not everything that needs to be included in the
> backup is actually going to be in the WAL though, right? How would that
> ever be able to handle the case where someone starts the server under
> wal_level = logical, takes a full backup, then restarts with wal_level =
> minimal, writes out a bunch of new data, and then restarts back to
> wal_level = logical and takes an incremental?

Fair point. I think the WAL-scanning approach can only work if wal_level > minimal. But I also think that few people run with wal_level = minimal in this era, now that the default has been changed to replica; and I think we can detect the WAL level in use while scanning WAL. It can only change at a checkpoint.

> On larger systems, so many of the files are 1GB in size that checking
> the file size is quite close to meaningless. Yes, having to checksum
> all of the files definitely adds to the cost of taking the backup, but
> to avoid it we need strong assurances that a given file hasn't been
> changed since our last full backup. WAL, today at least, isn't quite
> that, and timestamps can possibly be fooled with, so if you'd like to be
> particularly careful, there doesn't seem to be a lot of alternatives.

I see your points, but it feels like you're trying to talk down the WAL-based approach over what seem to me to be fairly manageable corner cases.

> I'm not asking you to be an expert on those systems, just to help me
> understand the statements you're making. How is backing up to a
> pgbackrest repo different than running a pg_basebackup in the context of
> using some other Enterprise backup system? In both cases, you'll have a
> full copy of the backup (presumably compressed) somewhere out on a disk
> or filesystem which is then backed up by the Enterprise tool.

Well, I think that what people really want is to be able to back up straight into the enterprise tool, without an intermediate step.
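The WAL-scanning approach described in this reply (usable only when wal_level is above minimal, with the level detectable during the scan because it can only change at a checkpoint) might look roughly like the following. This is a hypothetical Python sketch, not anything proposed in the thread: `WalRecord`, `blocks_changed_since`, and `IncrementalNotPossible` are invented names, the record model is a simplification rather than PostgreSQL's actual WAL format, and the relation paths are illustrative only.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

Block = Tuple[str, int]  # (relation file, block number); illustrative only


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical, simplified WAL record; NOT PostgreSQL's real format."""
    lsn: int
    # Set only on checkpoint-like records: the wal_level in effect from here on.
    wal_level: Optional[str] = None
    # Blocks this record modified.
    blocks: Tuple[Block, ...] = ()


class IncrementalNotPossible(Exception):
    """Raised when some changes in the range may not have been WAL-logged."""


def blocks_changed_since(records: Iterable[WalRecord], start_lsn: int,
                         initial_wal_level: str) -> Set[Block]:
    """Return every block modified at or after start_lsn (the full backup's start).

    Refuses if wal_level was 'minimal' at any point in the range, because
    under minimal some data changes are not WAL-logged at all.
    """
    if initial_wal_level == "minimal":
        raise IncrementalNotPossible("full backup started under wal_level=minimal")
    changed: Set[Block] = set()
    for rec in records:
        if rec.lsn < start_lsn:
            continue  # before the full backup; already covered by it
        if rec.wal_level == "minimal":
            raise IncrementalNotPossible(f"wal_level=minimal in effect at LSN {rec.lsn}")
        changed.update(rec.blocks)
    return changed


wal = [
    WalRecord(lsn=100, blocks=(("base/5/16384", 0),)),
    WalRecord(lsn=200, blocks=(("base/5/16384", 7), ("base/5/16390", 2))),
    WalRecord(lsn=300, blocks=(("base/5/16384", 7),)),
]
print(sorted(blocks_changed_since(wal, start_lsn=150, initial_wal_level="replica")))
# [('base/5/16384', 7), ('base/5/16390', 2)]

# The scenario from the quoted message: logical -> minimal -> logical.
switched = wal + [WalRecord(lsn=400, wal_level="minimal"),
                  WalRecord(lsn=500, wal_level="logical")]
try:
    blocks_changed_since(switched, start_lsn=150, initial_wal_level="logical")
except IncrementalNotPossible as e:
    print("incremental refused:", e)
# incremental refused: wal_level=minimal in effect at LSN 400
```

The design choice here is to refuse outright rather than guess: once a minimal-level window is seen, the scan cannot vouch for completeness, so the safe fallback is a new full backup.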
My basic point here is: as with practically all PostgreSQL development, I think we should try to expose capabilities and avoid making policy on behalf of users. I'm not objecting to the idea of having tools that can help users figure out how much WAL they need to retain -- but insofar as we can do it, such tools should work regardless of where that WAL is actually stored. I dislike the idea that PostgreSQL would provide something akin to a "pgbackrest repository" in core, or at least I think it would be important that we're careful about how much functionality gets tied to the presence and use of such a thing, because, at least based on my experience working at EnterpriseDB, larger customers often don't want to do it that way.

> That's not great, of course, which is why there are trade-offs to be
> made, one of which typically involves using timestamps, but doing so
> quite carefully, to perform the file exclusion. Other ideas are great
> but it seems like WAL isn't really a great idea unless we make some
> changes there and we, as in PG, haven't got a robust "we know this file
> changed as of this point" to work from. I worry that we're putting too
> much faith into a system to do something independent of what it was
> actually built and designed to do, and thinking that because we could
> trust it for X, we can trust it for Y.

That seems like a considerable overreaction to me, based on the problems reported thus far. The fact is, WAL was originally intended for crash recovery and has subsequently been generalized to be usable for point-in-time recovery, standby servers, and logical decoding. It's clearly established at this point as the canonical way to know what has changed in the database, which is the same need that we have for incremental backup. At any rate, the same criticism can be leveled, IMHO with a lot more validity, at timestamps.
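To make the concern about timestamp-based file exclusion concrete, here is a toy Python demonstration. It is not taken from the thread and is not how pgBackRest or pg_basebackup actually decide what to copy; the file name `segment` and the helper functions are hypothetical. A file is modified in place so its size is unchanged, and its modification time is then reset with `os.utime`, standing in for a clock that stepped backwards or a tool that restored timestamps. A size-plus-mtime check reports no change; a full-content checksum catches it, at the cost of reading every byte.

```python
import hashlib
import os
import tempfile

BLOCK = 8192  # PostgreSQL's default page size, used here only for flavor


def sha256_of(path: str) -> str:
    """Checksum the whole file in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(path: str) -> dict:
    """Record what a backup tool might remember about a file."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns, "sha256": sha256_of(path)}


def changed_by_metadata(old: dict, new: dict) -> bool:
    # Cheap check: size and last-modification time only.
    return old["size"] != new["size"] or old["mtime_ns"] != new["mtime_ns"]


def changed_by_checksum(old: dict, new: dict) -> bool:
    # Expensive check: compare hashes of the full contents.
    return old["sha256"] != new["sha256"]


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "segment")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"\x00" * BLOCK * 4)
        before = snapshot(path)

        # Overwrite one block in place; the file size does not change.
        with open(path, "r+b") as f:
            f.seek(BLOCK)
            f.write(b"\x01" * BLOCK)

        # Stand-in for a clock stepping backwards (or a tool restoring
        # timestamps): put the old mtime back.
        os.utime(path, ns=(before["mtime_ns"], before["mtime_ns"]))

        after = snapshot(path)
        return changed_by_metadata(before, after), changed_by_checksum(before, after)


print(demo())  # (False, True)
```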
Last-modification timestamps are completely outside of our control; they are owned by the OS, and various operating systems can and do behave differently. They can go backwards when things have changed; they can go forwards when things have not changed. They were clearly not intended to meet this kind of requirement. Indeed, they were designed with this purpose in mind far less than WAL was, which was actually built for a requirement in this general ballpark, if not precisely this one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company