Re: trying again to get incremental backup
От | Robert Haas |
---|---|
Тема | Re: trying again to get incremental backup |
Дата | |
Msg-id | CA+TgmoafkMay-O7KFyxwYxaT5pZ+ku+qM8hLM88ZzEyq2g-LtQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: trying again to get incremental backup (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Ответы |
Re: trying again to get incremental backup
Re: trying again to get incremental backup Re: trying again to get incremental backup Re: trying again to get incremental backup |
Список | pgsql-hackers |
On Mon, Nov 13, 2023 at 11:25 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > Great stuff you got here. I'm doing a first pass trying to grok the > whole thing for more substantive comments, but in the meantime here are > some cosmetic ones. Thanks, thanks, and thanks. I've fixed some things that you mentioned in the attached version. Other comments below. > In blkreftable.c, I think the definition of SH_EQUAL should have an > outer layer of parentheses. Also, it would be good to provide and use a > function to initialize a BlockRefTableKey from the RelFileNode and > forknum components, and ensure that any padding bytes are zeroed. > Otherwise it's not going to be a great hash key. On my platform there > aren't any (padding bytes), but I think it's unwise to rely on that. I'm having trouble understanding the second part of this suggestion. Note that in a frontend context, SH_RAW_ALLOCATOR is pg_malloc0, and in a backend context, we get the default, which is MemoryContextAllocZero. Maybe there's some case this doesn't cover, though? > These forward struct declarations are not buying you anything, I'd > remove them: I've had problems from time to time when I don't do this. I'll remove it here, but I'm not convinced that it's always useless. > I don't much like the way header files in src/bin/pg_combinebackup files > are structured. Particularly, causing a simplehash to be "instantiated" > just because load_manifest.h is included seems poised to cause pain. I > think there should be a file with the basic struct declarations (no > simplehash); and then maybe since both pg_basebackup and > pg_combinebackup seem to need the same simplehash, create a separate > header file containing just that.. But, did you notice that anything > that includes reconstruct.h will instantiate the simplehash stuff, > because it includes load_manifest.h? It may be unwise to have the > simplehash in a header file. Maybe just declare it in each .c file that > needs it. The duplicity is not that large. I think that I did this correctly. AIUI, if you're defining a simplehash that only one source file needs, you make the scope "static" and do both SH_DECLARE and SH_DEFILE it in that file. If you need it to be shared by multiple files, you make it "extern" in the header file, do SH_DECLARE there, and SH_DEFINE in one of those source files. Or you could make the scope "static inline" in the header file and then you'd both SH_DECLARE and SH_DEFINE it in the header file. If I were to do as you suggest here, I think I'd end up with 2 copies of the compiled code for this instead of one, and if they ever got out of sync everything would break silently. > Why leave unnamed arguments in function declarations? For example, in > > static void manifest_process_file(JsonManifestParseContext *, > char *pathname, > size_t size, > pg_checksum_type checksum_type, > int checksum_length, > uint8 *checksum_payload); > the first argument lacks a name. Is this just an oversight, I hope? I mean, I've changed it now, but I don't think it's worth getting too excited about. "int checksum_length" is much better documentation than just "int," but "JsonManifestParseContext *context" is just noise, IMHO. You can argue that it's better for consistency that way, but whatever. > In GetFileBackupMethod(), which arguments are in and which are out? > The comment doesn't say, and it's not obvious why we pass both the file > path as well as the individual constituent pieces for it. The header comment does document which values are potentially set on return. I guess I thought it was clear enough that the stuff not documented to be output parameters was input parameters. Most of them aren't even pointers, so they have to be input parameters. The only exception is 'path', which I have some difficulty thinking that anyone is going to imagine to be an input pointer. Maybe you could propose a more specific rewording of this comment? FWIW, I'm not altogether sure whether this function is going to get more heavily adjusted in a rev or three of the patch set, so maybe we want to wait to sort this out until this is closer to final, but OTOH if I know what you have in mind for the current version, I might be more likely to keep it in a good place if I end up changing it. > DO_NOT_BACKUP_FILE appears not to be set anywhere. Do you expect to use > this later? If not, maybe remove it. Woops, that was a holdover from an earlier version. > There are two functions named record_manifest_details_for_file() in > different programs. I think this sort of arrangement is not great, as > it is confusing confusing to follow. It would be better if those two > routines were called something like, say, verifybackup_perfile_cb and > combinebackup_perfile_cb instead; then in the function comment say > something like > /* > * JsonManifestParseContext->perfile_cb implementation for pg_combinebackup. > * > * Record details extracted from the backup manifest for one file, > * because we like to keep things tracked or whatever. > */ > so it's easy to track down what does what and why. Same with > perwalrange_cb. "perfile" looks bothersome to me as a name entity. Why > not per_file_cb? and per_walrange_cb? I had trouble figuring out how to name this stuff. I did notice the awkwardness, but surely nobody can think that two functions with the same name in different binaries can be actually the same function. If we want to inject more underscores here, my vote is to go all the way and make it per_wal_range_cb. > In walsummarizer.c, HandleWalSummarizerInterrupts is called in > summarizer_read_local_xlog_page but SummarizeWAL() doesn't do that. > Maybe it should? I replaced all the CHECK_FOR_INTERRUPTS() in that file with HandleWalSummarizerInterrupts(). Does that seem right? > I think this path is not going to be very human-likeable. > snprintf(final_path, MAXPGPATH, > XLOGDIR "/summaries/%08X%08X%08X%08X%08X.summary", > tli, > LSN_FORMAT_ARGS(summary_start_lsn), > LSN_FORMAT_ARGS(summary_end_lsn)); > Why not add a dash between the TLI and between both LSNs, or something > like that? (Also, are we really printing TLIs as 8-byte hexs?) Dealing with the last part first, we already do that in every WAL file name. I actually think these file names are easier to work with than WAL file names, because 000000010000000000000020 is not the WAL starting at 0/20, but rather the WAL starting at 0/20000000. To know at what LSN a WAL file starts, you have to mentally delete characters 17 through 22, which will always be zero, and instead add six zeroes at the end. I don't think whoever came up with that file naming convention deserves an award, unless it's a raspberry award. With these names, you get something like 0000000100000000015125B800000000015128F0.summary and you can sort of see that 1512 repeats so the LSN went from something ending in 5B8 to something ending in 8F0. I actually think it's way better. But I have a hard time arguing that it wouldn't be more readable still if we put some separator characters in there. I didn't do that because then they'd look less like WAL file names, but maybe that's not really a problem. A possible reason not to bother is that these files are less necessary for humans to care about than WAL files, since they don't need to be archived or transported between nodes in any way. Basically I think this is probably fine the way it is, but if you or others think it's really important to change it, I can do that. Just as long as we don't spend 50 emails arguing about which separator character to use. -- Robert Haas EDB: http://www.enterprisedb.com
Вложения
- v9-0005-Add-new-pg_walsummary-tool.patch
- v9-0001-Change-how-a-base-backup-decides-which-files-have.patch
- v9-0002-Move-src-bin-pg_verifybackup-parse_manifest.c-int.patch
- v9-0004-Add-support-for-incremental-backup.patch
- v9-0003-Add-a-new-WAL-summarizer-process.patch
- v9-0006-Test-patch-Enable-summarize_wal-by-default.patch
В списке pgsql-hackers по дате отправления:
Предыдущее
От: David ChristensenДата:
Сообщение: Re: [PATCHES] Post-special page storage TDE support
Следующее
От: Andres FreundДата:
Сообщение: Re: Split index and table statistics into different types of stats