Re: pg_waldump: support decoding of WAL inside tarfile
От | Robert Haas |
---|---|
Тема | Re: pg_waldump: support decoding of WAL inside tarfile |
Дата | |
Msg-id | CA+TgmoZjhWDG_AR1i+L1yss-wbuWvxrdRwSdVUUUnVPrJV2CnQ@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: pg_waldump: support decoding of WAL inside tarfile (Amul Sul <sulamul@gmail.com>) |
Ответы |
Re: pg_waldump: support decoding of WAL inside tarfile
|
Список | pgsql-hackers |
Here are some review comments on v3-0004: In general, I think this looks pretty nice, but I think it needs more cleanup and polishing. There doesn't seem to be any reason for astreamer_waldump_content_new() to take an astreamer *next argument. If you look at astreamer.h, you'll see that some astreamer_BLAH_new() functions take such an argument, and others don't. The ones that do forward their input to another astreamer; the ones that don't, like astreamer_plain_writer_new(), send it somewhere else. AFAICT, this astreamer is never going to send its output to another astreamer, so there's no reason for this argument. I'm also a little confused by the choice of the name astreamer_waldump_content_new(). I would have thought this would be something like astreamer_waldump_new() or astreamer_xlogreader_new(). The word "content" doesn't seem to me to be adding much here, and it invites confusion with the "content" callback. I think you can merge setup_astreamer() into init_tar_archive_reader(). The only other caller is verify_tar_archive(), but that does exactly the same additional steps as init_tar_archive_reader(), as far as I can see. The return statement for astreamer_wal_read is really odd: + return (count - nbytes) ? (count - nbytes) : -1; Since 0 is false in C, this is equivalent to: count != nbytes ? count - nbytes : -1, but it's a strange way to write it. What makes it even stranger is that it seems as though the intention here is to count the number of bytes read, but you do that by taking the number of bytes requested (count) and subtracting the number of bytes we didn't manage to read (nbytes); and then you just up and return -1 instead of 0 whenever the answer would have been zero. This is all lacking in comments and seems a bit more confusing than it needs to be. So my suggestions are: 1. Consider redefining nbytes to be the number of bytes that you have read instead of the number of bytes you haven't read. So the loop in this function would be while (nbytes < count) instead of while (nbytes > 0). 2. If you need to map 0 to -1, consider having the caller do this instead of putting that inside this function. 3. Add a comment saying what the return value is supposed to be". If you do both 1 and 2, then the return statement can just say "return nbytes;" and the comment can say "Returns the number of bytes successfully read." I would suggest changing the name of the variable from "readBuff" to "readBuf". There are no existing uses of readBuff in the code base. I think this comment also needs improvement: + /* + * Ignore existing data if the required target page has not yet been + * read. + */ + if (recptr >= endPtr) + { + len = 0; + + /* Reset the buffer */ + resetStringInfo(astreamer_buf); + } This comment is problematic for a few reasons. First, we're not ignoring the existing data: we're throwing it out. Second, the comment doesn't say why we're doing what we're doing, only that we're doing it. Here's my guess at the actual explanation -- please correct me if I'm wrong: "pg_waldump never reads the same WAL bytes more than once, so if we're now being asked for data beyond the end of what we've already read, that means none of the data we currently have in the buffer will ever be consulted again. So, we can discard the existing buffer contents and start over." By the way, if this explanation is correct, it might be nice to add an assertion someplace that verifies it, like asserting that we're always reading from an LSN greater than or equal to (or exactly equal to?) the LSN immediately following the last data we read. In general, I wonder whether there's a way to make the separation of concerns between astreamer_wal_read() and TarWALDumpReadPage() cleaner. Right now, the latter is basically a stub, but I'm not sure that is the best thing here. I already mentioned one example of how to do this: make the responsibility for 0 => -1 translation the job of TarWALDumpReadPage() rather than astreamer_wal_read(). But I think there might be a little more we can do. In particular, I wonder whether we could say that astreamer_wal_read() is only responsible for filling the buffer, and the caller, TarWALDumpReadPage() in this case, needs to empty it. That seems like it might produce a cleaner separation of duties. Another thing that isn't so nice right now is that verify_tar_archive() has to open and close the archive only for init_tar_archive_reader() to be called to reopen it again just moments later. It would be nicer to open the file just once and then keep it open. Here again, I wonder if the separation of duties could be a bit cleaner. Is there a real need to pass XLogDumpPrivate to astreamer_wal_read or astreamer_archive_read? The only things that they need are archive_fd, archive_name, archive_streamer, archive_streamer_buf, and archive_streamer_read_ptr. In other words, they really don't care about any of the *existing* things that are in XLogDumpPrivate. This makes me wonder whether we should actually try to make this new astreamer completely independent of xlogreader. In other words, instead of calling it astreamer_waldump() or astreamer_xlogreader() as I proposed above, maybe it could be a completely generic astreamer, say astreamer_stringinfo_new(StringInfo *buf) that just appends to the buffer. That would require also moving the stuff out of astreamer_wal_read() that knows about XLogRecPtr, but why does that function need to know about XLogRecPtr? Couldn't the caller figure out that part and just tell this function how many bytes are needed? -- Robert Haas EDB: http://www.enterprisedb.com
В списке pgsql-hackers по дате отправления: