Re: block-level incremental backup

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: block-level incremental backup
Дата
Msg-id CA+TgmobNsVKCEcU_3MOjVmrHf4nLTZj6KHs=-nwaptHrTUOz+w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: block-level incremental backup  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Ответы Re: block-level incremental backup  (Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>)
Re: block-level incremental backup  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Список pgsql-hackers
On Fri, Aug 16, 2019 at 6:23 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> [ patches ]

Reviewing 0002 and 0003:

- Commit message for 0003 claims magic number and checksum are 0, but
that (fortunately) doesn't seem to be the case.

- looks_like_rel_name actually checks whether it looks like a
*non-temporary* relation name; suggest adjusting the function name.

- The names do_full_backup and do_incremental_backup are quite
confusing because you're really talking about what to do with one
file.  I suggest sendCompleteFile() and sendPartialFile().

- Is there any good reason to have 'refptr' as a global variable, or
could we just pass the LSN around via function arguments?  I know it's
just mimicking startptr, but storing startptr in a global variable
doesn't seem like a great idea either, so if it's not too annoying,
let's pass it down via function arguments instead.  Also, refptr is a
crappy name (even worse than startptr); whether we end up with a
global variable or a bunch of local variables, let's make the name(s)
clear and unambiguous, like incremental_reference_lsn.  Yeah, I know
that's long, but I still think it's better than being unclear.

- do_incremental_backup looks like it can never report an error from
fread(), which is bad.  But I see that this is just copied from the
existing code which has the same problem, so I started a separate
thread about that.

- I think that passing cnt and blkindex to verify_page_checksum()
doesn't look very good from an abstraction point of view.  Granted,
the existing code isn't great either, but I think this makes the
problem worse.  I suggest passing "int backup_distance" to this
function, computed as cnt - BLCKSZ * blkindex.  Then, you can
fseek(-backup_distance), fread(BLCKSZ), and then fseek(backup_distance
- BLCKSZ).

- While I generally support the use of while and for loops rather than
goto for flow control, a while (1) loop that ends with a break is
functionally a goto anyway.  I think there are several ways this could
be revised.  The most obvious one is probably to use goto, but I vote
for inverting the sense of the test: if (PageIsNew(page) ||
PageGetLSN(page) >= startptr) break; This approach also saves a level
of indentation for more than half of the function.

- I am not sure that it's a good idea for sendwholefile = true to
result in dumping the entire file onto the wire in a single CopyData
message.  I don't know of a concrete problem in typical
configurations, but someone who increases RELSEG_SIZE might be able to
overflow CopyData's length word.  At 2GB the length word would be
negative, which might break, and at 4GB it would wrap around, which
would certainly break.  See CopyData in
https://www.postgresql.org/docs/12/protocol-message-formats.html  To
avoid this issue, and maybe some others, I suggest defining a
reasonably large chunk size, say 1MB as a constant in this file
someplace, and sending the data as a series of chunks of that size.

- I don't think that the way concurrent truncation is handled is
correct for partial files.  Right now it just falls through to code
which appends blocks of zeroes in either the complete-file or
partial-file case.  I think that logic should be moved into the
function that handles the complete-file case.  In the partial-file
case, the blocks that we actually send need to match the list of block
numbers we promised to send.  We can't just send the promised blocks
and then tack a bunch of zero-filled blocks onto the end that the file
header doesn't know about.

- For reviewer convenience, please use the -v option to git
format-patch when posting and reposting a patch series.  Using -v2,
-v3, etc. on successive versions really helps.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: basebackup.c's sendFile() ignores read errors
Следующее
От: Jaime Casanova
Дата:
Сообщение: Re: PostgreSQL and Real Application Testing (RAT)