Re: [PATCH] Verify Checksums during Basebackups

Поиск
Список
Период
Сортировка
От Magnus Hagander
Тема Re: [PATCH] Verify Checksums during Basebackups
Дата
Msg-id CABUevEz4GCyrOr_o48tN-Yvpu61WN4vTLO2W46TXvRKi5L04Qw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Verify Checksums during Basebackups  (Michael Banck <michael.banck@credativ.de>)
Ответы Re: [PATCH] Verify Checksums during Basebackups  (Magnus Hagander <magnus@hagander.net>)
Список pgsql-hackers


On Mon, Apr 2, 2018 at 2:48 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi!

On Sun, Apr 01, 2018 at 05:27:21PM +0200, Magnus Hagander wrote:
> It might be a micro-optimisation, but ISTM that we shouldn't do the
> basename(palloc(fn)) in is_heap_file() unless we actually need it -- so not
> at the top of the function. (And surely "." and ".." should not occur here?
> I think that's a result of copy/paste from the online checksum patch?
>
> We also do exactly the same basename(palloc(fn)) in sendFile().  Can we
> find a way to reuse that duplication? Perhaps we want to split it into the
> two pieces already out in sendFile() and pass it in as separate parameters?

I've done the latter now, although it looks a bit weird that the second
argument data is a subset of the first.  I couldn't find another way to
not have it done twice, though.

I agree, but I think it's still cleaner.

On further look, there is actually no need to pstrdup() at all -- we never used the modified part of the string anyway, because we don't care about the oid (unlike pg_verify_checksums).

So I adjusted the patch by that.


> If not then this second one should at least only be done inside the if
> (verify_checksums).

We can't have both, as we need to call the is_heap_file() function in
order to determine whether we should verify the checksums.

Right. I realize that -- thus the "if not". But I guess I was not clear in what I meant -- see attached file for it.


> There is a bigger problem next to that though -- I believe  basename() does
> not exist on Win32. I haven't tested it, but there is zero documentation of
> it existing, which usually indicates it doesn't. That's the part that
> definitely needs to get fixed.
>
> I think you need to look into the functionality in port/path.c, in
> particular last_dir_separator()?

Thanks for the pointer, I've used that now; I mentioned before that
basename() might be a portability hazard, but couldn't find a good
substitute myself.

Yeah, I have a recollection somewhere of running into this before, but I couldn't find any references. But the complete lack of docs about it on msdn.microsoft.com is a clear red flag :)

 

V6 of the patch is attached.

 
Excellent. I've done some mangling on it:

* Changed the is_heap_file to is_checksummed_file (and the associtaed struct name), because this is really what it's about (we for example verify checksums on indexes, which are clearly not heaps)
* Moved the list of files to the top of the file next to the other lists of files/directories
* Added missing function prototype at the top, and changed the parameter names to be a bit more clear
* Added some comments
* Changed the logic around the checksum-check to avoid the pstrdup() and to not call the path functions unless necessary (per comment above)
* "filen" -> "file" in message
* xlog.h does not need to be included
* pgindent

Remaining question:

The check for (cnt % BLCKSZ != 0) currently does "continue", which means that this block of data isn't actually sent to the client at all, which seems completely wrong. We only want to prevent checksum validations.

I have moved the check up a bit, and refactored it so it continues to do the actual transmission of the file if this path is hit. 

I have pushed an updated patch with those changes. Please review the result and let me know I broke something :)


Thanks!!

--

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dmitry Ivanov
Дата:
Сообщение: Re: new function for tsquery creartion
Следующее
От: Arthur Zakirov
Дата:
Сообщение: Re: [PROPOSAL] Shared Ispell dictionaries