Re: [PATCH] Verify Checksums during Basebackups

Поиск
Список
Период
Сортировка
От Magnus Hagander
Тема Re: [PATCH] Verify Checksums during Basebackups
Дата
Msg-id CABUevEy-abw5HyjpQvJJQKEv3r46wGHesgfqxqpYrhucSKnbyA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [PATCH] Verify Checksums during Basebackups  (Magnus Hagander <magnus@hagander.net>)
Список pgsql-hackers
On Tue, Apr 3, 2018 at 1:52 PM, Magnus Hagander <magnus@hagander.net> wrote:


On Mon, Apr 2, 2018 at 2:48 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi!

On Sun, Apr 01, 2018 at 05:27:21PM +0200, Magnus Hagander wrote:
> It might be a micro-optimisation, but ISTM that we shouldn't do the
> basename(palloc(fn)) in is_heap_file() unless we actually need it -- so not
> at the top of the function. (And surely "." and ".." should not occur here?
> I think that's a result of copy/paste from the online checksum patch?
>
> We also do exactly the same basename(palloc(fn)) in sendFile().  Can we
> find a way to reuse that duplication? Perhaps we want to split it into the
> two pieces already out in sendFile() and pass it in as separate parameters?

I've done the latter now, although it looks a bit weird that the second
argument data is a subset of the first.  I couldn't find another way to
not have it done twice, though.

I agree, but I think it's still cleaner.

On further look, there is actually no need to pstrdup() at all -- we never used the modified part of the string anyway, because we don't care about the oid (unlike pg_verify_checksums).

So I adjusted the patch by that.


> If not then this second one should at least only be done inside the if
> (verify_checksums).

We can't have both, as we need to call the is_heap_file() function in
order to determine whether we should verify the checksums.

Right. I realize that -- thus the "if not". But I guess I was not clear in what I meant -- see attached file for it.


> There is a bigger problem next to that though -- I believe  basename() does
> not exist on Win32. I haven't tested it, but there is zero documentation of
> it existing, which usually indicates it doesn't. That's the part that
> definitely needs to get fixed.
>
> I think you need to look into the functionality in port/path.c, in
> particular last_dir_separator()?

Thanks for the pointer, I've used that now; I mentioned before that
basename() might be a portability hazard, but couldn't find a good
substitute myself.

Yeah, I have a recollection somewhere of running into this before, but I couldn't find any references. But the complete lack of docs about it on msdn.microsoft.com is a clear red flag :)

 

V6 of the patch is attached.

 
Excellent. I've done some mangling on it:

* Changed the is_heap_file to is_checksummed_file (and the associtaed struct name), because this is really what it's about (we for example verify checksums on indexes, which are clearly not heaps)
* Moved the list of files to the top of the file next to the other lists of files/directories
* Added missing function prototype at the top, and changed the parameter names to be a bit more clear
* Added some comments
* Changed the logic around the checksum-check to avoid the pstrdup() and to not call the path functions unless necessary (per comment above)
* "filen" -> "file" in message
* xlog.h does not need to be included
* pgindent

Remaining question:

The check for (cnt % BLCKSZ != 0) currently does "continue", which means that this block of data isn't actually sent to the client at all, which seems completely wrong. We only want to prevent checksum validations.

I have moved the check up a bit, and refactored it so it continues to do the actual transmission of the file if this path is hit. 


And of course I forgot that particular part in the first push, so I've pushed it as a separate commit. 


--

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Arthur Zakirov
Дата:
Сообщение: Re: [PROPOSAL] Shared Ispell dictionaries
Следующее
От: Amit Langote
Дата:
Сообщение: Re: [HACKERS] path toward faster partition pruning