Re: basebackup.c's sendFile() ignores read errors

Поиск
Список
Период
Сортировка
От Jeevan Ladhe
Тема Re: basebackup.c's sendFile() ignores read errors
Дата
Msg-id CAOgcT0Og1NfVu_Z68gdss_bPZpCR6XyHkYFTfYpCzEF8QOs2_A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: basebackup.c's sendFile() ignores read errors  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Ответы Re: basebackup.c's sendFile() ignores read errors  (Jeevan Chalke <jeevan.chalke@enterprisedb.com>)
Список pgsql-hackers
Hi Jeevan,

On Wed, Aug 28, 2019 at 10:26 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:


On Tue, Aug 27, 2019 at 10:33 PM Robert Haas <robertmhaas@gmail.com> wrote:
While reviewing a proposed patch to basebackup.c this morning, I found
myself a bit underwhelmed by the quality of the code and comments in
basebackup.c's sendFile(). I believe it's already been pointed out
that the the retry logic here is not particularly correct, and the
comments demonstrate a pretty clear lack of understanding of how the
actual race conditions that are possible here might operate.

However, I then noticed another problem which I think is significantly
more serious: this code totally ignores the possibility of a failure
in fread().  It just assumes that if fread() fails to return a
positive value, it's reached the end of the file, and if that happens,
it just pads out the rest of the file with zeroes.  So it looks to me
like if fread() encountered, say, an I/O error while trying to read a
data file, and if that error were on the first data block, then the
entire contents of that file would be replaced with zero bytes in the
backup, and no error or warning of any kind would be given to the
user.  If it hit the error later, everything from the point of the
error onward would just get replaced with zero bytes.  To be clear, I
think it is fine for basebackup.c to fill out the rest of the expected
length with zeroes if in fact the file has been truncated during the
backup; recovery should fix it.  But recovery is not going to fix data
that got eaten because EIO was encountered while copying a file.

Per fread(3) manpage, we cannot distinguish between an error or EOF. So to
check for any error we must use ferror() after fread(). Attached patch which
tests ferror() and throws an error if it returns true. 

You are right. This seems the right approach to me. I can see at least couple
of places already using ferror() to guard errors of fread() like CopyGetData(),
read_backup_label() etc.
 
However, I think
fread()/ferror() both do not set errno (per some web reference) and thus
throwing a generic error here. I have manually tested it.

If we are interested in getting the errno, then instead of fread(), we can
use read(). But, here, in particular, we are not decision making anything
depending on errno so I think we should be fine with your current patch.
 

The logic that rereads a block appears to have the opposite problem.
Here, it will detect an error, but it will also fail in the face of a
concurrent truncation, which it shouldn't.

For this, I have checked for feof() assuming that when the file gets truncated
we reach EOF. And if yes, getting out of the loop instead of throwing an error.
I may be wrong as I couldn't reproduce this issue.

I had a look at the patch and this seems correct to me.

Few minor comments:

+ /* Check fread() error. */
+ CHECK_FREAD_ERROR(fp, pathbuf);
+

The comments above the macro call at both the places are not necessary as
your macro name itself is self-explanatory.

----------
+ /*
+ * If file is truncated, then we will hit
+ * end-of-file error in which case we don't
+ * want to error out, instead just pad it with
+ * zeros.
+ */
+ if (feof(fp))

The if block does not do the truncation right away, so I think the comment
above can be reworded to explain why we reset cnt?

Regards,
Jeevan Ladhe

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Richard Guo
Дата:
Сообщение: Re: A problem about partitionwise join
Следующее
От: Sergei Kornilov
Дата:
Сообщение: Re: pg_get_databasebyid(oid)