On Sat, Mar 1, 2014 at 12:27:19PM -0800, Sean Chittenden wrote:
> The attached patch fixes the case when `pg_dump -Fd …` is called
> on a partition where write(2) fails for some reason or another. In
> this case, backup jobs were returning with a successful exit code even
> though most of the files in the dump directory were all zero length.
>
> I haven’t tested this patch’s failure conditions but the fix seems
> simple enough: cfwrite() needs to have its return status checked
> everywhere and exit_horribly() upon any failure. In this case,
> callers of _WriteData() were not checking the return status and were
> discarding the negative return status (e.g. ENOSPC).
>
> I made a cursory pass over the code and found one other instance where
> write status wasn’t being checked and also included that.
As is often the case with pg_dump, the problems you saw are a small part
of a larger set of problems in that code --- there is general ignoring of
read and write errors.
I have developed a comprehensive patch that addresses all the issues I
could find. The use of function pointers and the calling of functions
directly and through function pointers makes the fix quite complex. I
ended up placing checks at the lowest level and removing checks at
higher layers where they were sporadically placed.
Patch attached. I have tested dump/restore of all four dump output
formats with the 9.3 regression database and all tests passed. I
believe this patch is too complex to backpatch, and I don't know how it
could be fixed more simply.
--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com
+ Everyone has their own god. +