When pg_basebackup copies data files, it does basically this:
>     if (lstat(pathbuf, &statbuf) != 0)
>     {
>         if (errno != ENOENT)
>             ereport(ERROR,
>                     (errcode_for_file_access(),
>                      errmsg("could not stat file or directory \"%s\": %m",
>                             pathbuf)));
>
>         /* If the file went away while scanning, it's no error. */
>         continue;
>     }
>     ...
>     sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);
There's a race condition here. If the file is removed after the lstat()
call but before sendFile() opens it, the backup fails with an error.
The window is fairly tight, so it's hard to hit by accident, but it's
quite easy to reproduce: set a debugger breakpoint between the two
calls, then run e.g. a VACUUM FULL on the table that is about to be
copied.
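
To illustrate the window outside of PostgreSQL, here is a minimal stand-alone sketch. It is not backend code: the function name open_errno_after_removal() is made up for this example, the unlink() call merely stands in for a concurrent removal such as the one VACUUM FULL causes, and it assumes a POSIX system with a writable /tmp.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical demo, not PostgreSQL code.
 *
 * Returns the errno seen when opening a file that vanished after lstat()
 * succeeded, 0 if the open unexpectedly succeeded, or -1 if the setup
 * itself failed.
 */
int
open_errno_after_removal(void)
{
    char        path[] = "/tmp/race_demo_XXXXXX";   /* assumed location */
    struct stat statbuf;
    FILE       *fp;
    int         fd;

    /* Create a scratch file to play the role of a relation file. */
    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    close(fd);

    /* Step 1: the directory scan sees the file. */
    if (lstat(path, &statbuf) != 0)
    {
        unlink(path);
        return -1;
    }
    assert(S_ISREG(statbuf.st_mode));   /* mkstemp() makes a regular file */

    /* Step 2: stand-in for a concurrent removal of the file. */
    if (unlink(path) != 0)
        return -1;

    /* Step 3: the open fails even though lstat() succeeded a moment ago. */
    errno = 0;
    fp = fopen(path, "rb");
    if (fp != NULL)
    {
        fclose(fp);
        return 0;
    }
    return errno;
}
```

The value returned at step 3 is the error that a later open of the file runs into when nothing handles it.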
A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
attached.
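
For readers without the attachment, the idea can be sketched roughly as follows. This is not the patch itself: send_file_sketch() and the SendResult enum are hypothetical names, plain stdio stands in for the backend's file and error-reporting routines, and the read loop only counts bytes instead of sending them.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical result codes for this sketch only. */
typedef enum
{
    SEND_OK,                /* file was opened and read to the end */
    SEND_SKIPPED_MISSING,   /* file was already gone; not an error */
    SEND_FAILED             /* any other failure; caller should error out */
} SendResult;

/*
 * Hypothetical stand-in for sendFile(): open the file and read it, but
 * treat ENOENT on open as "the file went away while scanning".
 */
SendResult
send_file_sketch(const char *path, size_t *bytes_sent)
{
    char        buf[8192];
    size_t      nread;
    FILE       *fp;

    assert(path != NULL && bytes_sent != NULL);
    *bytes_sent = 0;

    fp = fopen(path, "rb");
    if (fp == NULL)
    {
        /* Same rule the lstat() branch already applies. */
        if (errno == ENOENT)
            return SEND_SKIPPED_MISSING;
        return SEND_FAILED;
    }

    /* A real implementation would send each chunk to the client here. */
    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
        *bytes_sent += nread;

    if (ferror(fp))
    {
        fclose(fp);
        return SEND_FAILED;
    }

    fclose(fp);
    return SEND_OK;
}
```

With something along these lines, the caller would simply continue with the next directory entry on SEND_SKIPPED_MISSING, just as it does today when lstat() reports ENOENT.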
- Heikki