Discussion: pg_basebackup fails if a data file is removed
When pg_basebackup copies data files, it does basically this:

> if (lstat(pathbuf, &statbuf) != 0)
> {
>     if (errno != ENOENT)
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not stat file or directory \"%s\": %m",
>                         pathbuf)));
>
>     /* If the file went away while scanning, it's no error. */
>     continue;
> }
> ...
> sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);

There's a race condition there. If the file is removed after the lstat
call, and before sendFile opens the file, the backup fails with an error.
It's a fairly tight window, so it's difficult to run into by accident, but
by putting a breakpoint with a debugger there it's quite easy to reproduce,
by e.g. doing a VACUUM FULL on the table about to be copied.

A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
attached.

- Heikki
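
For readers without the patch at hand, the idea can be sketched outside the backend roughly as follows. This is only an illustration under stated assumptions, not the committed change: `send_file_sketch`, `simulate_race`, the `SendResult` enum, and the `missing_ok` flag are hypothetical names, and plain stdio stands in for the backend's file and error-reporting routines.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
typedef enum { SEND_OK, SEND_MISSING, SEND_ERROR } SendResult;

/*
 * Hypothetical stand-in for sendFile(): copy the file at "path" to "out".
 * If the file no longer exists and missing_ok is true, report SEND_MISSING
 * instead of treating it as an error.
 */
SendResult
send_file_sketch(const char *path, FILE *out, bool missing_ok)
{
    char    buf[8192];
    size_t  nread;
    FILE   *fp;

    assert(path != NULL && out != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
    {
        /* The file went away between lstat() and the open: no error. */
        if (errno == ENOENT && missing_ok)
            return SEND_MISSING;

        fprintf(stderr, "could not open file \"%s\": %s\n",
                path, strerror(errno));
        return SEND_ERROR;
    }

    while ((nread = fread(buf, 1, sizeof(buf), fp)) > 0)
    {
        if (fwrite(buf, 1, nread, out) != nread)
        {
            fclose(fp);
            return SEND_ERROR;
        }
    }

    if (ferror(fp))
    {
        fclose(fp);
        return SEND_ERROR;
    }

    fclose(fp);
    return SEND_OK;
}

/*
 * Reproduce the window deterministically: create a file, lstat() it,
 * remove it (as a concurrent VACUUM FULL might), then try to send it.
 */
SendResult
simulate_race(const char *path)
{
    struct stat statbuf;
    FILE       *fp = fopen(path, "wb");

    if (fp == NULL)
        return SEND_ERROR;
    fputs("data", fp);
    fclose(fp);

    if (lstat(path, &statbuf) != 0)
        return SEND_ERROR;

    /* The file disappears after the lstat() check succeeded. */
    if (unlink(path) != 0)
        return SEND_ERROR;

    return send_file_sketch(path, stdout, true);
}
```

With `missing_ok` set, the caller can simply skip a file that vanished after the lstat check, the same way the scanning loop already skips one that vanished before it. Passing `false` keeps the strict behaviour for files that must exist.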

On Fri, Dec 21, 2012 at 2:28 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> When pg_basebackup copies data files, it does basically this:
>
>> if (lstat(pathbuf, &statbuf) != 0)
>> {
>>     if (errno != ENOENT)
>>         ereport(ERROR,
>>                 (errcode_for_file_access(),
>>                  errmsg("could not stat file or directory
>> \"%s\": %m",
>>                         pathbuf)));
>>
>>     /* If the file went away while scanning, it's no error. */
>>     continue;
>> }
>
>> ...
>> sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);
>
> There's a race condition there. If the file is removed after the lstat call,
> and before sendFile opens the file, the backup fails with an error. It's a
> fairly tight window, so it's difficult to run into by accident, but by
> putting a breakpoint with a debugger there it's quite easy to reproduce, by
> e.g doing a VACUUM FULL on the table about to be copied.
>
> A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
> attached.

Looks good to me. Nice spot - don't tell me you actually ran into it
during testing? :)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

On 21.12.2012 15:30, Magnus Hagander wrote:
> Looks good to me.

Ok, committed.

> Nice spot - don't tell me you actually ran into it
> during testing? :)

Heh, no, eyeballing the code.

- Heikki