Thread: pg_basebackup fails if a data file is removed

pg_basebackup fails if a data file is removed

From
Heikki Linnakangas
Date:
When pg_basebackup copies data files, it does basically this:

> if (lstat(pathbuf, &statbuf) != 0)
> {
>     if (errno != ENOENT)
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not stat file or directory \"%s\": %m",
>                     pathbuf)));
>
>     /* If the file went away while scanning, it's no error. */
>     continue;
> }
> ...
> sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);

There's a race condition there. If the file is removed after the lstat
call but before sendFile opens the file, the backup fails with an
error. It's a fairly tight window, so it's difficult to run into by
accident, but by setting a breakpoint there with a debugger it's quite
easy to reproduce, e.g. by running VACUUM FULL on the table about to be
copied.
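
To make the window concrete, here is a minimal single-process sketch of the same sequence. It is not PostgreSQL code: the helper name open_after_stat_errno and the scratch path used with it are made up for illustration, and the unlink() call stands in for whatever concurrent activity removes the file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: replays the race in one process.
 * Returns 0 if the open after lstat() succeeded, otherwise the errno of the
 * call that failed.
 */
static int
open_after_stat_errno(const char *path)
{
    struct stat statbuf;
    int         fd;

    assert(path != NULL);

    /* Create the file that the "backup" is about to copy. */
    fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0600);
    if (fd < 0)
        return errno;
    close(fd);

    /* Step 1: the scan sees the file, so it would be queued for sending. */
    if (lstat(path, &statbuf) != 0)
        return errno;

    /* Step 2: something else (VACUUM FULL in the report) removes it. */
    if (unlink(path) != 0)
        return errno;

    /* Step 3: the send step opens the file by name, which now fails. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return errno;

    close(fd);
    return 0;
}
```

Called on a writable scratch path, the helper should come back with ENOENT from step 3, which is the error that aborts the backup.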

A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
attached.
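
Roughly, the idea can be sketched like this. This is an illustration only, not the attached patch: send_file_sketch, the SendResult enum and the missing_ok flag are names assumed for the example, and the fprintf() call stands in for ereport(ERROR, ...).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes for the sketch. */
typedef enum
{
    SEND_OK,        /* file was opened and read to the end */
    SEND_SKIPPED,   /* file had already vanished; treated as no error */
    SEND_FAILED     /* any other failure */
} SendResult;

/*
 * Hypothetical stand-in for sendFile(), not the real function.
 * With missing_ok set, ENOENT on open is handled like the ENOENT case
 * after lstat(): the file is silently skipped.
 */
static SendResult
send_file_sketch(const char *path, bool missing_ok)
{
    FILE       *fp;
    char        buf[8192];

    assert(path != NULL);

    fp = fopen(path, "rb");
    if (fp == NULL)
    {
        /* The file went away between the stat and the open. */
        if (errno == ENOENT && missing_ok)
            return SEND_SKIPPED;

        fprintf(stderr, "could not open file \"%s\": %s\n",
                path, strerror(errno));
        return SEND_FAILED;
    }

    /* Stand-in for streaming the file contents to the client. */
    while (fread(buf, 1, sizeof(buf), fp) > 0)
        ;

    if (ferror(fp))
    {
        fclose(fp);
        return SEND_FAILED;
    }

    fclose(fp);
    return SEND_OK;
}
```

The caller in the scan loop would pass missing_ok = true for data files and simply move on to the next entry when the file is reported as skipped.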

- Heikki

Re: pg_basebackup fails if a data file is removed

From
Magnus Hagander
Date:
On Fri, Dec 21, 2012 at 2:28 PM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> When pg_basebackup copies data files, it does basically this:
>
>> if (lstat(pathbuf, &statbuf) != 0)
>> {
>>     if (errno != ENOENT)
>>         ereport(ERROR,
>>                 (errcode_for_file_access(),
>>                  errmsg("could not stat file or directory \"%s\": %m",
>>                     pathbuf)));
>>
>>     /* If the file went away while scanning, it's no error. */
>>     continue;
>> }
>> ...
>> sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);
>
> There's a race condition there. If the file is removed after the lstat call,
> and before sendFile opens the file, the backup fails with an error. It's a
> fairly tight window, so it's difficult to run into by accident, but by
> putting a breakpoint with a debugger there it's quite easy to reproduce,
> e.g. by doing a VACUUM FULL on the table about to be copied.
>
> A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
> attached.

Looks good to me. Nice spot - don't tell me you actually ran into it
during testing? :)

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: pg_basebackup fails if a data file is removed

From
Heikki Linnakangas
Date:
On 21.12.2012 15:30, Magnus Hagander wrote:
> On Fri, Dec 21, 2012 at 2:28 PM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> When pg_basebackup copies data files, it does basically this:
>>
>>> if (lstat(pathbuf, &statbuf) != 0)
>>> {
>>>     if (errno != ENOENT)
>>>         ereport(ERROR,
>>>                 (errcode_for_file_access(),
>>>                  errmsg("could not stat file or directory \"%s\": %m",
>>>                     pathbuf)));
>>>
>>>     /* If the file went away while scanning, it's no error. */
>>>     continue;
>>> }
>>> ...
>>> sendFile(pathbuf, pathbuf + basepathlen + 1, &statbuf);
>>
>> There's a race condition there. If the file is removed after the lstat call,
>> and before sendFile opens the file, the backup fails with an error. It's a
>> fairly tight window, so it's difficult to run into by accident, but by
>> putting a breakpoint with a debugger there it's quite easy to reproduce,
>> e.g. by doing a VACUUM FULL on the table about to be copied.
>>
>> A straightforward fix is to allow sendFile() to ignore ENOENT. Patch
>> attached.
>
> Looks good to me.

Ok, committed.

> Nice spot - don't tell me you actually ran into it
> during testing? :)

Heh, no, eyeballing the code.

- Heikki