Hi Hackers,
We found a race condition in pg_mkdir_p(), here is a simple reproducer:
#!/bin/sh
basedir=pgdata
datadir=$basedir/a/b/c/d/e/f/g/h/i/j/k/l/m/n/o/p/q/r/s/t/u/v/w/x/y/z
logdir=$basedir/logs
n=2
rm -rf $basedir
mkdir -p $logdir
# init databases concurrently, they will all try to create the parent dirs
for i in `seq $n`; do
initdb -D $datadir/$i >$logdir/$i.log 2>&1 &
done
wait
# there is a chance one of the initdb commands failed to create the datadir
grep 'could not create directory' $logdir/*
The logic in pg_mkdir_p() is as below:
/* check for pre-existing directory */
if (stat(path, &sb) == 0)
{
if (!S_ISDIR(sb.st_mode))
{
if (last)
errno = EEXIST;
else
errno = ENOTDIR;
retval = -1;
break;
}
}
else if (mkdir(path, last ? omode : S_IRWXU | S_IRWXG | S_IRWXO) < 0)
{
retval = -1;
break;
}
This seems buggy as it first checks the existence of the dir and makes the dir if it does not exist yet, however when executing concurrently a possible race condition can be as below:
A: does a/ exists? no
B: does a/ exists? no
A: try to create a/, succeed
B: try to create a/, failed as it already exists
By the way, some callers of pg_mkdir_p() checks for EEXIST explicitly, such as in pg_basebackup.c:
if (pg_mkdir_p(statusdir, pg_dir_create_mode) != 0 && errno != EEXIST)
{
pg_log_error("could not create directory \"%s\": %m", statusdir);
exit(1);
}
This is still wrong with current code logic, because when the statusdir is a file the errno is also EEXIST, but it can pass the check here. Even if we fix pg_mkdir_p() by following the `mkdir -p` way the errno check here is still wrong.
Best Regards
Ning