On 2014-01-22 10:14:27 -0500, Robert Haas wrote:
> On Wed, Jan 22, 2014 at 9:48 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-01-18 08:35:47 -0500, Robert Haas wrote:
> >> > I am not sure I understand that point. We can either update the
> >> > in-memory bit before performing the on-disk operations or
> >> > afterwards. Either way, there's a way to be inconsistent if the disk
> >> > operation fails somewhere inbetween (it might fail but still have
> >> > deleted the file/directory!). The normal way to handle that in other
> >> > places is PANICing when we don't know so we recover from the on-disk
> >> > state.
> >> > I really don't see the problem here? Code doesn't get more robust by
> >> > doing s/PANIC/ERROR/, rather the contrary. It takes extra smarts to only
> >> > ERROR, often that's not warranted.
> >>
> >> People get cranky when the database PANICs because of a filesystem
> >> failure. We should avoid that, especially when it's trivial to do so.
> >> The update to shared memory should be done second and should be set
> >> up to be no-fail.
> >
> > I don't see how that would help. If we fail during unlink/rmdir, we
> > don't really know at which point we failed.
>
> This doesn't make sense to me. unlink/rmdir are atomic operations.

Yes, individual operations should be, but you cannot be sure whether a
rename()/unlink() will survive a crash until the directory is
fsync()ed. So, what is one going to do if the unlink succeeded, but the
fsync didn't?

Deletion currently works like:

    if (rename(path, tmppath) != 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not rename \"%s\" to \"%s\": %m",
                        path, tmppath)));

    /* make sure no partial state is visible after a crash */
    fsync_fname(tmppath, false);
    fsync_fname("pg_replslot", true);

    if (!rmtree(tmppath, true))
    {
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not remove directory \"%s\": %m",
                        tmppath)));
    }

If we fail between the rename() and the fsync_fname() we don't really
know which state we are in. We'd also have to add code to handle
incomplete slot directories, which currently only exists for startup, to
other places.
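
To make the ambiguity concrete, here is a rough standalone sketch in plain
POSIX C. It is not the slot code: fsync_path(), drop_dir_durably() and the
DropResult enum are names made up for illustration, rmdir() stands in for
rmtree() (so it assumes an empty directory), and unlike fsync_fname() it does
not tolerate platforms that refuse to fsync a directory.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes: which step failed, and what we know afterwards. */
typedef enum
{
    DROP_OK,
    DROP_RENAME_FAILED,     /* nothing changed on disk; a plain error is safe */
    DROP_FSYNC_FAILED,      /* rename happened, but may not survive a crash */
    DROP_REMOVE_FAILED      /* rename is durable; tmp directory still exists */
} DropResult;

/* Open a file or directory read-only and fsync it; returns 0 on success. */
static int
fsync_path(const char *path)
{
    int     fd = open(path, O_RDONLY);
    int     rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Same ordering as the excerpt above: rename, fsync the renamed directory,
 * fsync the parent directory, then remove.
 */
static DropResult
drop_dir_durably(const char *parent, const char *path, const char *tmppath)
{
    assert(parent != NULL && path != NULL && tmppath != NULL);

    if (rename(path, tmppath) != 0)
        return DROP_RENAME_FAILED;

    /* Until both fsyncs succeed, a crash may show either the old or new name. */
    if (fsync_path(tmppath) != 0 || fsync_path(parent) != 0)
        return DROP_FSYNC_FAILED;

    if (rmdir(tmppath) != 0)
        return DROP_REMOVE_FAILED;

    return DROP_OK;
}
```

Only DROP_RENAME_FAILED leaves the in-memory and on-disk state trivially in
agreement. For DROP_FSYNC_FAILED the caller cannot tell which name a crash
would leave behind, which is the case I am worried about.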

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services