Re: silent data loss with ext4 / all current versions

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: silent data loss with ext4 / all current versions
Дата
Msg-id 20160303190614.7pvivlxyttgd3xcr@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: silent data loss with ext4 / all current versions  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: silent data loss with ext4 / all current versions
Список pgsql-hackers
Hi,

> +/*
> + * rename_safe -- rename of a file, making it on-disk persistent
> + *
> + * This routine ensures that a rename file persists in case of a crash by using
> + * fsync on the old and new files before and after performing the rename so as
> + * this categorizes as an all-or-nothing operation.
> + */
> +int
> +rename_safe(const char *oldfile, const char *newfile)
> +{
> +    struct stat    filestats;
> +
> +    /*
> +     * First fsync the old entry and new entry, it this one exists, to ensure
> +     * that they are properly persistent on disk. Calling this routine with
> +     * an existing new target file is fine, rename() will first remove it
> +     * before performing its operation.
> +     */
> +    fsync_fname(oldfile, false);
> +    if (stat(newfile, &filestats) == 0)
> +        fsync_fname(newfile, false);

I don't think we want any stat()s here. I'd much, much rather check open
for ENOENT.


> +/*
> + * link_safe -- make a file hard link, making it on-disk persistent
> + *
> + * This routine ensures that a hard link created on a file persists on system
> + * in case of a crash by using fsync where on the link generated as well as on
> + * its parent directory.
> + */
> +int
> +link_safe(const char *oldfile, const char *newfile)
> +{

If we go for a new abstraction here, I'd rather make it
'replace_file_safe' or something, and move the link/rename code #ifdef
into it.


> +    if (link(oldfile, newfile) < 0)
> +        return -1;
> +
> +    /*
> +     * Make the link persistent in case of an OS crash, the new entry
> +     * generated as well as its parent directory need to be flushed.
> +     */
> +    fsync_fname(newfile, false);
> +
> +    /*
> +     * Same for parent directory. This routine is never called to rename
> +     * files across directories, but if this proves to become the case,
> +     * flushing the parent directory if the old file would be necessary.
> +     */
> +    fsync_parent_path(newfile);
> +    return 0;

I think it's a seriously bad idea to encode that knowledge in such a
general sounding routine.  We could however argue that this is about
safely replacing the *target* file; not about safely removing the old
file.


Currently I'm inclined to apply this to master soon. But I think we
might want to wait a while with backpatching.  The recent fsync upgrade
disaster kinda makes me a bit careful...


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: improving GROUP BY estimation
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: improving GROUP BY estimation