Re: silent data loss with ext4 / all current versions

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: silent data loss with ext4 / all current versions
Дата
Msg-id 20160201160706.GJ8743@awork2.anarazel.de
обсуждение исходный текст
Ответ на Re: silent data loss with ext4 / all current versions  (Michael Paquier <michael.paquier@gmail.com>)
Ответы Re: silent data loss with ext4 / all current versions  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
On 2016-01-25 16:30:47 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index a2846c4..b124f90 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -3278,6 +3278,14 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
>                          tmppath, path)));
>          return false;
>      }
> +
> +    /*
> +     * Make sure the rename is permanent by fsyncing the parent
> +     * directory.
> +     */
> +    START_CRIT_SECTION();
> +    fsync_fname(XLOGDIR, true);
> +    END_CRIT_SECTION();
>  #endif

Hm. I'm seriously doubtful that using critical sections for this is a
good idea. What's the scenario you're protecting against by declaring
this one?  We intentionally don't error out in the isdir cases in
fsync_fname() cases anyway.

Afaik we need to fsync tmppath before and after the rename, and only
then the directory, to actually be safe.

> @@ -5297,6 +5313,9 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
>                   errmsg("could not rename file \"%s\" to \"%s\": %m",
>                          RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
>  
> +    /* Make sure the rename is permanent by fsyncing the data directory. */
> +    fsync_fname(".", true);
> +

Shouldn't RECOVERY_COMMAND_DONE be fsynced first here?

>      ereport(LOG,
>              (errmsg("archive recovery complete")));
>  }
> @@ -6150,6 +6169,12 @@ StartupXLOG(void)
>                                  TABLESPACE_MAP, BACKUP_LABEL_FILE),
>                           errdetail("Could not rename file \"%s\" to \"%s\": %m.",
>                                     TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
> +
> +            /*
> +             * Make sure the rename is permanent by fsyncing the data
> +             * directory.
> +             */
> +            fsync_fname(".", true);
>          }

Is it just me, or are the repeated four line comments a bit grating?

>          /*
> @@ -6525,6 +6550,13 @@ StartupXLOG(void)
>                                  TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
>          }
>  
> +        /*
> +         * Make sure the rename is permanent by fsyncing the parent
> +         * directory.
> +         */
> +        if (haveBackupLabel || haveTblspcMap)
> +            fsync_fname(".", true);
> +

Isn't that redundant with the haveTblspcMap case above?

>          /* Check that the GUCs used to generate the WAL allow recovery */
>          CheckRequiredParameterValues();
>  
> @@ -7305,6 +7337,12 @@ StartupXLOG(void)
>                           errmsg("could not rename file \"%s\" to \"%s\": %m",
>                                  origpath, partialpath)));
>                  XLogArchiveNotify(partialfname);
> +
> +                /*
> +                 * Make sure the rename is permanent by fsyncing the parent
> +                 * directory.
> +                 */
> +                fsync_fname(XLOGDIR, true);

.partial should be fsynced first.

>              }
>          }
>      }
> @@ -10905,6 +10943,9 @@ CancelBackup(void)
>                             BACKUP_LABEL_FILE, BACKUP_LABEL_OLD,
>                             TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
>      }
> +
> +    /* fsync the data directory to persist the renames */
> +    fsync_fname(".", true);
>  }

Same.

>  /*
> diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
> index 277c14a..8dda80b 100644
> --- a/src/backend/access/transam/xlogarchive.c
> +++ b/src/backend/access/transam/xlogarchive.c
> @@ -477,6 +477,12 @@ KeepFileRestoredFromArchive(char *path, char *xlogfname)
>                          path, xlogfpath)));
>  
>      /*
> +     * Make sure the renames are permanent by fsyncing the parent
> +     * directory.
> +     */
> +    fsync_fname(XLOGDIR, true);

Afaics the file under the temporary filename has not been fsynced up to
here.

> +    /*
>       * Create .done file forcibly to prevent the restored segment from being
>       * archived again later.
>       */
> @@ -586,6 +592,11 @@ XLogArchiveForceDone(const char *xlog)
>                       errmsg("could not rename file \"%s\" to \"%s\": %m",
>                              archiveReady, archiveDone)));
>  
> +        /*
> +         * Make sure the rename is permanent by fsyncing the parent
> +         * directory.
> +         */
> +        fsync_fname(XLOGDIR "/archive_status", true);
>          return;
>      }

Afaics the AllocateFile() call below is not protected at all, no?


How about introducing a 'safe_rename()' that does something roughly akin
to:
fsync(oldname);
fsync(fname) || true;
rename(oldfname, fname);
fsync(fname);
fsync(basename(fname));

I'd rather have that kind of logic somewhere once, instead of repeated a
dozen times...

Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: silent data loss with ext4 / all current versions
Следующее
От: Andres Freund
Дата:
Сообщение: Re: silent data loss with ext4 / all current versions