Re: silent data loss with ext4 / all current versions

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: silent data loss with ext4 / all current versions
Дата
Msg-id CAB7nPqTOnCJQH0Kkk6tawn2MT0JtsCzNG+PkUz8Tj6vGQkwfcg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: silent data loss with ext4 / all current versions  (Andres Freund <andres@anarazel.de>)
Ответы Re: silent data loss with ext4 / all current versions  (Michael Paquier <michael.paquier@gmail.com>)
Список pgsql-hackers
On Tue, Feb 2, 2016 at 1:07 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-01-25 16:30:47 +0900, Michael Paquier wrote:
>> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
>> index a2846c4..b124f90 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -3278,6 +3278,14 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
>>                                               tmppath, path)));
>>               return false;
>>       }
>> +
>> +     /*
>> +      * Make sure the rename is permanent by fsyncing the parent
>> +      * directory.
>> +      */
>> +     START_CRIT_SECTION();
>> +     fsync_fname(XLOGDIR, true);
>> +     END_CRIT_SECTION();
>>  #endif
>
> Hm. I'm seriously doubtful that using critical sections for this is a
> good idea. What's the scenario you're protecting against by declaring
> this one?  We intentionally don't error out in the isdir cases in
> fsync_fname() cases anyway.
>
> Afaik we need to fsync tmppath before and after the rename, and only
> then the directory, to actually be safe.

Regarding the fsync call on the new file before the rename, would it
be better to extend fsync_fname() with some kind of noerror flag to
work around the case of a file that does not exist or do you think it
is better just to use pg_fsync in this case after getting an fd? Using
directly pg_fsync() looks redundant with what fsync_fname does
already.

>> @@ -5297,6 +5313,9 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
>>                                errmsg("could not rename file \"%s\" to \"%s\": %m",
>>                                               RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
>>
>> +     /* Make sure the rename is permanent by fsyncing the data directory. */
>> +     fsync_fname(".", true);
>> +
>
> Shouldn't RECOVERY_COMMAND_DONE be fsynced first here?

Check.

>>               /*
>> @@ -6525,6 +6550,13 @@ StartupXLOG(void)
>>                                                               TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
>>               }
>>
>> +             /*
>> +              * Make sure the rename is permanent by fsyncing the parent
>> +              * directory.
>> +              */
>> +             if (haveBackupLabel || haveTblspcMap)
>> +                     fsync_fname(".", true);
>> +
>
> Isn't that redundant with the haveTblspcMap case above?

I am not sure I get your point here. Are you referring to the fact
that fsync should be done after each rename in this case?

>>               /* Check that the GUCs used to generate the WAL allow recovery */
>>               CheckRequiredParameterValues();
>>
>> @@ -7305,6 +7337,12 @@ StartupXLOG(void)
>>                                                errmsg("could not rename file \"%s\" to \"%s\": %m",
>>                                                               origpath, partialpath)));
>>                               XLogArchiveNotify(partialfname);
>> +
>> +                             /*
>> +                              * Make sure the rename is permanent by fsyncing the parent
>> +                              * directory.
>> +                              */
>> +                             fsync_fname(XLOGDIR, true);
>
> .partial should be fsynced first.

Check.

>>                       }
>>               }
>>       }
>> @@ -10905,6 +10943,9 @@ CancelBackup(void)
>>                                                  BACKUP_LABEL_FILE, BACKUP_LABEL_OLD,
>>                                                  TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
>>       }
>> +
>> +     /* fsync the data directory to persist the renames */
>> +     fsync_fname(".", true);
>>  }
>
> Same.

Re-check.

>>  /*
>> diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
>> index 277c14a..8dda80b 100644
>> --- a/src/backend/access/transam/xlogarchive.c
>> +++ b/src/backend/access/transam/xlogarchive.c
>> @@ -477,6 +477,12 @@ KeepFileRestoredFromArchive(char *path, char *xlogfname)
>>                                               path, xlogfpath)));
>>
>>       /*
>> +      * Make sure the renames are permanent by fsyncing the parent
>> +      * directory.
>> +      */
>> +     fsync_fname(XLOGDIR, true);
>
> Afaics the file under the temporary filename has not been fsynced up to
> here.

Yes, true, the old file...

>> +     /*
>>        * Create .done file forcibly to prevent the restored segment from being
>>        * archived again later.
>>        */
>> @@ -586,6 +592,11 @@ XLogArchiveForceDone(const char *xlog)
>>                                        errmsg("could not rename file \"%s\" to \"%s\": %m",
>>                                                       archiveReady, archiveDone)));
>>
>> +             /*
>> +              * Make sure the rename is permanent by fsyncing the parent
>> +              * directory.
>> +              */
>> +             fsync_fname(XLOGDIR "/archive_status", true);
>>               return;
>>       }
>
> Afaics the AllocateFile() call below is not protected at all, no?

Yep.

> How about introducing a 'safe_rename()' that does something roughly akin
> to:
> fsync(oldname);
> fsync(fname) || true;
> rename(oldfname, fname);
> fsync(fname);
> fsync(basename(fname));
>
> I'd rather have that kind of logic somewhere once, instead of repeated a
> dozen times...

Not wrong, and this leads to the following:
void rename_safe(const char *old, const char *new, bool isdir, int elevel);
Controlling elevel is necessary per the multiple code paths that would
use it. Some use ERROR, most of them FATAL, and a bit of WARNING. Does
that look fine?
-- 
Michael



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andreas Joseph Krogh
Дата:
Сообщение: Re: [PATCH] Phrase search ported to 9.6
Следующее
От: Masahiko Sawada
Дата:
Сообщение: Re: Freeze avoidance of very large table.