Re: [HACKERS] Potential data loss of 2PC files

From: Andres Freund
Subject: Re: [HACKERS] Potential data loss of 2PC files
Msg-id: AC421935-9B3D-4282-BBFB-A18BF8D31C01@anarazel.de
In reply to: Re: [HACKERS] Potential data loss of 2PC files (Robert Haas <robertmhaas@gmail.com>)
Replies: Re: [HACKERS] Potential data loss of 2PC files (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

On December 22, 2016 5:50:38 PM GMT+01:00, Robert Haas <robertmhaas@gmail.com> wrote:
>On Wed, Dec 21, 2016 at 8:30 PM, Michael Paquier
><michael.paquier@gmail.com> wrote:
>> Hi all,
>>
>> 2PC files are currently created using RecreateTwoPhaseFile() in two
>> places:
>> - At replay, on an XLOG_XACT_PREPARE record.
>> - At checkpoint, with CheckPointTwoPhase().
>>
>> Now RecreateTwoPhaseFile() is careful to call pg_fsync() to be sure
>> that the 2PC files find their way to disk. But one piece is missing:
>> the parent directory pg_twophase is not fsync'd. At replay the risk is
>> higher if there is a PREPARE record followed by a checkpoint record:
>> if there is a power failure after the checkpoint completes, there is
>> a risk of losing 2PC status files here.
>>
>> It seems to me that we really should have CheckPointTwoPhase() call
>> fsync() on pg_twophase to be sure that no files are lost here. There
>> is no point in doing this in RecreateTwoPhaseFile(): if there are many
>> 2PC transactions to replay, performance would be impacted, and we
>> don't care about the durability of those files until a checkpoint
>> moves the redo pointer. I have drafted the attached patch to address
>> this issue.
>>
>> I am also adding it to the next CF for consideration.
>
>It's pretty stupid that operating systems don't guarantee that, after
>calling pg_fsync() on the file, the file will also be visible in the
>parent directory.  There's not much use case for wanting to make sure
>that the file will have the correct contents but not caring whether
>it's there at all.

It makes more sense if you mentally distinguish between filename(s) and file contents. Having to do filesystem metadata
transactions for an fsync intended to sync contents would be annoying...
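The split between names and contents maps onto the usual POSIX pattern: fsync() on a file descriptor flushes that file, while making a newly created name durable takes a separate fsync() on the directory that holds it. Below is a minimal C sketch under that assumption, following Michael's proposal of fsyncing each file's contents as it is written but the directory only once per batch, as a checkpoint would. The names write_state_file, fsync_dir, persist_batch and the directory twophase_demo are hypothetical stand-ins, not PostgreSQL's actual RecreateTwoPhaseFile()/CheckPointTwoPhase() code, and some platforms reject fsync() on a directory descriptor.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write one state file and fsync it. This makes the file's contents
 * durable, but not necessarily the directory entry that names it. */
static int write_state_file(const char *dir, const char *name, const char *data)
{
    char path[1024];
    snprintf(path, sizeof(path), "%s/%s", dir, name);

    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    if (fd < 0)
        return -1;

    size_t len = strlen(data);
    if (write(fd, data, len) != (ssize_t) len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* fsync the directory itself so that names created in it become durable.
 * Assumption: the platform accepts fsync() on a read-only directory fd
 * (Linux does; some systems return an error here). */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/* Hypothetical batch: one fsync per file for its contents, then a single
 * directory fsync covering all the new names, as a checkpoint would do. */
int persist_batch(const char *dir, int count)
{
    assert(count >= 0);
    if (mkdir(dir, 0700) != 0 && errno != EEXIST)
        return -1;

    for (int i = 0; i < count; i++) {
        char name[32];
        snprintf(name, sizeof(name), "%08X", (unsigned) (1000 + i));
        if (write_state_file(dir, name, "prepared-xact-state\n") != 0)
            return -1;
    }
    return fsync_dir(dir);
}
```

Deferring the directory fsync to the batch boundary is the trade-off discussed above: per-file contents are synced immediately, while the metadata cost is paid once rather than once per file.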
 
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.