Re: [HACKERS] Potential data loss of 2PC files

From Ashutosh Bapat
Subject Re: [HACKERS] Potential data loss of 2PC files
Date
Msg-id CAFjFpRfvnvoGbO7R1i8SE7THUr2VR1avhg=NAV11qJg7iw7R3g@mail.gmail.com
In response to [HACKERS] Potential data loss of 2PC files  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: [HACKERS] Potential data loss of 2PC files  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On Thu, Dec 22, 2016 at 7:00 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Hi all,
>
> 2PC files are created using RecreateTwoPhaseFile() in two places currently:
> - at replay on a XLOG_XACT_PREPARE record.
> - At checkpoint with CheckPointTwoPhase().
>
> Now RecreateTwoPhaseFile() is careful to call pg_fsync() to be sure
> that the 2PC files find their way into disk. But one piece is missing:
> the parent directory pg_twophase is not fsync'd. At replay this is
> more sensitive if there is a PREPARE record followed by a checkpoint
> record. If there is a power failure after the checkpoint completes
> there is a risk to lose 2PC status files here.
>
> It seems to me that we really should have CheckPointTwoPhase() call
> fsync() on pg_twophase to be sure that no files are lost here. There
> is no point to do this operation in RecreateTwoPhaseFile() as if there
> are many 2PC transactions to replay performance would be impacted, and
> we don't care about the durability of those files until a checkpoint
> moves the redo pointer. I have drafted the patch attached to address
> this issue.

I agree with this.
If no prepared transactions needed to be fsynced in
CheckPointTwoPhase(), do we still want to fsync the directory?
Probably not.

Maybe you want to call fsync_fname(TWOPHASE_DIR, true) only if
serialized_xacts > 0.
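
To make the suggestion concrete, here is a minimal sketch in plain POSIX C
of "flush the directory only if something was written". It is not the
patch itself: fsync_directory() and flush_dir_if_needed() are hypothetical
names standing in for fsync_fname(TWOPHASE_DIR, true) and the check in
CheckPointTwoPhase(), and the error handling is simplified.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for fsync_fname(dir, true): open the directory
 * read-only and fsync it so that newly created directory entries reach
 * stable storage. Returns 0 on success, -1 on error.
 */
static int
fsync_directory(const char *path)
{
    int fd;
    int rc;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Flush the directory only when this checkpoint serialized at least one
 * file. Returns 1 if the directory was flushed, 0 if the flush was
 * skipped, and -1 on error.
 */
static int
flush_dir_if_needed(const char *dir, int serialized_xacts)
{
    if (serialized_xacts <= 0)
        return 0;               /* nothing new was written, skip the fsync */

    return (fsync_directory(dir) == 0) ? 1 : -1;
}
```

With this shape a checkpoint that wrote no 2PC files pays nothing extra,
while one that wrote any number of them pays for a single directory fsync.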

The comment just states what has been done earlier in the function,
which is documented in the prologue as well.

    /*
     * Make sure that the content created is persistent on disk to prevent
     * data loss in case of an OS crash or a power failure. Each 2PC file
     * has been already created and flushed to disk individually by
     * RecreateTwoPhaseFile() using the list of GXACTs available for
     * normal processing as well as at recovery when replaying individually
     * each XLOG_XACT_PREPARE record.
     */
 
Instead, you may want to just say "If we have flushed any 2PC files,
flush the metadata in the pg_twophase directory."

Although, I thought that even the simple case of creating a persistent
table, which requires creating a file, would also need to flush the
directory. I tried to locate the corresponding code to see if it also
uses fsync_fname(), but I couldn't find it. I have looked at
mdcreate(), mdpreckpt(), and mdpostckpt(). But maybe that's not relevant
here.
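
For reference, the general pattern I have in mind looks roughly like the
sketch below: create the file, fsync the file, then fsync the parent
directory so that the new directory entry survives a crash. This is plain
POSIX C with a hypothetical helper name, create_file_durably(); it is not
taken from md.c and says nothing about what the storage manager really
does.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create "dir/name", write "data" into it, and make
 * both the file contents and the directory entry durable.
 * Returns 0 on success, -1 on any error (including "file already exists").
 */
static int
create_file_durably(const char *dir, const char *name, const char *data)
{
    char    path[1024];
    size_t  len = strlen(data);
    int     fd;
    int     dirfd;
    int     rc;

    if (snprintf(path, sizeof(path), "%s/%s", dir, name) >= (int) sizeof(path))
        return -1;              /* path would be truncated */

    /* Step 1: create the file and write its contents. */
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, data, len) != (ssize_t) len)
    {
        close(fd);
        return -1;
    }

    /* Step 2: flush the file itself. */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 3: flush the parent directory so the new entry is persistent. */
    dirfd = open(dir, O_RDONLY);
    if (dirfd < 0)
        return -1;
    rc = fsync(dirfd);
    close(dirfd);
    return rc;
}
```

Without step 3, the file's data may be on disk while the directory entry
pointing to it is not, which is the failure mode discussed above for
pg_twophase.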


>
> I am adding that as well to the next CF for consideration.
>
> Thoughts?
> --
> Michael



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


