Re: [HACKERS] Potential data loss of 2PC files
От | Ashutosh Bapat |
---|---|
Тема | Re: [HACKERS] Potential data loss of 2PC files |
Дата | |
Msg-id | CAFjFpRfvnvoGbO7R1i8SE7THUr2VR1avhg=NAV11qJg7iw7R3g@mail.gmail.com обсуждение исходный текст |
Ответ на | [HACKERS] Potential data loss of 2PC files (Michael Paquier <michael.paquier@gmail.com>) |
Ответы |
Re: [HACKERS] Potential data loss of 2PC files
(Michael Paquier <michael.paquier@gmail.com>)
|
Список | pgsql-hackers |
On Thu, Dec 22, 2016 at 7:00 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > Hi all, > > 2PC files are created using RecreateTwoPhaseFile() in two places currently: > - at replay on a XLOG_XACT_PREPARE record. > - At checkpoint with CheckPointTwoPhase(). > > Now RecreateTwoPhaseFile() is careful to call pg_fsync() to be sure > that the 2PC files find their way into disk. But one piece is missing: > the parent directory pg_twophase is not fsync'd. At replay this is > more sensitive if there is a PREPARE record followed by a checkpoint > record. If there is a power failure after the checkpoint completes > there is a risk to lose 2PC status files here. > > It seems to me that we really should have CheckPointTwoPhase() call > fsync() on pg_twophase to be sure that no files are lost here. There > is no point to do this operation in RecreateTwoPhaseFile() as if there > are many 2PC transactions to replay performance would be impacted, and > we don't care about the durability of those files until a checkpoint > moves the redo pointer. I have drafted the patch attached to address > this issue. I agree with this. If no prepared transactions were required to be fsynced CheckPointTwoPhase(), do we want to still fsync the directory? Probably not. May be you want to call fsync_fname(TWOPHASE_DIR, true); if serialized_xacts > 0. The comment just states what has been done earlier in the function, which is documented in the prologue as well. /* * Make sure that the content created is persistent on disk to prevent * data loss in case of an OS crash or a power failure. Each 2PC file * has been already created and flushedto disk individually by * RecreateTwoPhaseFile() using the list of GXACTs available for * normal processingas well as at recovery when replaying individually * each XLOG_XACT_PREPARE record. */ Instead, you may want to just say "If we have flushed any 2PC files, flush the metadata in the pg_twophase directory." Although, I thought that a simple case of creating a persistent table which requires creating a file also would need to flush the directory. I tried to locate the corresponding code to see if it also uses fsync_fname(). I couldn't locate the code. I have looked at mdcreate(), mdpreckpt(), mdpostckpt(). But may be that's not relevant here. > > I am adding that as well to the next CF for consideration. > > Thoughts? > -- > Michael > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
В списке pgsql-hackers по дате отправления: