Re: Inconsistent DB data in Streaming Replication
From | Samrat Revagade |
---|---|
Subject | Re: Inconsistent DB data in Streaming Replication |
Date | |
Msg-id | CAF8Q-GwH0N7yFUT+QophzsC5z7+7KxRjWPdTUASGzvaO2rgyxw@mail.gmail.com |
In reply to | Re: Inconsistent DB data in Streaming Replication (Fujii Masao <masao.fujii@gmail.com>) |
Replies | Re: Inconsistent DB data in Streaming Replication |
List | pgsql-hackers |
<div dir="ltr"><div style="border-width:1.5pt 1pt;border-color:rgb(211,211,211);border-style:solid;padding:12pt 0in 12pt 24pt"><p class="" style="margin-bottom:0.0001pt;border:none;padding:0in"><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">>What Samrat is proposing here is that WAL is not flushed to the OS before</span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">>it is acked by a synchronous replica so recovery won't go past the</span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">>timeline change made in failover, making it necessary to take a new</span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">>base backup to resync with the new master.</span><p><p class="" style="margin-bottom:0.0001pt;border:none;padding:0in"><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">Actually, we are proposing that a data page on the master is not written to disk until the master receives an ACK from the standby. The WAL can be flushed to disk on both the master and the standby before the standby sends its ACK to the master. 
The end objective is the same: to avoid taking a new base backup of the old master to resync it with the new master.</span><p class="" style="margin-bottom:0.0001pt"><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">>Why do you think that the inconsistent data after failover happens is</span><span style="font-size:10pt;font-family:Arial,sans-serif"><br /><span style="background-repeat:initial initial">>problem? Because </span></span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">>it's one of the reasons why a fresh base backup is required when</span><span style="font-size:10pt;font-family:Arial,sans-serif"><br /><span style="background-repeat:initial initial">>starting old master as</span><br /><span style="background-repeat:initial initial">>new standby? If yes, I agree with you. I've often heard the complaints</span><br /><span style="background-repeat:initial initial">>about a backup</span><br /><span style="background-repeat:initial initial">>when restarting new standby. That's really big problem.</span></span><p class="" style="margin-bottom:0.0001pt"><span style="font-family:Arial,sans-serif;font-size:10pt">Yes, taking a backup is a major problem when the database is several TB or larger. It would take a very long time to ship the backup data over a slow WAN link. 
</span><p class="" style="margin-bottom:12pt"><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(80,0,80)">>> One solution to avoid this situation is have the master send WAL records to standby and wait for ACK from standby committing WAL files to disk and only after that commit data page related to this transaction on master.</span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">>You mean to make the master wait the data page write until WAL has been not only</span><span style="font-size:10pt;font-family:Arial,sans-serif"><br /><span style="background-repeat:initial initial">>flushed to disk but also replicated to the standby?</span></span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">Yes. The master should not write a data page before the corresponding WAL records have been replicated to the standby, that is, flushed to disk on both the master and the standby.</span><p class="" style="margin-bottom:12pt"><span style="font-size:10pt;font-family:Arial,sans-serif;color:rgb(80,0,80)">>> The main drawback would be increased wait time for the client due to extra round trip to standby before master sends ACK to client. Are there any other issues with this approach?</span><p class=""><span style="font-size:10pt;line-height:115%;font-family:Arial,sans-serif">>I think that you can introduce GUC specifying whether this extra check</span><span style="font-size:10pt;line-height:115%;font-family:Arial,sans-serif"><br /><span style="background-repeat:initial initial">>is required to avoid a backup when failback</span></span><p class=""><p class="" style="margin-bottom:0.0001pt"><span style="font-family:Arial,sans-serif;font-size:10pt;line-height:115%">That would be a better idea. We could disable it whenever taking a fresh backup is not a problem. 
</span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif"><br /></span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">Regards,</span><p class="" style="margin-bottom:0.0001pt"><span style="font-size:10pt;font-family:Arial,sans-serif">Samrat</span><p><p></div></div><div class="gmail_extra"><br /><br /><div class="gmail_quote">On Mon, Apr 8, 2013 at 10:40 PM, Fujii Masao <span dir="ltr"><<a href="mailto:masao.fujii@gmail.com" target="_blank">masao.fujii@gmail.com</a>></span> wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade<br /> <<a href="mailto:revagade.samrat@gmail.com">revagade.samrat@gmail.com</a>> wrote:<br /> ><br /> > Hello,<br /> ><br /> > We have been trying to figure out possible solutions to the following problem in streaming replication. Consider following scenario:<br /> ><br /> > If master receives commit command, it writes and flushes commit WAL records to the disk, It also writes and flushes data page related to this transaction.<br /> ><br /> > The master then sends WAL records to standby up to the commit WAL record. 
But before sending these records if failover happens then, old master is ahead of standby which is now the new master in terms of DB data leading to inconsistent data.<br /><br /></div>Why do you think that the inconsistent data after failover happens is<br /> problem? Because<br /> it's one of the reasons why a fresh base backup is required when<br /> starting old master as<br /> new standby? If yes, I agree with you. I've often heard the complaints<br /> about a backup<br /> when restarting new standby. That's really big problem.<br /><br /> The timeline mismatch after failover was one of the reasons why a<br /> backup is required.<br /> But, thanks to Heikki's recent work, that's solved, i.e., the timeline<br /> mismatch would be<br /> automatically resolved when starting replication in 9.3. So, the<br /> remaining problem is an<br /> inconsistent database.<br /><div class="im"><br /> > One solution to avoid this situation is have the master send WAL records to standby and wait for ACK from standby committing WAL files to disk and only after that commit data page related to this transaction on master.<br /><br /></div>You mean to make the master wait the data page write until WAL has been not only<br /> flushed to disk but also replicated to the standby?<br /><div class="im"><br /> > The main drawback would be increased wait time for the client due to extra round trip to standby before master sends ACK to client. Are there any other issues with this approach?<br /><br /></div>I think that you can introduce GUC specifying whether this extra check<br /> is required to<br /> avoid a backup when failback.<br /><br /> Regards,<br /><br /> --<br />Fujii Masao<br /></blockquote></div><br /></div>
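The rule Samrat describes, together with the on/off switch Fujii suggests, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not PostgreSQL code: `Node`, `Master`, `commit`, `replicate`, `can_write_page`, `checkpoint`, `old_master_ahead` and the `failback_safe` flag are all hypothetical names (the flag stands in for the GUC mentioned in the thread, which is not named there), and LSNs are plain integers. It keeps PostgreSQL's existing rule that a dirty page may only be written once WAL up to the page's LSN is flushed locally, and adds the proposed condition that the standby must have flushed that WAL too. It models only data pages; WAL the old master may have flushed beyond the standby's position is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy server state; LSNs are plain integers (an assumption of this sketch)."""
    flush_lsn: int = 0                                # WAL durably flushed up to here
    disk_pages: dict = field(default_factory=dict)    # page id -> LSN of on-disk image


@dataclass
class Master(Node):
    failback_safe: bool = False                       # hypothetical stand-in for the proposed GUC
    standby: Node = field(default_factory=Node)
    wal_lsn: int = 0                                  # last WAL record inserted
    dirty: dict = field(default_factory=dict)         # page id -> LSN of latest in-memory change

    def commit(self, page: int) -> int:
        """Change a page in memory and flush the commit's WAL locally."""
        self.wal_lsn += 1
        self.dirty[page] = self.wal_lsn
        self.flush_lsn = self.wal_lsn
        return self.wal_lsn

    def replicate(self) -> None:
        """Ship all locally flushed WAL; the standby flushes it and acks."""
        self.standby.flush_lsn = self.flush_lsn

    def can_write_page(self, page: int) -> bool:
        page_lsn = self.dirty[page]
        # Existing WAL-before-data rule: the page's WAL must be flushed locally.
        if self.flush_lsn < page_lsn:
            return False
        # Proposed extra rule: the page's WAL must also be flushed on the standby.
        if self.failback_safe and self.standby.flush_lsn < page_lsn:
            return False
        return True

    def checkpoint(self) -> None:
        """Write out every dirty page the rules allow; the rest stay dirty."""
        for page in list(self.dirty):
            if self.can_write_page(page):
                self.disk_pages[page] = self.dirty.pop(page)


def old_master_ahead(m: Master) -> bool:
    """True if some page on the old master's disk reflects WAL the standby
    never received, i.e. the inconsistency described in the thread."""
    return any(lsn > m.standby.flush_lsn for lsn in m.disk_pages.values())


if __name__ == "__main__":
    for safe in (False, True):
        m = Master(failback_safe=safe)
        m.commit(page=1)
        m.checkpoint()      # master tries to write the data page...
        # ...then fails before replicate() runs; the standby is promoted.
        print(safe, old_master_ahead(m))
    # False True
    # True False
```

With `failback_safe` off, the page reaches the old master's disk carrying a change the promoted standby never saw. With it on, the write is deferred until `replicate()` has run, so in this model a checkpoint or buffer eviction may have to wait for the standby; that wait is the cost a GUC, as Fujii suggests, would let users avoid when a fresh base backup is acceptable.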