Re: Inconsistent DB data in Streaming Replication

From Samrat Revagade
Subject Re: Inconsistent DB data in Streaming Replication
Date
Msg-id CAF8Q-GwH0N7yFUT+QophzsC5z7+7KxRjWPdTUASGzvaO2rgyxw@mail.gmail.com
In response to Re: Inconsistent DB data in Streaming Replication  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: Inconsistent DB data in Streaming Replication  (Ants Aasma <ants@cybertec.at>)
List pgsql-hackers
> What Samrat is proposing here is that WAL is not flushed to the OS before
> it is acked by a synchronous replica so recovery won't go past the
> timeline change made in failover, making it necessary to take a new
> base backup to resync with the new master.

Actually, we are proposing that the data page on the master is not
committed (written to disk) until the master receives an ACK from the
standby. The WAL files can be flushed to disk on both the master and the
standby before the standby sends its ACK to the master. The end objective
is the same: avoiding a fresh base backup when resyncing the old master
with the new master.

> Why do you think that the inconsistent data after failover happens is
> problem? Because it's one of the reasons why a fresh base backup is
> required when starting old master as new standby? If yes, I agree with
> you. I've often heard the complaints about a backup when restarting new
> standby. That's really big problem.

Yes, taking a backup is a major problem when the database is larger than
several TB. Shipping the backup data over a slow WAN link would take a
very long time.

>> One solution to avoid this situation is have the master send WAL
>> records to standby and wait for ACK from standby committing WAL files
>> to disk and only after that commit data page related to this
>> transaction on master.
>
> You mean to make the master wait the data page write until WAL has been
> not only flushed to disk but also replicated to the standby?

Yes. The master should not write the data page before the corresponding
WAL records have been replicated to the standby. At that point the WAL
records have been flushed to disk on both the master and the standby.

>> The main drawback would be increased wait time for the client due to
>> extra round trip to standby before master sends ACK to client. Are
>> there any other issues with this approach?
>
> I think that you can introduce GUC specifying whether this extra check
> is required to avoid a backup when failback

That would be a better idea. We can disable it whenever taking a fresh
backup is not a problem.

Regards,
Samrat

On Mon, Apr 8, 2013 at 10:40 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Mon, Apr 8, 2013 at 7:34 PM, Samrat Revagade
> <revagade.samrat@gmail.com> wrote:
>>
>> Hello,
>>
>> We have been trying to figure out possible solutions to the following
>> problem in streaming replication. Consider following scenario:
>>
>> If master receives commit command, it writes and flushes commit WAL
>> records to the disk. It also writes and flushes data page related to
>> this transaction.
>>
>> The master then sends WAL records to standby up to the commit WAL
>> record. But before sending these records if failover happens then, old
>> master is ahead of standby which is now the new master in terms of DB
>> data leading to inconsistent data.
>
> Why do you think that the inconsistent data after failover happens is
> problem? Because it's one of the reasons why a fresh base backup is
> required when starting old master as new standby? If yes, I agree with
> you. I've often heard the complaints about a backup when restarting new
> standby. That's really big problem.
>
> The timeline mismatch after failover was one of the reasons why a
> backup is required. But, thanks to Heikki's recent work, that's solved,
> i.e., the timeline mismatch would be automatically resolved when
> starting replication in 9.3. So, the remaining problem is an
> inconsistent database.
>
>> One solution to avoid this situation is have the master send WAL
>> records to standby and wait for ACK from standby committing WAL files
>> to disk and only after that commit data page related to this
>> transaction on master.
>
> You mean to make the master wait the data page write until WAL has been
> not only flushed to disk but also replicated to the standby?
>
>> The main drawback would be increased wait time for the client due to
>> extra round trip to standby before master sends ACK to client. Are
>> there any other issues with this approach?
>
> I think that you can introduce GUC specifying whether this extra check
> is required to avoid a backup when failback.
>
> Regards,
>
> --
> Fujii Masao
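
To make the proposed rule concrete, here is a minimal Python sketch of the ordering discussed above. It is a toy model, not PostgreSQL code. The class `Master`, the flag `wait_for_standby_before_page_write` (standing in for the suggested GUC), the integer LSNs and the page names are all assumptions made up for illustration. The sketch only shows that a data page is held back until the standby has acknowledged the WAL covering it, and what the old master's disk looks like at failover with the check on and off.

```python
from dataclasses import dataclass, field


@dataclass
class Master:
    # Hypothetical switch standing in for the GUC suggested in the thread.
    wait_for_standby_before_page_write: bool = True
    flushed_lsn: int = 0       # WAL flushed to the master's own disk
    standby_ack_lsn: int = 0   # highest LSN the standby has acked as flushed
    written_pages: dict = field(default_factory=dict)  # page id -> LSN on disk

    def flush_wal(self, lsn):
        """Record that local WAL up to lsn is on disk."""
        self.flushed_lsn = max(self.flushed_lsn, lsn)

    def receive_ack(self, lsn):
        """Record the standby's ACK that it flushed WAL up to lsn."""
        self.standby_ack_lsn = max(self.standby_ack_lsn, lsn)

    def can_write_page(self, page_lsn):
        # Usual write-ahead rule: WAL for the page must be flushed locally.
        if page_lsn > self.flushed_lsn:
            return False
        # Proposed extra check: the standby must have acked that WAL too.
        if self.wait_for_standby_before_page_write:
            return page_lsn <= self.standby_ack_lsn
        return True

    def try_write_page(self, page_id, page_lsn):
        """Write the page only if the rules allow it; report what happened."""
        if not self.can_write_page(page_lsn):
            return False
        self.written_pages[page_id] = page_lsn
        return True

    def pages_ahead_of(self, standby_lsn):
        """Pages on the master's disk that are newer than the standby's WAL."""
        return sorted(p for p, lsn in self.written_pages.items()
                      if lsn > standby_lsn)


def simulate_failover(check_enabled):
    """Commit at LSN 200 is flushed locally, but failover happens before
    the standby receives it. Returns the old master's divergent pages."""
    m = Master(wait_for_standby_before_page_write=check_enabled)
    m.flush_wal(100)
    m.receive_ack(100)
    m.try_write_page("A", 100)   # fully replicated, always allowed
    m.flush_wal(200)             # commit record flushed on the master only
    m.try_write_page("B", 200)   # held back only when the check is enabled
    return m.pages_ahead_of(standby_lsn=100)
```

With the check enabled, `simulate_failover(True)` returns an empty list: page "B" is never written, so no data page on the old master is ahead of the standby. With it disabled, `simulate_failover(False)` returns `["B"]`, which is the inconsistent state described in the scenario. Note that in both cases the old master's WAL is still flushed up to LSN 200 in this model; the sketch only covers the data pages.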
