Discussion: Timeline in the light of Synchronous replication
Hello guys,

The concept of timelines makes sense to me in the case of asynchronous replication. But in the case of synchronous replication, I am not so sure.

When a standby connects to the primary, it checks whether both are on the same timeline. If not, it doesn't start.

Now, consider the following scenario. The primary (call it A) fails, and the standby (call it B), via a trigger file, comes out of recovery mode (incrementing the timeline ID to, say, 2) and morphs into a primary. Now, let's say we start the old primary A as a standby, connecting to the new primary B (which was previously a standby). As the code stands at the moment, the old primary A will not be allowed to connect to the new primary B, because A's timeline ID (1) is not equal to that of the new primary B (2). Hence, we need to create a backup again and set up the standby from scratch.

In the above scenario, if the system was using asynchronous replication, timelines would have saved us from doing something wrong. But if we are using synchronous replication, we know that A and B would have been in sync before the failure. In this case, being forced to create a new standby from scratch because of timelines may be undesirable if the database is huge.

Your comments on the above will be appreciated.

Regards
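A toy model of the connection rule described above, assuming the only thing compared when the standby connects is the timeline ID. `Node`, `promote` and `can_stream` are hypothetical names for illustration, not PostgreSQL functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    timeline_id: int  # timeline the server is currently on


def promote(node: Node) -> Node:
    """Leaving recovery (e.g. via a trigger file) starts a new timeline."""
    return Node(node.name, node.timeline_id + 1)


def can_stream(standby: Node, primary: Node) -> bool:
    """Connection-time rule as described above: timeline IDs must match exactly."""
    return standby.timeline_id == primary.timeline_id


a = Node("A", 1)           # old primary, restarted as a standby, still on timeline 1
b = promote(Node("B", 1))  # former standby, now primary on timeline 2
print(can_stream(a, b))    # False: A is refused and must be rebuilt from a new backup
```

Because the rule is a bare equality test, it refuses A even when A's WAL is an exact prefix of B's, which is the case the message is asking about.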
On Wed, Oct 13, 2010 at 04:23:57PM -0700, fazool mein wrote:
> Hello guys,
>
> The concept of time line makes sense to me in the case of asynchronous
> replication. But in case of synchronous replication, I am not so sure.
>
> When a standby connects to the primary, it checks if both have the same time
> line. If not, it doesn't start.
>
> Now, consider the following scenario. The primary (call it A) fails, the
> standby (call it B), via a trigger file, comes out of recovery mode
> (increments time line id to say 2), and morphs into a primary. Now, lets say
> we start the old primary A as a standby, to connect to the new primary B
> (which previously was a standby). As the code is at the moment, the old
> primary A will not be allowed to connect to the new primary B because A's
> timelineid (1) is not equivalent to that of the new primary B (2). Hence, we
> need to create a backup again, and setup the standby from scratch.

Yes.

> In the above scenario, if the system was using asynchronous
> replication, time lines would have saved us from doing something
> wrong. But, if we are using synchronous replication, we know that
> both A and B would have been in sync before the failure. In this
> case, forcing to create a new standby from scratch because of time
> lines might not be very desirable if the database is huge.

One way to get them in sync without starting from scratch is to use
rsync from A to B. This works in the asynchronous case, too. :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
On Thu, Oct 14, 2010 at 8:23 AM, fazool mein <fazoolmein@gmail.com> wrote:
> The concept of time line makes sense to me in the case of asynchronous
> replication. But in case of synchronous replication, I am not so sure.
>
> When a standby connects to the primary, it checks if both have the same time
> line. If not, it doesn't start.
>
> Now, consider the following scenario. The primary (call it A) fails, the
> standby (call it B), via a trigger file, comes out of recovery mode
> (increments time line id to say 2), and morphs into a primary. Now, lets say
> we start the old primary A as a standby, to connect to the new primary B
> (which previously was a standby). As the code is at the moment, the old
> primary A will not be allowed to connect to the new primary B because A's
> timelineid (1) is not equivalent to that of the new primary B (2). Hence, we
> need to create a backup again, and setup the standby from scratch.

Yep.

> In the above scenario, if the system was using asynchronous replication,
> time lines would have saved us from doing something wrong. But, if we are
> using synchronous replication, we know that both A and B would have been in
> sync before the failure. In this case, forcing to create a new standby from
> scratch because of time lines might not be very desirable if the database is
> huge.

At least in my sync rep patch, the data buffer flush waits until WAL has been written to disk, but not until WAL has arrived at the standby. So the database in A might be ahead of that in B, even with sync rep. To avoid this, should we make the buffer flush also wait for replication?

But even if we do that, it should be noted that WAL in A might still be ahead of that in B. For example, A might crash right after writing WAL to disk and before sending it to B. So when we restart the old master A as a standby after failover, we would need to delete some WAL files (in A) that are inconsistent with the WAL sequence in B.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
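Fujii's first point is about ordering, and a small sketch makes the proposed extra condition explicit. It assumes LSNs can be modelled as plain integers and that the primary knows how far the standby has acknowledged WAL; `may_flush_page` is a hypothetical helper, not code from the patch or from PostgreSQL's buffer manager.

```python
from typing import Optional


def may_flush_page(page_lsn: int, local_flush_lsn: int,
                   standby_ack_lsn: Optional[int]) -> bool:
    """Return True if a dirty data page may be written to disk.

    page_lsn        -- LSN of the last WAL record that modified the page
    local_flush_lsn -- how far WAL has been fsync'd on the primary
    standby_ack_lsn -- how far the standby has acknowledged WAL, or None to
                       model the behaviour described for the patch (local WAL only)
    """
    if page_lsn > local_flush_lsn:
        return False  # write-ahead rule: the page's WAL must be on local disk first
    if standby_ack_lsn is not None and page_lsn > standby_ack_lsn:
        return False  # proposed extra rule: the page's WAL must also be on the standby
    return True


# WAL up to LSN 120 is on A's disk, but B has only acknowledged WAL up to LSN 100.
print(may_flush_page(110, 120, None))  # True: A's data files can get ahead of B
print(may_flush_page(110, 120, 100))   # False: the flush waits for B
```

As the message notes, this only keeps A's data files from getting ahead of B; A's WAL itself can still be ahead if A crashes between its local fsync and sending the WAL.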
Fujii Masao <masao.fujii@gmail.com> writes:
> But, even though we will have done that, it should be noted that WAL in
> A might be ahead of that in B. For example, A might crash right after
> writing WAL to the disk and before sending it to B. So when we restart
> the old master A as the standby after failover, we should need to delete
> some WAL files (in A) which are inconsistent with the WAL sequence in B.

The idea of having the master send the slave its current last applied LSN has been discussed already. It would allow sending the WAL content in parallel with its local fsync() on the master; the standby would refrain from applying any WAL segment until it knows the master is past that point.

Now, given such a behavior, when A joins again as a standby it would have to ask B for the current last applied LSN too, and would notice the timeline change. Maybe by adding a facility to request the last LSN of the previous timeline, and with the behavior above applied there (skipping now-known-future WAL in recovery), that would work automatically?

There's still the problem of WAL that has been applied before recovery; I don't know that we can do anything there. But maybe we could also tweak the CHECKPOINT mechanism not to advance the restart point until we know the standbys have already replayed everything up to the restart point?

--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL: Expertise, Training and Support
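Dimitri's two suggestions can be sketched together: a standby that buffers streamed WAL and replays only what the master has reported as flushed, plus a helper for the checkpoint tweak. The class and function names are hypothetical, WAL records are modelled as `(end LSN, payload)` pairs arriving in LSN order, and the position the master reports is assumed to be its own local WAL flush position.

```python
from typing import List, Tuple

Record = Tuple[int, str]  # (end LSN of the WAL record, payload)


class GatedStandby:
    """Receives WAL early, but replays it only once the master has reported
    that the same WAL is safely flushed on its own disk."""

    def __init__(self) -> None:
        self.received: List[Record] = []  # streamed, not yet known durable on master
        self.applied: List[Record] = []   # replayed
        self.master_flush_lsn = 0         # last flushed LSN reported by the master

    def receive(self, record: Record) -> None:
        self.received.append(record)

    def report_master_flush(self, lsn: int) -> None:
        self.master_flush_lsn = max(self.master_flush_lsn, lsn)
        # Replay everything the master has confirmed; keep the rest pending.
        ready = [r for r in self.received if r[0] <= self.master_flush_lsn]
        self.received = [r for r in self.received if r[0] > self.master_flush_lsn]
        self.applied.extend(ready)


def restart_point(checkpoint_redo_lsn: int, standby_replay_lsns: List[int]) -> int:
    """Checkpoint tweak: never move the restart point past WAL that some
    standby has not replayed yet."""
    return min([checkpoint_redo_lsn, *standby_replay_lsns])


s = GatedStandby()
s.receive((100, "insert"))
s.receive((200, "update"))
s.report_master_flush(150)
print([lsn for lsn, _ in s.applied])   # [100]
print([lsn for lsn, _ in s.received])  # [200]
print(restart_point(500, [300, 450]))  # 300
```

The gate only protects the standby from replaying WAL the master might lose; WAL that the master flushed but never sent is still the problem Fujii describes.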
I believe we should come up with a universal solution that will solve potential future problems as well (for example, if, in sync replication, we decide to send writes to standbys in parallel with writing to the local disk).

The ideal thing would be to have an ID that is incremented on every failure and is stored in the WAL. Whenever a standby connects to the primary, it should send the point p in the WAL where streaming should start, plus the ID. If the primary has the same ID at point p, things are good. Otherwise, we should tell the standby to either create a new copy from scratch or delete some WAL.

@David
> One way to get them in sync without starting from scratch is to use
> rsync from A to B. This works in the asynchronous case, too. :)

The main problem is detecting when one can rsync/stream and when one cannot.

Regards

On Mon, Oct 18, 2010 at 1:57 AM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
> > But, even though we will have done that, it should be noted that WAL in
> > A might be ahead of that in B. For example, A might crash right after
> > writing WAL to the disk and before sending it to B. So when we restart
> > the old master A as the standby after failover, we should need to delete
> > some WAL files (in A) which are inconsistent with the WAL sequence in B.
>
> The idea to send from master to slave the current last applied LSN has
> been talked about already. It would allow to send the WAL content in
> parallel of it's local fsync() on the master, the standby would refrain
> from applying any WAL segment until it knows the master is past that.
>
> Now, given such a behavior, that would mean that when A joins again as a
> standby, it would have to ask B for the current last applied LSN too,
> and would notice the timeline change. Maybe by adding a facility to
> request the last LSN of the previous timeline, and with the behavior
> above applied there (skipping now-known-future-WALs in recovery), that
> would work automatically?
>
> There's still the problem of WALs that have been applied before
> recovery, I don't know that we can do anything here. But maybe we could
> also tweak the CHECKPOINT mecanism not to advance the restart point
> until we know the standbys have already replayed anything up to the
> restart point?
>
> --
> Dimitri Fontaine
> http://2ndQuadrant.fr     PostgreSQL: Expertise, Training and Support
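The proposal can be made concrete with a sketch, assuming the primary keeps a list of (failover ID, start LSN) pairs, similar in spirit to PostgreSQL's timeline history files. The names, the returned decision strings, and the rule that a standby is fine as long as p does not go past the point where its ID was superseded are illustrative assumptions, not an agreed design.

```python
from typing import List, Tuple

# Hypothetical failover history kept by the primary: (failover ID, LSN at which
# that ID took over), sorted by LSN. Each promotion appends one entry.
History = List[Tuple[int, int]]


def id_end(history: History, failover_id: int) -> float:
    """LSN at which `failover_id` stopped being current (inf if it still is)."""
    for i, (fid, _start) in enumerate(history):
        if fid == failover_id:
            return history[i + 1][1] if i + 1 < len(history) else float("inf")
    raise KeyError(failover_id)


def check_standby(history: History, p: int, standby_id: int) -> str:
    """Decide what a connecting standby should do.

    p          -- the point where streaming should start (just past the
                  standby's last WAL record)
    standby_id -- the failover ID the standby's WAL was written under
    """
    if standby_id not in (fid for fid, _ in history):
        return "rebuild"  # the primary has never seen this ID: start from scratch
    end = id_end(history, standby_id)
    if p <= end:
        return "stream"  # the standby's WAL is a prefix of the primary's
    return f"discard WAL after {end}"  # the standby has WAL the primary never saw


b_history = [(1, 0), (2, 120)]           # B was promoted at LSN 120
print(check_standby(b_history, 120, 1))  # stream
print(check_standby(b_history, 150, 1))  # discard WAL after 120
print(check_standby(b_history, 150, 2))  # stream
print(check_standby(b_history, 150, 7))  # rebuild
```

As Fujii and Dimitri point out, discarding WAL is only enough if A's data files have not already absorbed any of the discarded changes; otherwise a rebuild (or rsync) is still needed.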
On Mon, Oct 18, 2010 at 4:31 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> But, even though we will have done that, it should be noted that WAL in
> A might be ahead of that in B. For example, A might crash right after
> writing WAL to the disk and before sending it to B. So when we restart
> the old master A as the standby after failover, we should need to delete
> some WAL files (in A) which are inconsistent with the WAL sequence in B.

Right. There's no way to make it categorically safe to turn A into a standby, because there's no way to guarantee that the fsyncs of the WAL happen at the same femtosecond on both machines. What we should be looking for is a reliable way to determine whether or not it is in fact safe. Timelines are intended to provide that, but there are holes, so they don't.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company