Discussion: Synch Rep: direct transfer of WAL file from the primary to the standby
Hi,

http://archives.postgresql.org/message-id/496B9495.4010902@enterprisedb.com
> IMHO, the synchronous replication isn't in such good shape, I'm afraid.
> I've said this before, but I'm not happy with the "built from spare parts"
> nature of it. You shouldn't have to configure an archive, file-based log
> shipping using rsync or whatever, and pg_standby. All that is in addition
> to the direct connection between master and slave. The slave really should
> be able to just connect to the master, and download all the WAL it needs
> directly. That's a huge usability issue if left as is, but requires very large
> architectural changes to fix.

One of the major problems with Synch Rep was that WAL files generated before replication starts were not automatically transferred to the standby server. Those files had to be shipped by hand or via the warm-standby mechanism, which degraded the usability of Synch Rep.

So, I'd like to propose a capability whereby the startup process automatically restores a missing file (WAL file, backup history file or timeline history file) from the primary server. Specifically, the startup process tries to retrieve the file in the following order:

1) from the archive in the standby server
2) from the primary server <--- New Feature!
3) from pg_xlog in the standby server

This means that users no longer need extra copy operations to set up replication.

Implementation
--------------------

The main part of this capability is a new function to read the specified WAL file. Its definition is:

pg_read_xlogfile (filename text [, restore bool]) returns setof bytea

- filename: name of the file to read
- restore: indicates whether to try to restore the file from the archive
- returns: the content of the specified file (the max size of one row is 8KB, i.e. this function returns 2,048 rows when a 16MB WAL file is requested)

If restore=true, this function first tries to retrieve the file from the archive.
This requires restore_command to be specified in postgresql.conf. If that restore fails or restore=false, it tries to retrieve the file from pg_xlog. In this case, WAL files or a backup history file might be removed from pg_xlog by a concurrent checkpoint or pg_stop_backup, respectively, so ControlFileLock must be held to read them. On the other hand, we should not send (return) any read data while holding the lock; otherwise, a network outage would seriously block processing which requires the lock. So, the WAL file or backup history file in pg_xlog is copied to a temporary file while holding the lock, then read and sent (returned) after releasing it.

In the standby server, if a missing file is found, the startup process connects to the primary server as a normal client and retrieves the binary contents of the WAL file using the following SQL. The restored file is then written to pg_xlog and applied.

COPY (SELECT pg_read_xlogfile('filename', true)) TO STDOUT WITH BINARY

The attached latest patch provides this capability. You can easily set up Synch Rep according to the following procedure.
http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#How_to_set_up_Synch_Rep

Comments? Do you have another, better approach?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
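[Editor's note: the 8KB-row framing described above implies a simple reassembly loop on the receiving side. The following is an illustrative sketch of that arithmetic in plain Python — it stands in for the real COPY ... TO STDOUT path, and all function names here are hypothetical, not part of the patch.]

```python
# Sketch of the row-chunking described above: pg_read_xlogfile() returns a
# 16MB WAL segment as a setof bytea in rows of at most 8KB, which the
# standby reassembles in order.

WAL_SEG_SIZE = 16 * 1024 * 1024   # 16MB WAL segment
ROW_SIZE = 8 * 1024               # max bytea content per returned row

def split_into_rows(segment: bytes, row_size: int = ROW_SIZE):
    """Server side: yield the file content in row-sized chunks."""
    for off in range(0, len(segment), row_size):
        yield segment[off:off + row_size]

def reassemble(rows) -> bytes:
    """Standby side: concatenate the rows back into the full segment."""
    return b"".join(rows)

segment = bytes(WAL_SEG_SIZE)           # stand-in for a real WAL segment
rows = list(split_into_rows(segment))
assert len(rows) == 2048                # 16MB / 8KB rows, as stated above
assert reassemble(rows) == segment
```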
Attachments
On Tue, Jun 16, 2009 at 2:13 AM, Fujii Masao<masao.fujii@gmail.com> wrote:
> The attached latest patch provides this capability. You can easily set up the
> synch rep according to the following procedure.
> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#How_to_set_up_Synch_Rep

This patch no longer applies cleanly. Can you rebase and resubmit it for the upcoming CommitFest? It might also be good to go through and clean up the various places where you have trailing whitespace and/or spaces preceding tabs.

It seems this will be one of the "big" patches for the upcoming CommitFest. Hot Standby seems to be off the table, because Simon has indicated that he thinks Synch Rep should go first, and Heikki has indicated that he's willing to review and commit, but not also play lead developer.

http://archives.postgresql.org/pgsql-hackers/2009-07/msg00005.php
http://archives.postgresql.org/pgsql-hackers/2009-06/msg01534.php

Given that this is a substantial patch, I have a couple of questions about strategy. First, I am wondering whether this patch should be reviewed (and committed) as a whole, or whether there are distinct chunks of it that should be reviewed and committed separately - particularly the signal handling piece, which AIUI is independently useful. I note that it seems to be included in the tarball as a separate patch file, which is very useful.

Second, I am wondering whether Heikki feels that it would be useful to assign round-robin reviewers for this patch, or whether he's going to be the principal reviewer himself. We could assign either a reviewer (or reviewers) to the whole patch, or we could assign reviewers to particular chunks of the patch, such as the signal handling piece.

Thanks,

...Robert
On Thu, Jul 2, 2009 at 10:02 PM, Robert Haas<robertmhaas@gmail.com> wrote: > Second, I am wondering whether Heikki feels that it would be useful to > assign round-robin reviewers for this patch, or whether he's going to > be the principal reviewer himself. We could assign either a reviewer > (or reviewers) to the whole patch, or we could assign reviewers to > particular chunks of the patch, such as the signal handling piece. Hmm, taking a look at the wiki, I see that Simon's name is listed for this patch as a reviewer already. Assuming that's a point of view that Simon agrees with and not the result of his name having been added by someone else, I guess the question is whether we need additional reviewers here beyond Heikki and Simon. ...Robert
Robert Haas escribió: > Second, I am wondering whether Heikki feels that it would be useful to > assign round-robin reviewers for this patch, or whether he's going to > be the principal reviewer himself. We could assign either a reviewer > (or reviewers) to the whole patch, or we could assign reviewers to > particular chunks of the patch, such as the signal handling piece. WRT the signal handling piece, I remember something in that area being committed and then reverted because it had issues. Does this version fix those issues? (Assuming it's the same patch) -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Hi, On Fri, Jul 3, 2009 at 11:02 AM, Robert Haas<robertmhaas@gmail.com> wrote: > On Tue, Jun 16, 2009 at 2:13 AM, Fujii Masao<masao.fujii@gmail.com> wrote: >> The attached latest patch provides this capability. You can easily set up the >> synch rep according to the following procedure. >> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#How_to_set_up_Synch_Rep > > This patch no longer applies cleanly. Can you rebase and resubmit it > for the upcoming CommitFest? It might also be good to go through and > clean up the various places where you have trailing whitespace and/or > spaces preceding tabs. Sure. I'll resubmit the patch after fixing some bugs and finishing the documents. > Given that this is a substantial patch, I have a couple of questions > about strategy. First, I am wondering whether this patch should be > reviewed (and committed) as a whole, or whether there are distinct > chunks of it that should be reviewed and committed separately - > particularly the signal handling piece, which AIUI is independently > useful. I note that it seems to be included in the tarball as a > separate patch file, which is very useful. I think that the latter strategy makes more sense. At least, the signal handling piece and non-blocking pqcomm (communication between a frontend and a backend) can be reviewed independently of synch rep itself. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On Fri, Jul 3, 2009 at 11:59 AM, Alvaro Herrera<alvherre@commandprompt.com> wrote: > WRT the signal handling piece, I remember something in that area being > committed and then reverted because it had issues. Does this version > fix those issues? (Assuming it's the same patch) Yes. After the patch was reverted, Heikki and I fixed the problems. The problem which was pointed out is: http://archives.postgresql.org/message-id/14969.1228835521@sss.pgh.pa.us Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Fri, Jul 3, 2009 at 12:32 AM, Fujii Masao<masao.fujii@gmail.com> wrote:
> Hi,
>
> On Fri, Jul 3, 2009 at 11:02 AM, Robert Haas<robertmhaas@gmail.com> wrote:
>> On Tue, Jun 16, 2009 at 2:13 AM, Fujii Masao<masao.fujii@gmail.com> wrote:
>>> The attached latest patch provides this capability. You can easily set up the
>>> synch rep according to the following procedure.
>>> http://wiki.postgresql.org/wiki/NTT%27s_Development_Projects#How_to_set_up_Synch_Rep
>>
>> This patch no longer applies cleanly. Can you rebase and resubmit it
>> for the upcoming CommitFest? It might also be good to go through and
>> clean up the various places where you have trailing whitespace and/or
>> spaces preceding tabs.
>
> Sure. I'll resubmit the patch after fixing some bugs and finishing
> the documents.
>
>> Given that this is a substantial patch, I have a couple of questions
>> about strategy. First, I am wondering whether this patch should be
>> reviewed (and committed) as a whole, or whether there are distinct
>> chunks of it that should be reviewed and committed separately -
>> particularly the signal handling piece, which AIUI is independently
>> useful. I note that it seems to be included in the tarball as a
>> separate patch file, which is very useful.
>
> I think that the latter strategy makes more sense. At least, the signal
> handling piece and non-blocking pqcomm (communication between
> a frontend and a backend) can be reviewed independently of synch rep
> itself.

My preference for ease of CommitFest management would be one thread on -hackers for each chunk that can be separately reviewed and committed. So if there are three severable chunks here, send a patch for each one with a descriptive subject line, and mention the dependencies in the body of the email ("before applying this patch, you must first apply blah blah <link to archives>").
That way, we can keep the discussion of each topic separate, have separate entries on the CommitFest page with subjects that match the email thread, etc. Thanks, ...Robert
Hi, On Fri, Jul 3, 2009 at 2:01 PM, Robert Haas<robertmhaas@gmail.com> wrote: > My preference for ease of CommitFest management would be one thread on > -hackers for each chunk that can be separately reviewed and committed. > So if there are three severable chunks here, send a patch for each > one with a descriptive subject line, and mention the dependencies in > the body of the email ("before applying this patch, you must first > apply blah blah <link to archives>"). That way, we can keep the > discussion of each topic separate, have separate entries on the > CommitFest page with subjects that match the email thread, etc. That sounds good. I'll submit the patches separately. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi,

On Tue, Jun 16, 2009 at 3:13 PM, Fujii Masao<masao.fujii@gmail.com> wrote:
> The main part of this capability is the new function to read the specified
> WAL file. The following is the definition of it.
>
> pg_read_xlogfile (filename text [, restore bool]) returns setof bytea
>
> - filename: name of file to read
> - restore: indicates whether to try to restore the file from the archive
>
> - returns the content of the specified file
> (max size of one row is 8KB, i.e. this function returns 2,048 rows when
> WAL file whose size is 16MB is requested.)
>
> If restore=true, this function tries to retrieve the file from the
> archive at first.
> This requires restore_command which needs to be specified in postgresql.conf.

In order for the primary server (i.e. a normal backend) to read an archived file, restore_command needs to be specified in postgresql.conf as well. In this case, how should we handle restore_command in recovery.conf?

1) Delete restore_command from recovery.conf. In this case, a user has to specify it in postgresql.conf instead of recovery.conf when doing PITR. This is simple, but tempts me to merge the two configuration files. I'm not sure why the parameters for recovery should be set apart from postgresql.conf.

2) Leave restore_command in recovery.conf; it can be set in both or either of the two configuration files. We put recovery.conf before postgresql.conf only during recovery if it's in both. After recovery, we prioritize postgresql.conf. In this case, recovery.conf also needs to be re-read during recovery when SIGHUP arrives. This might be complicated for a user.

3) Separate restore_command into two parameters:
- normal_restore_command: used by a normal backend
- recovery_restore_command: used by the startup process for PITR
In this case, it's bothersome that the same command must be set in both of the two configuration files.

I'm leaning toward 1), i.e. restore_command is simply moved from recovery.conf to postgresql.conf. What's your opinion?
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
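[Editor's note: for concreteness, option 1) above would amount to something like the following postgresql.conf fragment. This is a hypothetical sketch of the proposed parameter placement, not committed syntax; the path is an example.]

```ini
# postgresql.conf (primary) -- hypothetical placement under option 1):
# restore_command moves here from recovery.conf, so both a normal backend
# (serving pg_read_xlogfile) and the startup process doing PITR can use it.
restore_command = 'cp /mnt/archive/%f "%p"'
```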
Fujii Masao <masao.fujii@gmail.com> writes: > In order for the primary server (ie. a normal backend) to read an archived file, > restore_command needs to be specified in also postgresql.conf. In this case, > how should we handle restore_command in recovery.conf? I confess to not having paid much attention to this thread so far, but ... what is the rationale for having such a capability at all? It seems to me to be exposing implementation details that we do not need to expose, as well as making assumptions that we shouldn't make (like there is exactly one archive and the primary server has read access to it). regards, tom lane
Hi,

Thanks for the comment!

On Tue, Jul 7, 2009 at 12:16 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> Fujii Masao <masao.fujii@gmail.com> writes:
>> In order for the primary server (ie. a normal backend) to read an archived file,
>> restore_command needs to be specified in also postgresql.conf. In this case,
>> how should we handle restore_command in recovery.conf?
>
> I confess to not having paid much attention to this thread so far, but ...
> what is the rationale for having such a capability at all?

If the XLOG files which are required for recovery exist only in the primary server, the standby server has to read them in some way. For example, when the latest XLOG file of the primary server is 09 and the standby server has only 01, the missing files (02-08) have to be read for recovery by the standby server. In this case, the XLOG records in 09 or later are shipped to the standby server in real time by the synchronous replication feature.

The problem which I'd like to solve is how to make the standby server read the files (XLOG file, backup history file and timeline history file) which exist only in the primary server. In the previous patch, we had to manually copy those missing files to the archive of the standby server or use the warm-standby mechanism. This would decrease the usability of synchronous replication. So, I proposed one of the solutions which makes the standby server read those missing files automatically: introducing a new function pg_read_xlogfile() which reads the specified XLOG file.

Is this solution in the right direction? Do you have another reasonable solution?

> It seems to
> me to be exposing implementation details that we do not need to expose,
> as well as making assumptions that we shouldn't make (like there is
> exactly one archive and the primary server has read access to it).

You mean that one archive is shared between the two servers? If so, no. I attached a picture of the environment which I assume. Please feel free to comment.
Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Attachments
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From: Heikki Linnakangas
Fujii Masao wrote: > On Tue, Jul 7, 2009 at 12:16 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> Fujii Masao <masao.fujii@gmail.com> writes: >>> In order for the primary server (ie. a normal backend) to read an archived file, >>> restore_command needs to be specified in also postgresql.conf. In this case, >>> how should we handle restore_command in recovery.conf? >> I confess to not having paid much attention to this thread so far, but ... >> what is the rationale for having such a capability at all? > > If the XLOG files which are required for recovery exist only in the > primary server, > the standby server has to read them in some way. For example, when the latest > XLOG file of the primary server is 09 and the standby server has only 01, the > missing files (02-08) has to be read for recovery by the standby server. In this > case, the XLOG records in 09 or later are shipped to the standby server in real > time by synchronous replication feature. > > The problem which I'd like to solve is how to make the standby server read the > XLOG files (XLOG file, backup history file and timeline history) which > exist only > in the primary server. In the previous patch, we had to manually copy those > missing files to the archive of the standby server or use the warm-standby > mechanism. This would decrease the usability of synchronous replication. So, > I proposed one of the solutions which makes the standby server read those > missing files automatically: introducing new function pg_read_xlogfile() which > reads the specified XLOG file. pg_read_xlogfile() feels like a quite hacky way to implement that. Do we require the master to always have read access to the PITR archive? And indeed, to have a PITR archive configured to begin with. If you need to set up archiving just because of the standby server, how do old files that are no longer required by the standby get cleaned up? 
I feel that the master needs to explicitly know what is the oldest WAL file the standby might still need, and refrain from deleting those files. IOW, keep enough history in pg_xlog. Then we have the risk of running out of disk space on pg_xlog if the connection to the standby is lost for a long time, so we'll need some cap on that, after which the master declares the standby as dead and deletes the old WAL anyway. Nevertheless, I think that would be much simpler to implement, and simpler for admins. And if the standby can read old WAL segments from the PITR archive, in addition to requesting them from the primary, it is just as safe.

I'd like to see a description of the proposed master/slave protocol for replication. If I understood correctly, you're proposing that the standby server connects to the master with libpq like any client, authenticates as usual, and then sends a message indicating that it wants to switch to "replication mode". In replication mode, normal FE/BE messages are not accepted, but there's a different set of message types for transferring XLOG data.

I'd like to see a more formal description of that protocol and the new message types. Some examples of how they would be used in different scenarios, like when the standby server connects to the master for the first time and needs to catch up.

Looking at the patch briefly, it seems to assume that there is only one WAL sender active at any time. What happens when a new WAL sender connects and one is active already? While supporting multiple slaves isn't a priority, I think we should support multiple WAL senders right from the start. It shouldn't be much harder, and otherwise we need to ensure that the switch from the old WAL sender to a new one is clean, which seems non-trivial.
Or not accept a new WAL sender while old one is still active, but then a dead WAL sender process (because the standby suddenly crashed, for example) would inhibit a new standby from connecting, possibly for several minutes. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From: Andrew Dunstan
Heikki Linnakangas wrote: > While supporting multiple slaves > isn't a priority, > Really? I should have thought it was a basic requirement. At the very least we need to design with it in mind. cheers andrew
Hi,

Thanks for the comment!

On Tue, Jul 7, 2009 at 5:07 PM, Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote:
> pg_read_xlogfile() feels like a quite hacky way to implement that. Do we
> require the master to always have read access to the PITR archive? And
> indeed, to have a PITR archive configured to begin with. If you need to
> set up archiving just because of the standby server, how do old files
> that are no longer required by the standby get cleaned up?
>
> I feel that the master needs to explicitly know what is the oldest WAL
> file the standby might still need, and refrain from deleting files the
> standby might still need. IOW, keep enough history in pg_xlog. Then we
> have the risk of running out of disk space on pg_xlog if the connection
> to the standby is lost for a long time, so we'll need some cap on that,
> after which the master declares the standby as dead and deletes the old
> WAL anyway. Nevertheless, I think that would be much simpler to
> implement, and simpler for admins. And if the standby can read old WAL
> segments from the PITR archive, in addition to requesting them from the
> primary, it is just as safe.

I'm thinking of making pg_read_xlogfile() read the XLOG files from pg_xlog when restore_command is not specified or returns a non-zero code (i.e. failure). So, pg_read_xlogfile() with the following settings might already cover the case you described:

- checkpoint_segments = N (big number)
- restore_command = ''

In this case, we can expect that the XLOG files which are required for the standby exist in pg_xlog because of the big checkpoint_segments, and pg_read_xlogfile() reads them only from pg_xlog. checkpoint_segments would play the role of the cap and determine the maximum disk size of pg_xlog. The overflow files which might no longer be required for the standby are removed safely by postgres. OTOH, if there is not enough disk space for pg_xlog, we can specify restore_command and decrease checkpoint_segments.
This is a more flexible approach, I think. But if the primary should not restore any archived file at any time, should I just get rid of the code by which pg_read_xlogfile() restores it?

> I'd like to see a description of the proposed master/slave protocol for
> replication. If I understood correctly, you're proposing that the
> standby server connects to the master with libpq like any client,
> authenticates as usual, and then sends a message indicating that it
> wants to switch to "replication mode". In replication mode, normal FE/BE
> messages are not accepted, but there's a different set of message types
> for transferring XLOG data.

http://archives.postgresql.org/message-id/4951108A.5040608@enterprisedb.com
> I don't think we need or should
> allow running regular queries before entering "replication mode". the
> backend should become a walsender process directly after authentication.

I changed the protocol according to your suggestion. Here is the current protocol:

On start-up, the standby calls PQstartReplication(), which is a new libpq function. It sends the startup packet with a special code for replication to the primary, like a cancel request. The backend which receives this code becomes a walsender directly. Authentication is performed as normal. Then, walsender switches the XLOG file and sends the ReplicationStart message 'l', which includes the timeline ID and the replication start XLOG position.

ReplicationStart (B)
  Byte1('l'): Identifies the message as a replication-start indicator.
  Int32(17): Length of message contents in bytes, including self.
  Int32: The timeline ID.
  Int32: The start log file of replication.
  Int32: The start byte offset of replication.

After that, walsender sends the XLogData message 'w', which includes the XLOG records, a flag (e.g. indicating whether the records should be fsynced or not), and the XLOG position, in real time. The standby receives the message using PQgetXLogData(), which is a new libpq function.
OTOH, after writing or fsyncing the records, the standby sends the XLogResponse message 'r', which includes the flag and the position of the written/fsynced records, using PQputXLogRecPtr(), which is a new libpq function.

XLogData (B)
  Byte1('w'): Identifies the message as XLOG records.
  Int32: Length of message contents in bytes, including self.
  Int8: Flag bits indicating how the records should be treated.
  Int32: The log file number of the records.
  Int32: The byte offset of the records.
  Byte n: The XLOG records.

XLogResponse (F)
  Byte1('r'): Identifies the message as ACK for XLOG records.
  Int32: Length of message contents in bytes, including self.
  Int8: Flag bits indicating how the records were treated.
  Int32: The log file number of the records.
  Int32: The byte offset of the records.

Normal exit of walsender (e.g. by smart shutdown) sends the ReplicationEnd message 'z'. OTOH, normal exit of walreceiver sends the existing Terminate message 'X'. The above protocol is used between walsender and walreceiver.

> I'd like to see a more formal description of that protocol and the new
> message types. Some examples of how they would be in different
> scenarios, like when standby server connects to the master for the first
> time and needs to catch up.

If there is a missing XLOG file which is required for recovery, the startup process connects to the primary as a normal client and receives the binary contents of the file using the following SQL. This has nothing to do with the above protocol, so the transfer of the missing file and synchronous XLOG streaming are performed concurrently.

COPY (SELECT pg_read_xlogfile('filename', true)) TO STDOUT WITH BINARY

If no missing files are found (i.e. recovery of the standby has reached the replication start position), the file transfer drops out of use.

> Looking at the patch briefly, it seems to assume that there is only one
> WAL sender active at any time. What happens when a new WAL sender
> connects and one is active already?
The new request is refused because of existing walsender. > While supporting multiple slaves > isn't a priority, I think we should support multiple WAL senders right > from the start. It shouldn't be much harder, and otherwise we need to > ensure that the switch from old WAL sender to a new one is clean, which > seems non-trivial. Or not accept a new WAL sender while old one is still > active, Yeah, the current patch doesn't accept a new walsender while old one is still active. > but then a dead WAL sender process (because the standby suddenly > crashed, for example) would inhibit a new standby from connecting, > possibly for several minutes. Yes, new standby cannot start walsender until walsender detects the death of old standby. You can shorten the time to detect it by setting some timeout (replication_timeout and some keepalive parameters). I don't think that it's a problem that walsender cannot start for a short time. You think that walsender must *always* be able to start? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
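[Editor's note: the XLogData/XLogResponse framing described in the message above can be sketched with explicit byte packing. This is an illustrative stand-in in plain Python, using network byte order as in the FE/BE protocol; the field layout is taken from the definitions above, and the reading of "length including self" as counting everything after the type byte is an assumption.]

```python
import struct

# Illustrative packing of the XLogData ('w') message described above:
# type byte, Int32 length (counting itself, as in the FE/BE protocol),
# Int8 flags, Int32 log file number, Int32 byte offset, raw XLOG records.
HEADER = "!IBII"  # length, flags, logfile, offset (network byte order)

def pack_xlogdata(flags: int, logfile: int, offset: int, records: bytes) -> bytes:
    # Length covers the header fields (including the length field) + payload.
    length = struct.calcsize(HEADER) + len(records)
    return b"w" + struct.pack(HEADER, length, flags, logfile, offset) + records

def unpack_xlogdata(msg: bytes):
    assert msg[:1] == b"w", "not an XLogData message"
    length, flags, logfile, offset = struct.unpack_from(HEADER, msg, 1)
    records = msg[1 + struct.calcsize(HEADER): 1 + length]
    return flags, logfile, offset, records

msg = pack_xlogdata(flags=1, logfile=9, offset=0x20000, records=b"\x00" * 32)
assert unpack_xlogdata(msg) == (1, 9, 0x20000, b"\x00" * 32)
```

The XLogResponse ('r') message would be packed the same way minus the payload bytes.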
Fujii Masao <masao.fujii@gmail.com> writes: > On Tue, Jul 7, 2009 at 12:16 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >> I confess to not having paid much attention to this thread so far, but ... >> what is the rationale for having such a capability at all? > If the XLOG files which are required for recovery exist only in the > primary server, > the standby server has to read them in some way. For example, when the latest > XLOG file of the primary server is 09 and the standby server has only 01, the > missing files (02-08) has to be read for recovery by the standby server. In this > case, the XLOG records in 09 or later are shipped to the standby server in real > time by synchronous replication feature. > The problem which I'd like to solve is how to make the standby server read the > XLOG files (XLOG file, backup history file and timeline history) which > exist only > in the primary server. In the previous patch, we had to manually copy those > missing files to the archive of the standby server or use the warm-standby > mechanism. This would decrease the usability of synchronous replication. So, > I proposed one of the solutions which makes the standby server read those > missing files automatically: introducing new function pg_read_xlogfile() which > reads the specified XLOG file. > Is this solution in the right direction? Do you have another > reasonable solution? This design seems totally wrong to me. It's confusing the master's pg_xlog directory with the archive. We should *not* use pg_xlog as a long-term archive area; that's terrible from both a performance and a reliability perspective. Performance because pg_xlog has to be fairly high-speed storage, which conflicts with it needing to hold a lot of stuff; and reliability because the entire point of all this is to survive a master server crash, and you're probably not going to have its pg_xlog anymore after that. 
If slaves need to be able to get at past WAL, they should be getting it from a separate archive server that is independent of the master DB. regards, tom lane
On Tue, Jul 7, 2009 at 4:49 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
> This design seems totally wrong to me. It's confusing the master's
> pg_xlog directory with the archive. We should *not* use pg_xlog as
> a long-term archive area; that's terrible from both a performance
> and a reliability perspective. Performance because pg_xlog has to be
> fairly high-speed storage, which conflicts with it needing to hold
> a lot of stuff; and reliability because the entire point of all this
> is to survive a master server crash, and you're probably not going to
> have its pg_xlog anymore after that.

Hm, those are all good points.

> If slaves need to be able to get at past WAL, they should be getting
> it from a separate archive server that is independent of the master DB.

But this conflicts with earlier discussions where we were concerned about the length of the path WAL has to travel between the master and the slaves. We want slaves to be able to be brought up using a simple, robust configuration, and to be able to respond quickly to transactions committed on the master for synchronous operation.

Having WAL be written to the master's xlog directory, copied to the archive, then copied from the archive to the slave's WAL directory, and then finally reread and replayed on the slave means a lot of extra complicated configuration which can be set up wrong and which might not be apparent until things fall apart. And it means a huge latency before the WAL files are finally replayed on the slave, which will make transitioning to synchronous mode -- with a whole other mode of operation to configure -- quite tricky and potentially slow.

I'm not sure how to reconcile these two sets of priorities though. Your points above are perfectly valid as well.

How do other databases handle log shipping? Do they depend on archived logs to bring the slaves up to speed? Is there a separate log management daemon?

--
greg
http://mit.edu/~gsstark/resume.pdf
Greg Stark <gsstark@mit.edu> writes:
> On Tue, Jul 7, 2009 at 4:49 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
>> This design seems totally wrong to me.
>> ...
> But this conflicts with earlier discussions where we were concerned
> about the length of the path wal has to travel between the master and
> the slaves. We want slaves to be able to be turned on simply using a
> simple robust configuration and to be able to respond quickly to
> transactions that are committed in the master for synchronous
> operation.

Well, the problem I've really got with this is that if you want sync replication, couching it in terms of WAL files in the first place seems like getting off on fundamentally the wrong foot. That still leaves you with all the BS about having to force WAL file switches (and eat LSN space) for all sorts of undesirable reasons. I think we want the API to operate more like a WAL stream.

I would envision the slaves connecting to the master's replication port and asking "feed me WAL beginning at LSN position thus-and-so", with no notion of WAL file boundaries exposed anyplace. The point about not wanting to archive lots of WAL on the master would imply that the master reserves the right to fail if the requested starting position is too old, whereupon the slave needs some way to resync --- but that probably involves something close to taking a fresh base backup to copy to the slave. You either have the master not recycle its WAL while the backup is going on (so the slave can start reading afterwards), or expect the slave to absorb and buffer the WAL stream while the backup is going on. In neither case is there any reason to have an API that involves fetching arbitrary chunks of past WAL, and certainly not one that is phrased as fetching specific WAL segment files.

There are still some interesting questions in this about exactly how you switch over from "catchup mode" to following the live WAL broadcast.
With the above design it would be the master's responsibility to manage that, since presumably the requested start position will almost always be somewhat behind the live end of WAL. It might be nicer to push that complexity to the slave side, but then you do need two data paths somehow (ie, retrieving the slightly-stale WAL is separated from tracking live events). Which is what you're saying we should avoid, and I do see the point there. regards, tom lane
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
Heikki Linnakangas
Date:
Tom Lane wrote: > Greg Stark <gsstark@mit.edu> writes: >> On Tue, Jul 7, 2009 at 4:49 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> This design seems totally wrong to me. >>> ... > >> But this conflicts with earlier discussions where we were concerned >> about the length of the path wal has to travel between the master and >> the slaves. We want slaves to be able to be turned on simply using a >> simple robust configuration and to be able to respond quickly to >> transactions that are committed in the master for synchronous >> operation. > > Well, the problem I've really got with this is that if you want sync > replication, couching it in terms of WAL files in the first place seems > like getting off on fundamentally the wrong foot. That still leaves you > with all the BS about having to force WAL file switches (and eat LSN > space) for all sorts of undesirable reasons. I think we want the > API to operate more like a WAL stream. I think we all agree on that. > I would envision the slaves > connecting to the master's replication port and asking "feed me WAL > beginning at LSN position thus-and-so", with no notion of WAL file > boundaries exposed anyplace. Yep, that's the way I envisioned it to work in my protocol suggestion that Fujii adopted (http://archives.postgresql.org/message-id/4951108A.5040608@enterprisedb.com). The <begin> and <end> values are XLogRecPtrs, not WAL filenames. > The point about not wanting to archive > lots of WAL on the master would imply that the master reserves the right > to fail if the requested starting position is too old, whereupon the > slave needs some way to resync --- but that probably involves something > close to taking a fresh base backup to copy to the slave. Works for me, except that people will want the ability to use a PITR archive for the catchup, if available. The master should have no business peeking into the archive, however. That should be implemented entirely in the slave. 
And I'm sure people will want the option to retain WAL longer in the master, to avoid an expensive resync if the slave falls behind. It would be simple to provide a GUC option for "always retain X GB of old WAL in pg_xlog". > There are still some interesting questions in this about exactly how you > switch over from "catchup mode" to following the live WAL broadcast. > With the above design it would be the master's responsibility to manage > that, since presumably the requested start position will almost always > be somewhat behind the live end of WAL. It might be nicer to push that > complexity to the slave side, but then you do need two data paths > somehow (ie, retrieving the slightly-stale WAL is separated from > tracking live events). Which is what you're saying we should avoid, > and I do see the point there. Yeah, that logic belongs to the master. We'll want to send a message from the master to the slave when the catchup is done, so that the slave knows it's up-to-date. For logging, if for no other reason. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > And I'm sure people will want the option to retain WAL longer in the > master, to avoid an expensive resync if the slave falls behind. It would > be simple to provide a GUC option for "always retain X GB of old WAL in > pg_xlog". Right, we would want to provide some more configurability on the when-to-recycle-WAL decision than there is now. But the basic point is that I don't see the master pg_xlog as being a long-term archive. The amount of back WAL that you'd want to keep there is measured in minutes or hours, not weeks or months. (If nothing else, there is no point in keeping so much WAL that catching up by scanning it would take longer than taking a fresh base backup. My impression from recent complaints about our WAL-reading speed is that that might be a pretty tight threshold ...) regards, tom lane
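Tom's break-even argument can be made concrete with a toy calculation. The function and all the rates below are illustrative assumptions, not measurements of PostgreSQL:

```python
# Hypothetical break-even estimate for how much WAL is worth retaining:
# replaying retained WAL only beats a fresh base backup while
# (wal_bytes / replay_rate) < (db_size / base_backup_rate).
# All figures below are illustrative assumptions, not measurements.

def max_useful_wal_bytes(db_size_bytes, replay_rate_bps, backup_rate_bps):
    """Largest WAL backlog for which replay is still faster than a base backup."""
    base_backup_seconds = db_size_bytes / backup_rate_bps
    return base_backup_seconds * replay_rate_bps

# 100 GB database, 20 MB/s WAL replay, 100 MB/s base-backup copy
limit = max_useful_wal_bytes(100 * 1024**3, 20 * 1024**2, 100 * 1024**2)
print(limit / 1024**3)  # ~20 GB of WAL is the break-even backlog
```

Note how this also reflects Greg's later point: the threshold depends on two independent variables, the transaction (replay) rate and the total database size.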
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
Dimitri Fontaine
Date:
On Jul 7, 2009, at 21:12, Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> And I'm sure people will want the option to retain WAL longer in the >> master, to avoid an expensive resync if the slave falls behind. It >> would >> be simple to provide a GUC option for "always retain X GB of old >> WAL in >> pg_xlog". > > Right, we would want to provide some more configurability on the > when-to-recycle-WAL decision than there is now. But the basic point > is that I don't see the master pg_xlog as being a long-term archive. > The amount of back WAL that you'd want to keep there is measured in > minutes or hours, not weeks or months. Could we add yet another specialized postmaster child to handle the archive, which would be like a default archive_command implemented in core? This separate process could then be responsible for feeding the slave(s) with the WAL history for any LSN not available in pg_xlog anymore. The bonus would be to have a good reliable WAL archiving default setup for simple PITR and simple replication setups. One of the reasons PITR looks so difficult is that it involves reading a lot of documentation and then hand-writing scripts even in the simple default case. > (If nothing else, there is no point in keeping so much WAL that > catching > up by scanning it would take longer than taking a fresh base backup. > My impression from recent complaints about our WAL-reading speed is > that > that might be a pretty tight threshold ...) Could the design above make it so that your later PITR backup is always an option for setting up a WAL shipping slave? Regards, -- dim
Dimitri Fontaine <dfontaine@hi-media.com> writes: > Could we add yet another postmaster specialized child to handle the > archive, which would be like a default archive_command implemented in > core. I think this fails the basic sanity check: do you need it to still work when the master is dead. It's reasonable to ask the master to supply a few gigs of very-recent WAL, but as soon as the word "archive" enters the conversation, you should be thinking in terms of a different machine. Or at least a design that easily scales to put the archive on a different machine. regards, tom lane
On Tue, Jul 7, 2009 at 8:12 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > (If nothing else, there is no point in keeping so much WAL that catching > up by scanning it would take longer than taking a fresh base backup. > My impression from recent complaints about our WAL-reading speed is that > that might be a pretty tight threshold ...) Well those are two independent variables. The time taken to scan WAL is dependent on the transaction rate and the time to take a fresh backup is dependent on the total database size. There are plenty of low transaction rate humungous databases where it would be faster to replay weeks of transactions than try to take a fresh base backup. -- greg http://mit.edu/~gsstark/resume.pdf
Hi, On Wed, Jul 8, 2009 at 12:49 AM, Tom Lane<tgl@sss.pgh.pa.us> wrote: > This design seems totally wrong to me. It's confusing the master's > pg_xlog directory with the archive. We should *not* use pg_xlog as > a long-term archive area; that's terrible from both a performance > and a reliability perspective. Performance because pg_xlog has to be > fairly high-speed storage, which conflicts with it needing to hold > a lot of stuff; and reliability because the entire point of all this > is to survive a master server crash, and you're probably not going to > have its pg_xlog anymore after that. Yeah, I agree that pg_xlog is not a long-term archive area. So, in my design, the primary server tries to read the old XLOG file from not only pg_xlog but also an archive if available, and transfers it. > If slaves need to be able to get at past WAL, they should be getting > it from a separate archive server that is independent of the master DB. You assume that restore_command which retrieves the old XLOG file from a separate archive server is specified in the standby? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On Wed, Jul 8, 2009 at 4:00 AM, Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: >> I would envision the slaves >> connecting to the master's replication port and asking "feed me WAL >> beginning at LSN position thus-and-so", with no notion of WAL file >> boundaries exposed anyplace. > > Yep, that's the way I envisioned it to work in my protocol suggestion > that Fujii adopted > (http://archives.postgresql.org/message-id/4951108A.5040608@enterprisedb.com). > The <begin> and <end> values are XLogRecPtrs, not WAL filenames. If <begin> indicates the middle of an XLOG file, the file written to the standby is partial. Is this OK? After both servers have failed, the XLOG file containing <begin> might still be required for crash recovery of the standby server. But, since it's partial, the crash recovery would fail. I think that any XLOG file should be written to the standby so that it can be replayed by a normal recovery. >> The point about not wanting to archive >> lots of WAL on the master would imply that the master reserves the right >> to fail if the requested starting position is too old, whereupon the >> slave needs some way to resync --- but that probably involves something >> close to taking a fresh base backup to copy to the slave. What if the XLOG file required for recovery is gone by the time a resync of a large amount of data finishes? In this case, the standby might never start because the requested starting position is always too old. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, Thanks for the brilliant comments! On Wed, Jul 8, 2009 at 4:00 AM, Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: >> There are still some interesting questions in this about exactly how you >> switch over from "catchup mode" to following the live WAL broadcast. >> With the above design it would be the master's responsibility to manage >> that, since presumably the requested start position will almost always >> be somewhat behind the live end of WAL. It might be nicer to push that >> complexity to the slave side, but then you do need two data paths >> somehow (ie, retrieving the slightly-stale WAL is separated from >> tracking live events). Which is what you're saying we should avoid, >> and I do see the point there. > > Yeah, that logic belongs to the master. > > We'll want to send message from the master to the slave when the catchup > is done, so that the slave knows it's up-to-date. For logging, if for no > other reason. This seems to be the main difference between us. You and Tom think that the catchup (transferring the old XLOG files) and WAL streaming (shipping the latest XLOG records continuously) should be performed serially over the same connection. I think they should be performed in parallel, using more than one connection. I'd like to build consensus on which design should be chosen. If my design is worse, I'll change the patch according to the other design. In my design, WAL streaming is performed between walsender and walreceiver. In parallel with that, the startup process requests the old XLOG file from a normal backend if it's not found during recovery. If the startup process has reached the WAL streaming start position, it's guaranteed that all the XLOG files required for recovery exist in the standby, which means that it's up-to-date. After that, the startup process replays only the records shipped by WAL streaming. The advantage of my design is: - It's guaranteed that the standby can catch up with the primary within a reasonable period. 
- We can keep walsender simple. It has only to take care of the latest XLOG records (ie. doesn't need to control the old records and some history files). And, it doesn't need to calculate whether the standby is already up-to-date or not by comparing some LSNs. - In the future, in order to make the standby catch up more quickly, we can easily extend the mechanism so that two or more old XLOG files might be transferred concurrently by using multiple connections. What is your opinion? Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
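The parallel catchup-plus-streaming flow Fujii describes can be sketched as a toy Python model (this is not PostgreSQL code; the function, file names, and record names are all illustrative):

```python
# Toy sketch of the two-connection design: old WAL files are fetched over
# one connection while new records stream in over another; replay switches
# to the stream once the fetched files reach the streaming start position.
import queue
import threading

def catch_up_and_stream(missing_files, streamed_records, stream_start):
    replayed = []
    stream_q = queue.Queue()

    # "walreceiver": buffers live records while catchup runs in parallel.
    def receive():
        for rec in streamed_records:
            stream_q.put(rec)
        stream_q.put(None)  # end-of-stream marker

    t = threading.Thread(target=receive)
    t.start()

    # "startup process": restore missing files up to the stream start.
    for fname in missing_files:
        replayed.append(("file", fname))
        if fname == stream_start:
            break

    # Caught up: from here on, replay only streamed records.
    while (rec := stream_q.get()) is not None:
        replayed.append(("stream", rec))
    t.join()
    return replayed

result = catch_up_and_stream(
    ["000000010000000000000001", "000000010000000000000002"],
    ["rec1", "rec2"],
    stream_start="000000010000000000000002",
)
print(result)
```

The point of the sketch is the ordering guarantee: file restoration and live streaming overlap in time, but nothing from the stream is replayed before the catchup reaches the stream's start position.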
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
Dimitri Fontaine
Date:
Hi, Tom Lane <tgl@sss.pgh.pa.us> writes: > I think this fails the basic sanity check: do you need it to still work > when the master is dead. I don't get it. Why would we want to set up a slave against a dead master? The way I understand the current design of Synch Rep, when you start a new slave the following happens: 1. init: slave asks the master the current LSN and starts streaming WAL 2. setup: slave asks the master for missing WALs from its current position to the LSN it just got, and applies them all to reach the initial LSN (this happens in parallel to 1.) 3. catchup: slave has replayed missing WALs and now is replaying the stream it received in parallel, which applies from the init LSN (just reached) 4. sync: slave is no more lagging, it's applying the stream as it gets it, either as part of the master transaction or not depending on the GUC settings So, what I'm understanding you're saying is that the slave still should be able to set up properly when the master died before it synced. What I'm saying is that if the master dies before any synced slave exists, you get to start from backups (filesystem snapshot + archives for example, PITR recovery etc), as there's no slave. Regards, -- dim
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
"Kevin Grittner"
Date:
Dimitri Fontaine <dfontaine@hi-media.com> wrote: > 4. sync: slave is no more lagging, it's applying the stream as it > gets it, either as part of the master transaction or not > depending on the GUC settings I think the interesting bit is when you're at this point and the connection between the master and slave goes down for a couple days. How do you handle that? -Kevin
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
Dimitri Fontaine
Date:
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Dimitri Fontaine <dfontaine@hi-media.com> wrote: > >> 4. sync: slave is no more lagging, it's applying the stream as it >> gets it, either as part of the master transaction or not >> depending on the GUC settings > > I think the interesting bit is when you're at this point and the > connection between the master and slave goes down for a couple days. > How do you handle that? Maybe how londiste handles this case could help us here: http://skytools.projects.postgresql.org/doc/londiste.ref.html#toc18

State                | Owner  | What is done
---------------------+--------+--------------------
NULL                 | replay | Changes state to "in-copy", launches londiste.py copy process, continues with its work
in-copy              | copy   | drops indexes, truncates, copies data in, restores indexes, changes state to "catching-up"
catching-up          | copy   | replay events for that table only until no more batches (means current moment), change state to "wanna-sync:<tick_id>" and wait for state to change
wanna-sync:<tick_id> | replay | catch up to given tick_id, change state to "do-sync:<tick_id>" and wait for state to change
do-sync:<tick_id>    | copy   | catch up to given tick_id, both replay and copy must now be at same position. change state to "ok" and exit
ok                   | replay | synced table, events can be applied

Such state changes must guarantee that any process can die at any time and by just restarting it can continue where it left off. "subscriber add" registers table with NULL state. "subscriber add --expect-sync" registers table with ok state. "subscriber resync" sets table state to NULL. Regards, -- dim
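For illustration, the quoted londiste lifecycle reduces to a small transition map. This is a sketch only; the state and owner names are copied from the quoted londiste documentation, and the code itself is hypothetical:

```python
# The londiste table-copy lifecycle as a transition map: each state has one
# owning process and one successor state. Restarting a dead process simply
# re-reads the current state and continues, which is the crash-safety
# property the message describes.
TRANSITIONS = {
    None: ("replay", "in-copy"),
    "in-copy": ("copy", "catching-up"),
    "catching-up": ("copy", "wanna-sync:<tick_id>"),
    "wanna-sync:<tick_id>": ("replay", "do-sync:<tick_id>"),
    "do-sync:<tick_id>": ("copy", "ok"),
}

def advance(state):
    owner, nxt = TRANSITIONS[state]
    return nxt

# Walk a freshly added table (NULL state) through to "ok".
s = None
path = []
while s != "ok":
    s = advance(s)
    path.append(s)
print(path)
```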
On 07/08/2009 09:59 AM, Kevin Grittner wrote: > Dimitri Fontaine <dfontaine@hi-media.com> wrote: >> 4. sync: slave is no more lagging, it's applying the stream as it >> gets it, either as part of the master transaction or not >> depending on the GUC settings > I think the interesting bit is when you're at this point and the > connection between the master and slave goes down for a couple days. > How do you handle that? Been following with great interest... If the updates are not performed at a regular enough interval, the slave is not truly a functioning standby. I think it's a different problem domain, probably best served by the existing pg_standby support? If the slave can be out of touch with the master for an extended period of time, near real time logs provide no additional benefit over just shipping the archived WAL logs and running the standby in continuous recovery mode? Cheers, mark -- Mark Mielke <mark@mielke.cc>
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
Heikki Linnakangas
Date:
Mark Mielke wrote: > On 07/08/2009 09:59 AM, Kevin Grittner wrote: >> I think the interesting bit is when you're at this point and the >> connection between the master and slave goes down for a couple days. >> How do you handle that? > > Been following with great interest... > > If the updates are not performed at a regular enough interval, the slave > is not truly a functioning standby. I think it's a different problem > domain, probably best served by the existing pg_standby support? If the > slave can be out of touch with the master for an extended period of > time, near real time logs provide no additional benefit over just > shipping the archived WAL logs and running the standby in continuous > recovery mode? Might be easier to set up than pg_standby. But more importantly, it can happen by accident. Someone trips on the power plug of the slave on Friday, and it goes unnoticed until Monday when the DBA comes to work. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
"Kevin Grittner"
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > But more importantly, it can happen by accident. Someone trips on > the power plug of the slave on Friday, and it goes unnoticed until > Monday when DBA comes to work. We've had people unplug things by accident exactly that way. :-/ We've also had replication across part of our WAN go down for the better part of a day because a beaver chewed through a fiber optic cable where it ran through a marsh. Our (application framework based) replication just picks up where it left off, without any intervention, when connectivity is restored. I think it would be a mistake to design something less robust than that. By the way, we don't use any state transitions for this, other than keeping track of when we seem to have a working connection. The client side knows what it last got, and when its reconnection attempts eventually succeed it makes a request of the server side to provide a stream of transactions from that point on. The response to that request continues indefinitely, as long as the connection is up, which can be months at a time. -Kevin "Everything should be made as simple as possible, but no simpler." - Albert Einstein
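The resume-from-last-position behaviour Kevin describes can be sketched as a toy model (hypothetical names throughout; this is not their application framework's code):

```python
# Sketch of the reconnect behaviour described above: the client remembers
# the last position it applied, and on every (re)connection simply asks
# the server for everything after that point.
def replicate(server_log, client):
    """One connection attempt: stream everything past the client's position."""
    start = client["last_applied"]
    for pos, txn in server_log:
        if pos > start:
            client["applied"].append(txn)
            client["last_applied"] = pos

server_log = [(1, "t1"), (2, "t2"), (3, "t3")]
client = {"last_applied": 0, "applied": []}

replicate(server_log[:2], client)   # connection drops after position 2
replicate(server_log, client)       # reconnect: resumes from position 2
print(client["applied"])  # ['t1', 't2', 't3'] -- no gaps, no duplicates
```

Because the request is always phrased as "everything after position X", the outage duration is irrelevant as long as the server still has the history past X, which is exactly the retention question debated earlier in the thread.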
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
Heikki Linnakangas
Date:
Fujii Masao wrote: > On Wed, Jul 8, 2009 at 4:00 AM, Heikki > Linnakangas<heikki.linnakangas@enterprisedb.com> wrote: >>> I would envision the slaves >>> connecting to the master's replication port and asking "feed me WAL >>> beginning at LSN position thus-and-so", with no notion of WAL file >>> boundaries exposed anyplace. >> Yep, that's the way I envisioned it to work in my protocol suggestion >> that Fujii adopted >> (http://archives.postgresql.org/message-id/4951108A.5040608@enterprisedb.com). >> The <begin> and <end> values are XLogRecPtrs, not WAL filenames. > > If <begin> indicates the middle of the XLOG file, the file written to the > standby is partial. Is this OK? After two server failed, the XLOG file > including <begin> might still be required for crash recovery of the > standby server. But, since it's partial, the crash recovery would fail. > I think that any XLOG file should be written to the standby as it can > be replayed by a normal recovery. The standby can store the streamed WAL to files in pg_xlog of the standby, to facilitate crash recovery, but it doesn't need to be exposed in the protocol. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
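One way the standby could lay out a stream that begins mid-segment is to map each LSN back to a segment file and offset; a rough sketch, assuming the 16MB segment size mentioned earlier in the thread (the function name is illustrative, not a PostgreSQL API):

```python
# Illustrative sketch of what the standby could do with a WAL stream that
# starts mid-segment: write each streamed chunk at the segment and offset
# computed from its LSN, so pg_xlog on the standby stays laid out like
# ordinary WAL even though the protocol never mentions file names.
WAL_SEG_SIZE = 16 * 1024 * 1024  # 16MB segments, as in the discussion

def segment_position(lsn_bytes):
    """Map a byte position in the WAL stream to (segment number, offset)."""
    return lsn_bytes // WAL_SEG_SIZE, lsn_bytes % WAL_SEG_SIZE

seg, off = segment_position(16 * 1024 * 1024 + 8192)
print(seg, off)  # segment 1, offset 8192
```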
Hi, On Wed, Jul 8, 2009 at 10:59 PM, Kevin Grittner<Kevin.Grittner@wicourts.gov> wrote: > Dimitri Fontaine <dfontaine@hi-media.com> wrote: > >> 4. sync: slave is no more lagging, it's applying the stream as it >> gets it, either as part of the master transaction or not >> depending on the GUC settings > > I think the interesting bit is when you're at this point and the > connection between the master and slave goes down for a couple days. > How do you handle that? In the current design of synch rep, you have only to restart the standby after repairing the network. The startup process of the standby would restart an archive recovery from the last restart point, and request any missing file from the primary. On the other hand, WAL streaming would start from the current XLOG position of the primary, which is performed by walsender and walreceiver. If a file required for the archive recovery has gone from the primary (pg_xlog and archive) during those couple of days, and now exists only in a separate archive server, the archive recovery by the standby would fail. In this case, you need to copy the missing files from the archive server to the standby before restarting the standby. Otherwise you need to make a new base backup of the primary, and start the setup of the standby from the beginning. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Hi, On Tue, Jul 7, 2009 at 8:51 PM, Fujii Masao<masao.fujii@gmail.com> wrote: > http://archives.postgresql.org/message-id/4951108A.5040608@enterprisedb.com >> I don't think we need or should >> allow running regular queries before entering "replication mode". the >> backend should become a walsender process directly after authentication. > > I changed the protocol according to your suggestion. > Here is the current protocol: Just for the record, I'd like to explain the correspondence between Heikki's protocol and mine. > ReplicationStart (B) > Byte1('l'): Identifies the message as a replication-start indicator. > Int32(17): Length of message contents in bytes, including self. > Int32: The timeline ID > Int32: The start log file of replication > Int32: The start byte offset of replication This corresponds to "StartReplication <begin>". But this is sent from the primary to the standby, though "StartReplication" is sent in the opposite direction. So, in the current design, the primary determines the WAL streaming start position, which indicates the head of the XLOG file following the file switched by walsender. > XLogData (B) > Byte1('w'): Identifies the message as XLOG records. > Int32: Length of message contents in bytes, including self. > Int8: Flag bits indicating how the records should be treated. > Int32: The log file number of the records. > Int32: The byte offset of the records. > Byte n: The XLOG records. This corresponds to "WALRange <begin> <end> <data>". But XLogData doesn't have <begin>, in order to reduce the wire traffic, because it can be calculated from <end> and the length of the records. > XLogResponse (F) > Byte1('r'): Identifies the message as ACK for XLOG records. > Int32: Length of message contents in bytes, including self. > Int8: Flag bits indicating how the records were treated. > Int32: The log file number of the records. > Int32: The byte offset of the records. This corresponds to "ReplicatedUpTo <end>". They are almost the same. 
> If there is a missing XLOG file which is required for recovery, the > startup process connects to the primary as a normal client, and > receives the binary contents of the file by using the following SQL. > This has nothing to do with the above protocol. So, the transfer of > the missing file and synchronous XLOG streaming are performed > concurrently. > > COPY (SELECT pg_read_xlogfile('filename', true)) TO STDOUT WITH BINARY This corresponds to "RequestWAL <begin> <end>". Since the XLOG file written to the standby has to be recoverable, I use the filename instead of an XLogRecPtr here, and make the primary send the whole file. Also, this filename can indicate not only an XLOG file but also a history file. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
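For illustration, the XLogData layout quoted above could be packed like this. This is a sketch using big-endian fields as in the FE/BE protocol, not actual server code, and the function name is made up:

```python
# Sketch of the XLogData ('w') message layout quoted above, packed with
# Python's struct module (big-endian). The length field counts the message
# contents including itself, per the quoted description.
import struct

def pack_xlogdata(flags, log_id, offset, records):
    # contents = Int32 len + Int8 flags + Int32 log + Int32 off + records
    length = 4 + 1 + 4 + 4 + len(records)
    return b"w" + struct.pack("!IbII", length, flags, log_id, offset) + records

msg = pack_xlogdata(flags=0, log_id=0, offset=8192, records=b"\x00" * 16)
print(len(msg))  # 1 (type byte) + 13 (header) + 16 (payload) = 30 bytes
```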
Re: Re: Synch Rep: direct transfer of WAL file from the primary to the standby
From
"Kevin Grittner"
Date:
Fujii Masao <masao.fujii@gmail.com> wrote: > Kevin Grittner<Kevin.Grittner@wicourts.gov> wrote: >> I think the interesting bit is when you're at this point and the >> connection between the master and slave goes down for a couple >> days. How do you handle that? > > In the current design of synch rep, you have only to restart the > standby after repairing the network. How long does the interruption need to last to require manual intervention? Would an automated retry make sense? (I'd bet that more days than not we lose connectivity to at least one of our remote sites for at least a few minutes.) -Kevin
Hi, On Thu, Jul 9, 2009 at 11:13 PM, Kevin Grittner<Kevin.Grittner@wicourts.gov> wrote: > Fujii Masao <masao.fujii@gmail.com> wrote: >> Kevin Grittner<Kevin.Grittner@wicourts.gov> wrote: > >>> I think the interesting bit is when you're at this point and the >>> connection between the master and slave goes down for a couple >>> days. How do you handle that? >> >> In the current design of synch rep, you have only to restart the >> standby after repairing the network. > > How long does the interruption need to last to require manual > intervention? It depends on when the files required for the standby's recovery disappear from the primary. If they remain in the primary's archive forever, there would be no need to copy them by hand. > Would an automated retry make sense? (I'd bet that > more days than not we lose connectivity to at least one of our remote > sites for at least a few minutes.) Yes, but I think that it's not postgres' but the clusterware's job. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center