Обсуждение: standby questions
hi list: im working in the setup of a warm standby server. im using postgres-8.2.5 in the slave and 8.2.4 in master right now, in production it will be only 8.2.5, everything in debian etch. we decided this solution because we want an easy to migrate/implement/adminstrate backup sever, and we have a few minutes of data loss (if we can be precise with the "starting point in time" of that loss). by now, i have the warm standby server working fine (at least, it logs that processes the archives...). if i am undestanding, the "complete" (in broad strokes) process for WAL archiving and warm standby, is something like this: 1) the primary server processes some transactions and generates some "WAL segment files" in the $DATA/pg_xlog/ dir of 16MB each (by default). 2) if archive_command is activated and working, the primary server sends (preferably with rsync or some other "atomic tool") the NEW WAL files to the standby server. Later, at some point, the primary server will delete this files when considers that are not necessary anymore. 3) the standby server receives the archive files and processes them somehow (almost everybody does this with pg_standby by now?) and keep waiting for a new WAL file or the trigger file. now, as you can see, im not a expert on database tech, maybe only a "brave admin", so, reading the docs and seeing the warm standby "working", i have a few questions (in any case, links to docs a welcome!!): -a "WAL segment file" is the same that a "log file segment"? -what are the "log file segments" referenced by checkpoint_segments? are the "binary logs" where postgres stores lasts transactions, the ones in $DATA/pg_xlog dir? if this is true, then: what means "Maximum distance between automatic WAL checkpoints"??? this is how often, in "log file segments", postgres will perform a checkpoint, generating a special checkpoint record from which to start the redo operation, dont? -what is the "restartpoint" named in the "Warm Standby Servers for High Availability" page? (http://www.postgresql.org/docs/8.2/static/warm-standby.html) i cant find a definition in any other doc... -how often a new WAL file is generated? this depends on the server load? -in one WAL file i could have one, more than one and even an incomplete "transaction"?? (where is the doc i need to read?!) -if i have incomplete transactions on a WAL, how the standby server processes that? what if the rest of the transaction never reaches the standby server? -how do i know exactly at which point in time (in transactions) my standby server is, if i have to switch to primary role? -how many archive files is safe to keep in the standby server? right now, i have "-k 100" in the pg_standby opts but in the pg_standby README says: "You should be wary against setting this number too low, since this may mean you cannot restart the standby. This is because the last restartpoint marked in the WAL files may be many files in the past and can vary considerably. This should be set to a value exceeding the number of WAL files that can be recovered in 2*checkpoint_timeout seconds, according to the value in the warm standby postgresql.conf. It is wholly unrelated to the setting of checkpoint_segments on either primary or standby." i cant use the %r because im in 8.2 and not thinking in upgrade by now... this is related to the restartpoint in my previous question... but i dont know exactly what it is so i cant make a more detailed analysis. finally: does anybody have a config like this one working with heartbeat? is that recommendable? wow! sorry for the huge post... i think i tied some loose ends just by writing this email!! :D thanks in advance!! -- Roberto Scattini ___ _ ))_) __ )L __ ((__)(('(( ((_)
> 2) if archive_command is activated and working, the primary > server sends (preferably with rsync or some other "atomic tool") the > NEW WAL files to the standby server. Later, at some point, the primary > server will delete this files when considers that are not necessary > anymore. > Is 'scp' by itself considered an "atomic tool" for copying files to the standby server? Does "atomic" mean that the program should copy a file over using a temp file name and then renames at the end or does it mean something else? David
On Fri, 8 Feb 2008, David Wall wrote: > Is 'scp' by itself considered an "atomic tool" for copying files to the > standby server? Does "atomic" mean that the program should copy a file over > using a temp file name and then renames at the end or does it mean something > else? That's correct. You have to do it that way or the system in recovery mode can start to consume the new segment file before it has been completely copied over yet. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
> That's correct. You have to do it that way or the system in recovery > mode can start to consume the new segment file before it has been > completely copied over yet. Does pg_standby take care of this by checking file sizes or the like? In my testing with scp, we never experienced any problems, but I wonder if we were somehow "just lucky." David
On Sat, 9 Feb 2008, Roberto Scattini wrote: > -a "WAL segment file" is the same that a "log file segment"? Yes: WAL="write-ahead log". > -how often a new WAL file is generated? this depends on the server load? These are the WAL segment files that are produced by the database, the ones you're copying to the standby server. A new one comes out whenever the current one is filled. While it's not really exact, if you think of it as new one appearing after every 16MB of changes to the database that's the right general idea. > -what are the "log file segments" referenced by checkpoint_segments? > are the "binary logs" where postgres stores lasts transactions, the > ones in $DATA/pg_xlog dir? if this is true, then: > what means "Maximum distance between automatic WAL checkpoints"??? > this is how often, in "log file segments", postgres will perform a > checkpoint, generating a special checkpoint record from which to start > the redo operation, dont? A checkpoint happens after one of a few things have occurred: -The checkpoint_timeout has passed -checkpoint_segments worth of WAL files have been output since the last one -One is required to support a command (server shutdown for example) Because the 2nd of those depends on activity as mentioned above, that's why the "distance" between checkpoints can vary a bit. The checkpoint will re-use up to 2*checkpoint_segments+1 files on the primary server as part of its cleanup. > -in one WAL file i could have one, more than one and even an > incomplete "transaction"?? (where is the doc i need to read?!) > -if i have incomplete transactions on a WAL, how the standby server > processes that? what if the rest of the transaction never reaches the > standby server? Transactions that haven't been completed for whatever reason are rolled back as part of bringing the standby server up. > -how do i know exactly at which point in time (in transactions) my > standby server is, if i have to switch to primary role? You can run pg_controldata on the standby to see where it's at. > -what is the "restartpoint" named in the "Warm Standby Servers for > High Availability" page? I started to answer this one but realized I don't have a clear enough description here for you. Hopefully someone will chime in (and by someone I mean Simon) with more detail about how restartpoints are tracked and what that implies for setting -k in pg_standby for 8.2. Good questions, I've been collecting notes for a FAQ on this subject and I'll add some of these to it. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, 8 Feb 2008, David Wall wrote: > Does pg_standby take care of this by checking file sizes or the like? In my > testing with scp, we never experienced any problems, but I wonder if we were > somehow "just lucky." pg_standby only processes files of exactly the length they're supposed to be. On Windows it even sleeps a bit after that to give time for things to settle. The main risky situation you could end up in is if you were using a copy program that created the whole file at its full size first then wrote the data to it. I don't think there are many programs that operate like that around and certainly scp doesn't do that. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Feb 9, 2008 5:50 AM, Greg Smith <gsmith@gregsmith.com> wrote: > On Fri, 8 Feb 2008, David Wall wrote: > > > Does pg_standby take care of this by checking file sizes or the like? In my > > testing with scp, we never experienced any problems, but I wonder if we were > > somehow "just lucky." > > pg_standby only processes files of exactly the length they're supposed to > be. On Windows it even sleeps a bit after that to give time for things to > settle. > > The main risky situation you could end up in is if you were using a copy > program that created the whole file at its full size first then wrote the > data to it. I don't think there are many programs that operate like that > around and certainly scp doesn't do that. > "atomic tool": The reason rsync is used in the archive_command is that rsync features an 'atomic copy' - that is, the in-progress destination file is created as a temp file, and then renamed when the copy is complete. In the situation above, where segments are archived straight to the directory that the slave reads from, 'cp' can cause an error whereby the slave attempts to process a partially-copied WAL segment. If this happens, postgres will emit an error like: PANIC: archive file "000000010000000000000031" has wrong size: 1810432 instead of 16777216 LOG: startup process (PID 11356) was terminated by signal 6 LOG: aborting startup due to startup process failure taken from http://archives.postgresql.org/sydpug/2006-10/msg00001.php thanks everybody!! -- Roberto Scattini ___ _ ))_) __ )L __ ((__)(('(( ((_)
Hi Roberto, > > -a "WAL segment file" is the same that a "log file segment"? > A WAL (Write Ahead Log) file is one of the numbered files in the pg_xlog directory. Keep in mind that you'll be archiving (in some cases) more than just WAL files, for example you might see other files appear in your archive directory when you do a PITR backup. > -what are the "log file segments" referenced by checkpoint_segments? > are the "binary logs" where postgres stores lasts transactions, the > ones in $DATA/pg_xlog dir? if this is true, then: > what means "Maximum distance between automatic WAL checkpoints"??? > this is how often, in "log file segments", postgres will perform a > checkpoint, generating a special checkpoint record from which to start > the redo operation, dont? > A single WAL file isn't indicative of a checkpoint. Rather, PostgreSQL will fill "checkpoint segments" WAL files and then checkpoint (in normal operation - though there are exceptions to this). So if checkpoint_segments is set to 3, then you'll see 48 MB of WAL files be generated prior to a checkpoint (3 * 16 MB WAL files). In the event of a crash, if PostgreSQL auto-recovers (not the warm-standby server taking over, the same instance that crashed re-starting) then it processes from the last checkpoint forward. Simply put, the WAL files contain the "differences" between what is in the PostgreSQL shared buffer pool and what is in the files that represent the database on disk. When a checkpoint occurres, the buffers "sync" to disk, so there are no differences between what is in memory and on disk (so the old WAL files would then be obsolete for automatic crash recovery, and a new set can be started). > -what is the "restartpoint" named in the "Warm Standby Servers for > High Availability" page? > (http://www.postgresql.org/docs/8.2/static/warm-standby.html) i cant > find a definition in any other doc... > The restart point would be the time when you issued a pg_start_backup() to take a PITR backup of your main server. > -how often a new WAL file is generated? this depends on the server load? > Yes. Server load and checkpoint timeout values would influence this. Also any manual file switches that you do... > -in one WAL file i could have one, more than one and even an > incomplete "transaction"?? (where is the doc i need to read?!) > When replay occurs only completed transactions will be replayed. Of course, transactions won't span a checkpoint in the WAL files. > -if i have incomplete transactions on a WAL, how the standby server > processes that? what if the rest of the transaction never reaches the > standby server? > Any transactions that are not completed will be discarded. > -how do i know exactly at which point in time (in transactions) my > standby server is, if i have to switch to primary role? You really don't. You can read your log files to find the last WAL that has been replayed, and if you know the last WAL generated on the primary (pg_controldata) then you should be able to figure out how many WAL files away you are. There are techniques to avoid the loss of any WAL files in the event of a crash (synchronous warm standby), that we teach in our performance tuning course...but I'm sure you can find directions on how to implement these online someplace. > -how many archive files is safe to keep in the standby server? right > now, i have "-k 100" in the pg_standby opts but in the pg_standby > README says: > "You should be wary against setting this number too low, > since this may mean you cannot restart the standby. This > is because the last restartpoint marked in the WAL files > may be many files in the past and can vary considerably. > This should be set to a value exceeding the number of WAL > files that can be recovered in 2*checkpoint_timeout seconds, > according to the value in the warm standby postgresql.conf. > It is wholly unrelated to the setting of checkpoint_segments > on either primary or standby." > This number would indicate the number of files you need to "keep" to ensure that if recovery is re-started it can continue successfully. I believe this is the number of WAL files since the last checkpoint, but if that is so it is a variable value. That's because a long-running transaction could end up exceeding checkpoint_segments, in which case PostgreSQL will exceed that number. The easiest thing is to use the '%r' parameter so PostgreSQL can tell pg_standby what to do.... If not, I would just leave it at 0 and periodically prune old files. I'm not 100% certain on this (but it certainly makes logical sense). I don't use pg_standby, I typically use a shell script to do this...which affords me a greater deal of customization. The main advantage to pg_standby is that it's a more-or-less "hands off" approach that is cross-platform compatible (Windows and Unix variants). Signalling components could be added to pg_standby at some point... > i cant use the %r because im in 8.2 and not thinking in upgrade by now... > this is related to the restartpoint in my previous question... but i > dont know exactly what it is so i cant make a more detailed analysis. > > finally: does anybody have a config like this one working with > heartbeat? is that recommendable? > Yes. However your configuration could lose transactions in the event of a crash of the primary (assuming you can't access its pg_xlog directory after the crash). If you're using heartbeat then you probably have the two servers relatively close together, and should consider a more robust solution if your hardware can support it. Hope that helps.. -- Chander Ganesan Open Technology Group, Inc. One Copley Parkway, Suite 210 Morrisville, NC 27560 Phone: 877-258-8987/919-463-0999 http://www.otg-nc.com Ask me about our Expert Comprehensive PostgreSQL, PostGIS & UMN Mapserver training.
On Sat, 2008-02-09 at 07:49 -0500, Chander Ganesan wrote: > Signalling components could be added to pg_standby at some point... What sort of thing are you looking for? pg_standby accepts a trigger file as well as various types of signal. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Simon Riggs wrote:
We use signals in shell scripts to trigger (in some cases) a WAL copy, and in others an immediate wake-from-sleep to copy a final WAL file to trigger a failover.
I didn't see anything about signals in the documentation at (http://www.postgresql.org/docs/8.3/static/pgstandby.html).On Sat, 2008-02-09 at 07:49 -0500, Chander Ganesan wrote:Signalling components could be added to pg_standby at some point...What sort of thing are you looking for? pg_standby accepts a trigger file as well as various types of signal
We use signals in shell scripts to trigger (in some cases) a WAL copy, and in others an immediate wake-from-sleep to copy a final WAL file to trigger a failover.
-- Chander Ganesan Open Technology Group, Inc. One Copley Parkway, Suite 210 Morrisville, NC 27560 Phone: 877-258-8987/919-463-0999 http://www.otg-nc.com