Discussion: pg_start_backup and pg_stop_backup Re: Re: [COMMITTERS] pgsql: Make CheckRequiredParameterValues() depend upon correct
On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> This doesn't contain any changes to pg_start_backup() yet, that's a
> separate issue and still under discussion.
I'm thinking of changing pg_start_backup and pg_stop_backup so that
they just check that wal_level >= 'archive', and changing pg_stop_backup
so that it doesn't wait for archiving when archive_mode is OFF.
This change is very simple and enables us to take a base backup for SR
even if archive_mode is OFF. Thought?
Regards,
-- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote: > On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > > This doesn't contain any changes to pg_start_backup() yet, that's a > > separate issue and still under discussion. > > I'm thinking of changing pg_start_backup and pg_stop_backup so that > they just check that wal_level >= 'archive', and changing pg_stop_backup > so that it doesn't wait for archiving when archive_mode is OFF. > > This change is very simple and enables us to take a base backup for SR > even if archive_mode is OFF. Thought? Makes sense. I'm wondering whether this could cause problems with people taking hot backups that aren't aimed at SR. Perhaps we could have 2 new functions whose names are more closely linked to the exact purpose: pg_start_replication_copy() etc.. which then act exactly as you suggest. -- Simon Riggs www.2ndQuadrant.com
On Wed, Apr 28, 2010 at 6:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote: >> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas >> <heikki.linnakangas@enterprisedb.com> wrote: >> > This doesn't contain any changes to pg_start_backup() yet, that's a >> > separate issue and still under discussion. >> >> I'm thinking of changing pg_start_backup and pg_stop_backup so that >> they just check that wal_level >= 'archive', and changing pg_stop_backup >> so that it doesn't wait for archiving when archive_mode is OFF. >> >> This change is very simple and enables us to take a base backup for SR >> even if archive_mode is OFF. Thought? > > Makes sense. > > I'm wondering whether this could cause problems with people taking hot > backups that aren't aimed at SR. Perhaps we could have 2 new functions > whose names are more closely linked to the exact purpose: > pg_start_replication_copy() etc.. > which then act exactly as you suggest. Hmm. That seems a bit complicated. Why can't we just let people use the existing functions the way they always have? ...Robert
On Wed, 2010-04-28 at 06:56 -0400, Robert Haas wrote: > On Wed, Apr 28, 2010 at 6:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote: > >> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas > >> <heikki.linnakangas@enterprisedb.com> wrote: > >> > This doesn't contain any changes to pg_start_backup() yet, that's a > >> > separate issue and still under discussion. > >> > >> I'm thinking of changing pg_start_backup and pg_stop_backup so that > >> they just check that wal_level >= 'archive', and changing pg_stop_backup > >> so that it doesn't wait for archiving when archive_mode is OFF. > >> > >> This change is very simple and enables us to take a base backup for SR > >> even if archive_mode is OFF. Thought? > > > > Makes sense. > > > > I'm wondering whether this could cause problems with people taking hot > > backups that aren't aimed at SR. Perhaps we could have 2 new functions > > whose names are more closely linked to the exact purpose: > > pg_start_replication_copy() etc.. > > which then act exactly as you suggest. > > Hmm. That seems a bit complicated. Why can't we just let people use > the existing functions the way they always have? We can, but I already gave a reason why we should not. IIRC it was you that suggested changing the names of things if the behaviour changes. -- Simon Riggs www.2ndQuadrant.com
Robert Haas wrote:
> On Wed, Apr 28, 2010 at 6:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote:
>>> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas
>>> <heikki.linnakangas@enterprisedb.com> wrote:
>>>> This doesn't contain any changes to pg_start_backup() yet, that's a
>>>> separate issue and still under discussion.
>>> I'm thinking of changing pg_start_backup and pg_stop_backup so that
>>> they just check that wal_level >= 'archive', and changing pg_stop_backup
>>> so that it doesn't wait for archiving when archive_mode is OFF.
>>>
>>> This change is very simple and enables us to take a base backup for SR
>>> even if archive_mode is OFF. Thought?
>> Makes sense.
>>
>> I'm wondering whether this could cause problems with people taking hot
>> backups that aren't aimed at SR. Perhaps we could have 2 new functions
>> whose names are more closely linked to the exact purpose:
>> pg_start_replication_copy() etc..
>> which then act exactly as you suggest.
>
> Hmm. That seems a bit complicated. Why can't we just let people use
> the existing functions the way they always have?
Well, it would be nice to allow using pg_start_backup() on the primary
when streaming replication is enabled, even if archiving isn't.
Otherwise the only way to get the base backup for the standby is to shut
down primary first, or use filesystem snapshot etc.
The straightforward way to enable that would be to allow
pg_start_backup() when wal_level >= 'archive', regardless of
archive_mode. However, I'm worried that someone might take an online
backup without archiving (and replication), not realizing that it's not
safe. That risk is there already, though, if you restore from an online
backup and forget to create recovery.conf. It will start up in
inconsistent state. The proposed change would make it easier to make
that mistake.
I'm not sure what to do about it, maybe throw a warning if you start up
a database and there's a backup_label file in the data directory.
Something like:
WARNING: database system was interrupted while backup was in progress
HINT: If you are restoring from an online backup, you must use a WAL archive for the restore, or the database can be in inconsistent state
That would also occur if the primary database crashes while a backup is
being taken, in which case the warning can be ignored.
Or maybe we should check in pg_start_backup() that either archive_mode
or streaming replication (max_wal_senders > 0) is enabled.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, Apr 28, 2010 at 8:28 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Or maybe we should check in pg_start_backup() that either archive_mode > or streaming replication (max_wal_senders > 0) is enabled. I agree that pg_start_backup checks not only wal_level but also that. Regards, -- Fujii Masao NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
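The stricter variant that Heikki floats and Fujii agrees with here (wal_level >= 'archive', plus either archive_mode or max_wal_senders > 0) could be sketched roughly as below. This is a standalone illustration only, not the xlog.c code: the `WalLevel` enum, the `BackupSettings` struct and `backup_allowed()` are hypothetical names standing in for the server's configuration state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the wal_level setting, ordered so that ">=" works. */
typedef enum
{
    WAL_LEVEL_MINIMAL,
    WAL_LEVEL_ARCHIVE,
    WAL_LEVEL_HOT_STANDBY
} WalLevel;

/* Hypothetical snapshot of the settings the thread talks about. */
typedef struct
{
    WalLevel wal_level;
    bool     archive_mode;
    int      max_wal_senders;
} BackupSettings;

/*
 * Returns true when pg_start_backup() would be permitted under the
 * stricter rule discussed above: enough WAL detail is being written,
 * and there is at least one way (archiving or streaming) to ship it.
 */
bool
backup_allowed(const BackupSettings *s)
{
    assert(s != NULL);

    if (s->wal_level < WAL_LEVEL_ARCHIVE)
        return false;

    return s->archive_mode || s->max_wal_senders > 0;
}
```

Under this sketch a server with wal_level = archive but neither archiving nor any WAL senders is still refused, which is the case Heikki is worried about.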
On Wed, Apr 28, 2010 at 7:22 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Wed, 2010-04-28 at 06:56 -0400, Robert Haas wrote: >> On Wed, Apr 28, 2010 at 6:52 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > On Wed, 2010-04-28 at 19:40 +0900, Fujii Masao wrote: >> >> On Wed, Apr 28, 2010 at 4:43 PM, Heikki Linnakangas >> >> <heikki.linnakangas@enterprisedb.com> wrote: >> >> > This doesn't contain any changes to pg_start_backup() yet, that's a >> >> > separate issue and still under discussion. >> >> >> >> I'm thinking of changing pg_start_backup and pg_stop_backup so that >> >> they just check that wal_level >= 'archive', and changing pg_stop_backup >> >> so that it doesn't wait for archiving when archive_mode is OFF. >> >> >> >> This change is very simple and enables us to take a base backup for SR >> >> even if archive_mode is OFF. Thought? >> > >> > Makes sense. >> > >> > I'm wondering whether this could cause problems with people taking hot >> > backups that aren't aimed at SR. Perhaps we could have 2 new functions >> > whose names are more closely linked to the exact purpose: >> > pg_start_replication_copy() etc.. >> > which then act exactly as you suggest. >> >> Hmm. That seems a bit complicated. Why can't we just let people use >> the existing functions the way they always have? > > We can, but I already gave a reason why we should not. > > IIRC it was you that suggested changing the names of things if the > behaviour changes. Absolutely, but I'm arguing that we shouldn't change the behavior in the first place. At least as I understand it, even when not using archive_mode, streaming replication, or hot standby, it's still perfectly legal to use pg_start_backup() to take a hot backup. I don't see why we would either (a) break that use case or (b) create another function that does the same thing but with one extra error check. ...Robert
Robert Haas wrote:
> At least as I understand it, even when not using
> archive_mode, streaming replication, or hot standby, it's still
> perfectly legal to use pg_start_backup() to take a hot backup.
Nope. The correct procedure to take a hot backup is described in
http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html#BACKUP-TIPS.
It involves setting archive_mode=on, and archive_command to a shell
command that normally just returns true, except when backup is in
progress. You can't take a hot backup without archiving (or streaming)
at least temporarily. (except with filesystem-level snapshot capabilities).
Which is unfortunate, really. I wish we had a mode where the server
simply refrained from removing/recycling WAL segments while the backup
is running. You could then just:
1. pg_start_backup()
2. tar the data directory, except for pg_xlog
3. tar pg_xlog
4. pg_stop_backup().
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Which is unfortunate, really. I wish we had a mode where the server
> simply refrained from removing/recycling WAL segments while the backup
> is running. You could then just:
> 1. pg_start_backup()
> 2. tar the data directory, except for pg_xlog
> 3. tar pg_xlog
> 4. pg_stop_backup().
I think there's a termination issue there --- the safe stop point
would (appear to be) past whatever WAL you'd copied during step 3.
Still, the possibility of adding modes such as this seems to me to be a
good argument for not inventing a new version of pg_start_backup/
pg_stop_backup every time.
        regards, tom lane
			
On Wed, 2010-04-28 at 11:10 -0400, Robert Haas wrote:
> >
> > IIRC it was you that suggested changing the names of things if the
> > behaviour changes.
>
> Absolutely, but I'm arguing that we shouldn't change the behavior in
> the first place. At least as I understand it...
I feel like you're just arguing against whatever I say - your reasoning
makes no sense. Masao would not have proposed it as a change if it
already worked like that, would he? Just reading the thread would tell
you that much. Plus, you clearly don't know how it works now, so not
sure why you're commenting at all, it's just minor stuff and a few ideas.
-- Simon Riggs www.2ndQuadrant.com
On Wed, Apr 28, 2010 at 11:25 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Robert Haas wrote: >> At least as I understand it, even when not using >> archive_mode, streaming replication, or hot standby, it's still >> perfectly legal to use pg_start_backup() to take a hot backup. > > Nope. The correct procedure to take a hot backup is described in > http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html#BACKUP-TIPS. > It involves setting archive_mode=on, and archive_command to a shell > command that normally just returns true, except when backup is in > progress. You can't take a hot backup without archiving (or streaming) > at least temporarily. (except with filesystem-level snapshot capabilities). Oh. Well, in that case the proposed change seems reasonable... but what do you mean by "except with filesystem-level snapshot capabilities"? ...Robert
On Wed, 2010-04-28 at 12:44 -0400, Robert Haas wrote: > On Wed, Apr 28, 2010 at 11:25 AM, Heikki Linnakangas > <heikki.linnakangas@enterprisedb.com> wrote: > > Robert Haas wrote: > >> At least as I understand it, even when not using > >> archive_mode, streaming replication, or hot standby, it's still > >> perfectly legal to use pg_start_backup() to take a hot backup. > > > > Nope. The correct procedure to take a hot backup is described in > > http://www.postgresql.org/docs/8.4/interactive/continuous-archiving.html#BACKUP-TIPS. > > It involves setting archive_mode=on, and archive_command to a shell > > command that normally just returns true, except when backup is in > > progress. You can't take a hot backup without archiving (or streaming) > > at least temporarily. (except with filesystem-level snapshot capabilities). > > Oh. Well, in that case the proposed change seems reasonable... but > what do you mean by "except with filesystem-level snapshot > capabilities"? Like LVM, SANS or ZFS. Joshua D. Drake > > ...Robert > -- PostgreSQL.org Major Contributor Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564 Consulting, Training, Support, Custom Development, Engineering
Robert Haas wrote: > but > what do you mean by "except with filesystem-level snapshot > capabilities"? If you have a filesystem that supports atomic snapshots, you can take a snapshot of the filesystem the data directory resides on, and then copy the data directory from the snapshot at your leisure, without pg_start/stop_backup(). It is entirely invisible to PostgreSQL and works just like copying the data directory after an immediate shutdown. The server will perform crash recovery after restore. Virtualization software, logical volume managers and SANs tend to have such features, in addition to filesystems. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Well, it would be nice to allow using pg_start_backup() on the primary
> when streaming replication is enabled, even if archiving isn't.
> Otherwise the only way to get the base backup for the standby is to shut
> down primary first, or use filesystem snapshot etc.
I think I must be missing something: exactly how would you fire up a new
standby from such a base backup, if you weren't running archiving?
If you aren't archiving then there's no guarantee that you'll still have
a continuous WAL series starting from the start of the backup.
IOW I think that the requirement in pg_start_backup shouldn't be relaxed
without some more thought/work.
        regards, tom lane
			
		
> IOW I think that the requirement in pg_start_backup shouldn't be relaxed
> without some more thought/work.
Yeah, I was talking to Bruce about that this AM, and it seems like a
feature we *need* to have ... for 9.1.
I'm sufficiently concerned about the amount of flux HS/SR is in right
now that I'd like to declare it "good enough" and move towards release.
Otherwise we'll tinker with it forever and there will be no 9.0.
 
"Release early, release often" *is* the OSS mantra, after all.  The
question now isn't "Is binary replication perfect" but "is it *good
enough* for some substantial portion of our users".   And I think the
answer to the latter question is, at this point, yes.
-- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
 
			
		Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> Well, it would be nice to allow using pg_start_backup() on the primary >> when streaming replication is enabled, even if archiving isn't. >> Otherwise the only way to get the base backup for the standby is to shut >> down primary first, or use filesystem snapshot etc. > > I think I must be missing something: exactly how would you fire up a new > standby from such a base backup, if you weren't running archiving? I was replying to Robert's thought on using pg_start/stop_backup() for taking a hot backup. Not for bootstrapping a standby. > If you aren't archiving then there's no guarantee that you'll still have > a continuous WAL series starting from the start of the backup. I wasn't really thinking of this use case, but you could set wal_keep_segments "high enough". Not a configuration I would recommend for high availability, but should be fine for setting up a streaming replication standby for testing etc. If we don't allow pg_start/stop_backup() with archive_mode=off and max_wal_senders>0, there's no way to bootstrap a streaming replication standby without archiving. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2010-04-28 at 11:11 -0700, Josh Berkus wrote: > > IOW I think that the requirement in pg_start_backup shouldn't be relaxed > > without some more thought/work. > > Yeah, I was talking to Bruce about that this AM, and it seems like a > feature we *need* to have ... for 9.1. > > I'm sufficiently concerned about the amount of flux HS/SR is in right > now that I'd like to declare it "good enough" and move towards release. > Otherwise we'll tinker with it forever and there will be no 9.0. > > "Release early, release often" *is* the OSS mantra, after all. The > question now isn't "Is binary replication perfect" but "is it *good > enough* for some substantial portion of our users". And I think the > answer to the latter question is, at this point, yes. As of exactly today, my answer, for my piece of this is also "yes". I'm not convinced that the same is true across the board. Some important changes have happened in last few days and I see more coming. -- Simon Riggs www.2ndQuadrant.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> If you aren't archiving then there's no guarantee that you'll still have
>> a continuous WAL series starting from the start of the backup.
> I wasn't really thinking of this use case, but you could set
> wal_keep_segments "high enough".
Ah.  Okay, that seems like a workable approach, at least for people with
reasonably predictable WAL loads.  We could certainly improve on it
later to make it more bulletproof, but it's usable now --- if we relax
the error checks.
(wal_keep_segments can be changed without restarting, right?)
> Not a configuration I would recommend
> for high availability, but should be fine for setting up a streaming
> replication standby for testing etc. If we don't allow
> pg_start/stop_backup() with archive_mode=off and max_wal_senders>0,
> there's no way to bootstrap a streaming replication standby without
> archiving.
Right.  +1 for weakening the tests, then.  Is there any use in looking
at wal_keep_segments as part of this test?
        regards, tom lane
			
		Heikki Linnakangas wrote: > Tom Lane wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> Well, it would be nice to allow using pg_start_backup() on the primary >>> when streaming replication is enabled, even if archiving isn't. >>> Otherwise the only way to get the base backup for the standby is to shut >>> down primary first, or use filesystem snapshot etc. >> I think I must be missing something: exactly how would you fire up a new >> standby from such a base backup, if you weren't running archiving? > > I was replying to Robert's thought on using pg_start/stop_backup() for > taking a hot backup. Not for bootstrapping a standby. Scratch that, I just reread what I wrote, and starting a streaming replication standby from such a backup was exactly what I was describing.. >> If you aren't archiving then there's no guarantee that you'll still have >> a continuous WAL series starting from the start of the backup. > > I wasn't really thinking of this use case, but you could set > wal_keep_segments "high enough". Not a configuration I would recommend > for high availability, but should be fine for setting up a streaming > replication standby for testing etc. If we don't allow > pg_start/stop_backup() with archive_mode=off and max_wal_senders>0, > there's no way to bootstrap a streaming replication standby without > archiving. This still makes sense. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> If you aren't archiving then there's no guarantee that you'll still have
>>> a continuous WAL series starting from the start of the backup.
>
>> I wasn't really thinking of this use case, but you could set
>> wal_keep_segments "high enough".
>
> Ah. Okay, that seems like a workable approach, at least for people with
> reasonably predictable WAL loads. We could certainly improve on it
> later to make it more bulletproof, but it's usable now --- if we relax
> the error checks.
Yeah, wal_keep_segments is wishy-washy in general, not only with backups.
> (wal_keep_segments can be changed without restarting, right?)
It's PGC_SIGHUP.
>> Not a configuration I would recommend
>> for high availability, but should be fine for setting up a streaming
>> replication standby for testing etc. If we don't allow
>> pg_start/stop_backup() with archive_mode=off and max_wal_senders>0,
>> there's no way to bootstrap a streaming replication standby without
>> archiving.
>
> Right. +1 for weakening the tests, then. Is there any use in looking
> at wal_keep_segments as part of this test?
I don't think so. There's no safe setting that would guarantee anything.
We could check for wal_keep_segments>0, but any small number is the
same in practice. We don't insist on wal_keep_segments>0 to allow WAL
streaming without archival in general, let's not treat taking the base
backup differently.
-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Wed, 2010-04-28 at 14:21 -0400, Tom Lane wrote:
> Is there any use in looking
> at wal_keep_segments as part of this test?
I would hope that pg_stop_backup() will have a conditional ERROR message
to say
ERROR backup inconsistent and cannot be used for SR
HINT increase wal_keep_segments or enable archiving for your base backup
I think it would also be useful to add a NOTICE to pg_start_backup()
NOTICE archiving is not enabled. If we exceed wal_keep_segments
WAL files then the backup will be invalidated. Expected time for this to
happen is X
(using linear extrapolation of WAL creation rate since last checkpoint)
-- Simon Riggs www.2ndQuadrant.com
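The "X" in Simon's suggested NOTICE is a linear extrapolation, which might look something like the sketch below. This is an assumption about what he has in mind, not code from the server: the function name and its three inputs (segments written since the last checkpoint, seconds since that checkpoint, and the wal_keep_segments setting) are hypothetical.

```c
#include <assert.h>

/*
 * Estimate how many seconds it takes to write wal_keep_segments worth
 * of WAL, assuming the rate observed since the last checkpoint stays
 * constant.  Returns -1.0 when no rate can be derived (no WAL written
 * yet, or no time elapsed).
 */
double
seconds_until_backup_invalidated(int wal_keep_segments,
                                 double segments_since_checkpoint,
                                 double seconds_since_checkpoint)
{
    assert(wal_keep_segments >= 0);

    if (segments_since_checkpoint <= 0.0 || seconds_since_checkpoint <= 0.0)
        return -1.0;

    /* time = segments to go / (segments per second) */
    return wal_keep_segments * seconds_since_checkpoint
        / segments_since_checkpoint;
}
```

For example, 8 segments written in the 4 seconds since the last checkpoint with wal_keep_segments = 16 gives an estimate of 8 seconds. As Tom and Heikki note elsewhere in the thread, this is only as good as the assumption that the WAL load is predictable.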
Simon Riggs wrote: > On Wed, 2010-04-28 at 14:21 -0400, Tom Lane wrote: >> Is there any use in looking >> at wal_keep_segments as part of this test? > > I would hope that pg_stop_backup() will have a conditional ERROR message > to say > > ERROR backup inconsistent and cannot be used for SR > HINT increase wal_keep_segments or enable archiving for your base backup Hmm, you could start streaming the WAL before you start the backup, so the fact that you've already removed some segments that are needed to restore from the backup by the time pg_stop_backup() is called doesn't necessarily mean that the backup is useless. You'd need a stand-alone tool to do the streaming in that case, and no such tool exists yet, but I would be surprised if one doesn't appear on pgfoundry sooner or later :-). In case it's not clear to casual readers out there: You will get an error as soon as you try to start the standby, complaining that it can't find the WAL segment it needs in the primary anymore. Not silent corruption. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
* Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> [100428 14:49]:
> You'd need a stand-alone tool to do the streaming in that case, and no
> such tool exists yet, but I would be surprised if one doesn't appear on
> pgfoundry sooner or later :-).
And this tool is something I will eventually be interested in working on
or collaborating on...
I'm hoping to be able to build a tool that:
1) Connects to PG walsender (a la walreceiver)
2) Streams WAL from pg master
3) Saves WAL into "files" (a la archive)...
i.e. I'm looking to keep a more-up-to-date PITR archive than waiting
for traditional WAL file archiving...
And eventually (9.1+) I'm hoping that walsender will have grown enough
to allow me to configure PG to wait on the commit until the master has
both sync'ed the WAL file, and received a "sync ack" from my
wal-stream-save-to-file tool...
Because then I'll have a situation where I can easily have a
synchronous, separate machine copy of all my WAL without having to jump
through hoops with stuff like drbd or MD+nbd, etc as my WAL disk...
And yes, I don't personally care about streaming replication replaying
WAL as it comes, or running queries in recovery... I'm looking towards
PG not saying my transaction is committed unless it's safely on that
machine's disks (or BBcache) *and* another machine... That's the type of
replication a paranoid guy like me waits for...
Yes, that's possible now with exotic os/net/fs configuration, but
imagine how nice it will be when it can all be done in userspace with
just PG (and pg-compatible) tool, etc...
-- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Hmm, you could start streaming the WAL before you start the backup, so
> the fact that you've already removed some segments that are needed to
> restore from the backup by the time pg_stop_backup() is called doesn't
> necessarily mean that the backup is useless.
> You'd need a stand-alone tool to do the streaming in that case, and no
> such tool exists yet, but I would be surprised if one doesn't appear on
> pgfoundry sooner or later :-).
Yeah.  ISTM the real bottom line here is that we have only a weak grasp
on how these features will end up being used; or for that matter what
the common error scenarios will be.  I think that for the time being
we should err on the side of being permissive.  We can tighten things
up and add more nanny-ism in the warnings later on, when we have
more field experience.
        regards, tom lane
			
		Aidan Van Dyk <aidan@highrise.ca> wrote: > I'm hoping to be able to build a tool that: > > 1) Connects to PG walsender (a la walreceiver) > 2) Streams WAL from pg master > 3) Saves WAL into "files" (a la archive)... > > i.e. I'm looking to keep a more-up-to-date PITR archive than > waiting for traditional WAL file archiving... I'm interested in that, too. > I don't personally care about streaming replication replaying WAL > as it comes, or running queries in recovery... I'm with you that far, but I wouldn't want the sender to wait for remote persistence. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Aidan Van Dyk <aidan@highrise.ca> wrote: > >> I'm hoping to be able to build a tool that: >> >> 1) Connects to PG walsender (a la walreceiver) >> 2) Streams WAL from pg master >> 3) Saves WAL into "files" (a la archive)... >> >> i.e. I'm looking to keep a more-up-to-date PITR archive than >> waiting for traditional WAL file archiving... > > I'm interested in that, too. That looks like we have that integrated into walreceiver the day we have cascading support, right? Or maybe we need a special mode of operation where the receiver is (talking to) an archiver. >> I don't personally care about streaming replication replaying WAL >> as it comes, or running queries in recovery... > > I'm with you that far, but I wouldn't want the sender to wait for > remote persistence. That's synchronous replication and its set of synchronicity setting, ranging from sent on the network to the slave, fsync()ed at the slave and applied already on the slave. IMO the real fun begins when we talk about multi-slaves support and their roles (a failover slave wants the master to wait for it to have applied the WAL before to commit, a reporting slave not so much). So you'd set the Availability level on each slave and wouldn't commit on the master until each slave got what it's configured for, or something like that. SyncRep in 9.1 already sounds darn interesting :) Regards, -- dim
* Kevin Grittner <Kevin.Grittner@wicourts.gov> [100428 15:51]:
> > I don't personally care about streaming replication replaying WAL
> > as it comes, or running queries in recovery...
>
> I'm with you that far, but I wouldn't want the sender to wait for
> remote persistence.
I remember a presentation at pgcon a while ago, it was probably Fujii
(from NTT?) about their log streaming, and at that time, they talked
about different "sync" options...
So I'd love to be able to have commits be:
  async (like current option)
  local wal sync (like current)
  local wal sync + walsender sent
  local wal sync + walsender confirmed
And ideally, the "walsender sent/confirmed" would even allow making sure
it was sent/confirmed to $X connections... I want to be able to
guarantee it's on 2 machines, not that if my slave was connected it
would be on there, but something happened and my "slave" has
disconnected, so it's only got local WAL...
And then on whatever "tool" is receiving the log streaming, it can be
set to confirm when either:
  received buffer
  write buffer to file
  write buffer to file + sync
  write buffer to file + sync + replay
That should give you all the sync levels they talked about in their
presentation...
-- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
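Aidan's four sender-side commit levels, together with his "$X connections" idea, can be written down as a small decision function. This is purely a sketch of the wish list in the message above: none of these names exist in PostgreSQL, and `CommitLevel`, `CommitProgress` and `commit_may_return()` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* The four commit levels from the message, weakest first (hypothetical names). */
typedef enum
{
    COMMIT_ASYNC,                 /* async */
    COMMIT_LOCAL_SYNC,            /* local wal sync */
    COMMIT_LOCAL_SYNC_SENT,       /* local wal sync + walsender sent */
    COMMIT_LOCAL_SYNC_CONFIRMED   /* local wal sync + walsender confirmed */
} CommitLevel;

/* Hypothetical progress of one commit record. */
typedef struct
{
    bool local_wal_synced;     /* WAL flushed to local disk */
    int  receivers_sent;       /* connections the record was sent to */
    int  receivers_confirmed;  /* connections that acknowledged it */
} CommitProgress;

/*
 * Returns true once the commit may be reported to the client.
 * required_receivers plays the role of "$X connections".
 */
bool
commit_may_return(CommitLevel level, int required_receivers,
                  const CommitProgress *p)
{
    assert(p != NULL);

    switch (level)
    {
        case COMMIT_ASYNC:
            return true;
        case COMMIT_LOCAL_SYNC:
            return p->local_wal_synced;
        case COMMIT_LOCAL_SYNC_SENT:
            return p->local_wal_synced &&
                p->receivers_sent >= required_receivers;
        case COMMIT_LOCAL_SYNC_CONFIRMED:
            return p->local_wal_synced &&
                p->receivers_confirmed >= required_receivers;
    }
    return false;
}
```

With `required_receivers = 2`, a commit at the "confirmed" level keeps waiting while only one receiver has acknowledged, which matches the "guarantee it's on 2 machines" requirement. What "confirmed" means on the receiving side (buffered, written, synced, replayed) is the second list in the message and is not modelled here.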
Aidan Van Dyk wrote: > I remember a presentation at pgcon a while ago, it was probaly Fujii > (from NTT?) about their log streaming, and at that time, they talked > about different "sync" options... It's all outlined at http://wiki.postgresql.org/wiki/Streaming_Replication#Synchronization_capability -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
Dimitri Fontaine wrote: > IMO the real fun begins when we talk about multi-slaves support and > their roles (a failover slave wants the master to wait for it to have > applied the WAL before to commit, a reporting slave not so much). So > you'd set the Availability level on each slave and wouldn't commit on > the master until each slave got what it's configured for, or something > like that. > Ultimately the commit is stuck waiting for the slowest committing sync operation on the list; it's the bottleneck. Let's presume that the commit waits can be done in parallel, after sending the transaction to every slave. Given that and the situation you describe, having per-node sync levels only turns out to be a useful optimization if the reporting slave commits slower than the failover slave does. The master is going to be stuck waiting for the slowest one of the batch regardless of whether you've optimized them individually. There is a related situation that I think a per-node sync option would be more obviously useful for: local failover slave, remote disaster recovery slave over a WAN, where you accept that a serious disaster taking out a whole data center will lose some transactions. In that situation, you'd probably want fsync for the local slave, while going async for the remote datacenter. If the commits are done in a serial fashion, tuning sync per-node would be much more valuable in many use cases. Regardless, I wouldn't want to burden the first sync rep version with this requirement. Let's wait until the current scope is cleared before trying to move the goalposts for the people working on that. -- Greg Smith 2ndQuadrant US Baltimore, MD PostgreSQL Training, Services and Support greg@2ndQuadrant.com www.2ndQuadrant.us
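Greg's point about parallel versus serial waits comes down to a maximum versus a sum. The sketch below just does that arithmetic over a list of per-standby acknowledgement latencies; the function names and the millisecond figures in the example are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* All standbys are waited on at once: the commit waits for the slowest. */
double
commit_wait_parallel(const double *latency_ms, size_t n)
{
    double worst = 0.0;

    assert(latency_ms != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
    {
        if (latency_ms[i] > worst)
            worst = latency_ms[i];
    }
    return worst;
}

/* Standbys are waited on one after another: the waits add up. */
double
commit_wait_serial(const double *latency_ms, size_t n)
{
    double total = 0.0;

    assert(latency_ms != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += latency_ms[i];
    return total;
}
```

Take a local failover standby at 2 ms, a reporting standby at 5 ms and a remote disaster-recovery standby at 50 ms. In parallel the commit waits 50 ms whatever the two faster nodes do, so tuning them individually gains nothing; serially it waits 57 ms, and dropping any node to async shortens the commit. That is why per-node levels matter more in the serial case, and why going async for the WAN standby is the obvious win.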
On Wed, 2010-04-28 at 22:17 +0200, Dimitri Fontaine wrote: > IMO the real fun begins when we talk about multi-slaves support and > their roles (a failover slave wants the master to wait for it to have > applied the WAL before to commit, a reporting slave not so much). So > you'd set the Availability level on each slave and wouldn't commit on > the master until each slave got what it's configured for, or something > like that. Just for the record, I outlined desirable semantics for this on hackers in 2008 and want to keep those ideas on the table. http://archives.postgresql.org/pgsql-hackers/2008-07/msg01001.php My view is that it should be up to the master what happens on master. An additional standby connection should not have the ability to make transactions on the master wait. If we give control to the master rather than the standby, we are then able to allow transactions on the master to choose how robust they should be, just as we do with synchronous_commit. IMHO that is extremely important, since we already know that sync rep performs poorly and applications need to mitigate that in some way. Those are the objectives; the parameters to do that are a different story and we might expect much debate. One way of doing this would be to have a parameter called synchronous_replication = N, which would cause the transaction on primary to wait for at least N standbys to reply that they have the data. This would allow settings like
synchronous_commit = 0 --async
synchronous_commit = 1 --first reply wins == max performance
synchronous_commit = 2 --multiple replies needed == max availability
...
-- Simon Riggs www.2ndQuadrant.com
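The "wait for at least N standbys" rule proposed above can be sketched as a single predicate. This is a minimal illustration under assumptions: commit_may_return and its arguments are hypothetical names, and no such parameter or function exists in the server at the time of this thread.

```c
#include <stdbool.h>

/*
 * Hypothetical: required_replies models the proposed setting N.
 * replies_received is how many standbys have confirmed they hold the
 * transaction's WAL so far.
 */
bool commit_may_return(int required_replies, int replies_received)
{
    if (required_replies <= 0)
        return true;            /* N = 0: asynchronous, never wait */

    /* N = 1: first reply wins; N >= 2: several replies are needed */
    return replies_received >= required_replies;
}
```

Because the threshold is evaluated on the master for each transaction, adding another standby never makes a commit wait longer; it can only supply a reply sooner.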
Tom Lane wrote:
> Yeah. ISTM the real bottom line here is that we have only a weak grasp
> on how these features will end up being used; or for that matter what
> the common error scenarios will be. I think that for the time being
> we should err on the side of being permissive. We can tighten things
> up and add more nanny-ism in the warnings later on, when we have
> more field experience.

Ok, here's a proposed patch. Per discussion, it relaxes the checks in
pg_start/stop_backup() so that they can be used as long as wal_level >=
'archive', even if archiving is disabled.

If archiving is not enabled, it can't wait for the files to be archived.
Instead, it prints a notice:

NOTICE: WAL archiving is not enabled, you must ensure that all required
WAL segments are streamed or copied through other means to restore the
backup

That is instead of the usual notice when archiving is enabled:

NOTICE: pg_stop_backup complete, all required WAL segments have been
archived

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
***************
*** 8200,8217 **** pg_start_backup(PG_FUNCTION_ARGS)
                   errmsg("recovery is in progress"),
                   errhint("WAL control functions cannot be executed during recovery.")));
!     if (!XLogArchivingActive())
!         ereport(ERROR,
!                 (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL archiving is not active"),
!                  errhint("archive_mode must be enabled at server start.")));
!
!     if (!XLogArchiveCommandSet())
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL archiving is not active"),
!                  errhint("archive_command must be defined before "
!                          "online backups can be made safely.")));
      backupidstr = text_to_cstring(backupid);
--- 8200,8210 ----
                   errmsg("recovery is in progress"),
                   errhint("WAL control functions cannot be executed during recovery.")));
!     if (!XLogIsNeeded())
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL level not sufficient for making an online backup"),
!                  errhint("wal_level must be set to 'archive' or 'hot_standby' at server start.")));
      backupidstr = text_to_cstring(backupid);
***************
*** 8399,8409 **** pg_stop_backup(PG_FUNCTION_ARGS)
                   errmsg("recovery is in progress"),
                   errhint("WAL control functions cannot be executed during recovery.")));
!     if (!XLogArchivingActive())
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL archiving is not active"),
!                  errhint("archive_mode must be enabled at server start.")));
      /*
       * OK to clear forcePageWrites
--- 8392,8402 ----
                   errmsg("recovery is in progress"),
                   errhint("WAL control functions cannot be executed during recovery.")));
!     if (!XLogIsNeeded())
          ereport(ERROR,
                  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
!                  errmsg("WAL level not sufficient for making an online backup"),
!                  errhint("wal_level must be set to 'archive' or 'hot_standby' at server start.")));
      /*
       * OK to clear forcePageWrites
***************
*** 8511,8526 **** pg_stop_backup(PG_FUNCTION_ARGS)
      CleanupBackupHistory();
      /*
!      * Wait until both the last WAL file filled during backup and the history
!      * file have been archived. We assume that the alphabetic sorting
!      * property of the WAL files ensures any earlier WAL files are safely
!      * archived as well.
       *
       * We wait forever, since archive_command is supposed to work and we
       * assume the admin wanted his backup to work completely. If you don't
       * wish to wait, you can set statement_timeout. Also, some notices are
       * issued to clue in anyone who might be doing this interactively.
       */
      XLByteToPrevSeg(stoppoint, _logId, _logSeg);
      XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg);
--- 8504,8530 ----
      CleanupBackupHistory();
      /*
!      * If archiving is enabled, wait for all the required WAL files to be
!      * archived before returning. If archiving isn't enabled, the required
!      * WAL needs to be transported via streaming replication (hopefully
!      * with wal_keep_segments set high enough), or some more exotic
!      * mechanism like polling and copying files from pg_xlog with script.
!      * We have no control over those mechanisms, so it's up to the user to
!      * ensure that he gets all the required WAL.
!      *
!      * We wait until both the last WAL file filled during backup and the
!      * history file have been archived, and assume that the alphabetic
!      * sorting property of the WAL files ensures any earlier WAL files are
!      * safely archived as well.
       *
       * We wait forever, since archive_command is supposed to work and we
       * assume the admin wanted his backup to work completely. If you don't
       * wish to wait, you can set statement_timeout. Also, some notices are
       * issued to clue in anyone who might be doing this interactively.
       */
+     if (XLogArchivingActive())
+     {
+     /* XXX: fix indentation before committing */
      XLByteToPrevSeg(stoppoint, _logId, _logSeg);
      XLogFileName(lastxlogfilename, ThisTimeLineID, _logId, _logSeg);
***************
*** 8559,8564 **** pg_stop_backup(PG_FUNCTION_ARGS)
--- 8563,8572 ----
      ereport(NOTICE,
              (errmsg("pg_stop_backup complete, all required WAL segments have been archived")));
+     }
+     else
+         ereport(NOTICE,
+                 (errmsg("WAL archiving is not enabled, you must ensure that all required WAL segments are streamed or copied through other means to restore the backup")));
      /*
       * We're done. As a convenience, return the ending WAL location.
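The behaviour the patch describes can be summarised as a small decision function. What follows is a standalone sketch, not the server code: the WalLevel and StopBackupAction enums and both function names are stand-ins invented for illustration, and 'minimal' is assumed to be the only wal_level below 'archive'.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the server settings; not PostgreSQL's types. */
typedef enum { WAL_MINIMAL, WAL_ARCHIVE, WAL_HOT_STANDBY } WalLevel;

typedef enum
{
    STOP_ERROR_WAL_LEVEL,       /* wal_level too low: refuse with an error */
    STOP_WAIT_FOR_ARCHIVE,      /* archiving on: wait for WAL to be archived */
    STOP_NOTICE_NO_ARCHIVE      /* archiving off: return at once with a notice */
} StopBackupAction;

/* The relaxed precondition: only the WAL level matters, not archive_mode. */
bool backup_allowed(WalLevel level)
{
    return level >= WAL_ARCHIVE;
}

/* What a stop-backup call would do after the relaxed check. */
StopBackupAction stop_backup_action(WalLevel level, bool archive_mode)
{
    if (!backup_allowed(level))
        return STOP_ERROR_WAL_LEVEL;
    return archive_mode ? STOP_WAIT_FOR_ARCHIVE : STOP_NOTICE_NO_ARCHIVE;
}
```

The point of the sketch is that archive_mode no longer decides whether a backup may be taken; it only decides whether stopping the backup waits for the archiver.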
On Thu, Apr 29, 2010 at 5:38 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > NOTICE: WAL archiving is not enabled, you must ensure that all required > WAL segments are streamed or copied through other means to restore the > backup I might think about dropping the words "through other means" from this sentence. ...Robert
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Tom Lane wrote:
>> Yeah.  ISTM the real bottom line here is that we have only a weak grasp
>> on how these features will end up being used; or for that matter what
>> the common error scenarios will be.  I think that for the time being
>> we should err on the side of being permissive.  We can tighten things
>> up and add more nanny-ism in the warnings later on, when we have
>> more field experience.
> Ok, here's a proposed patch. Per discussion, it relaxes the checks in
> pg_start/stop_backup() so that they can be used as long as wal_level >=
> 'archive', even if archiving is disabled.
This patch seems reasonably noncontroversial (except possibly for
message wording, which we can fine-tune later anyway).  Please apply.
9.0beta1 is going to get wrapped in only a few hours.
BTW, the documentation for these functions might need a bit of adjustment.
        regards, tom lane
			
		Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > Tom Lane wrote: > >> If you aren't archiving then there's no guarantee that you'll still have > >> a continuous WAL series starting from the start of the backup. > > > I wasn't really thinking of this use case, but you could set > > wal_keep_segments "high enough". > > Ah. Okay, that seems like a workable approach, at least for people with > reasonably predictable WAL loads. We could certainly improve on it > later to make it more bulletproof, but it's usable now --- if we relax > the error checks. > > (wal_keep_segments can be changed without restarting, right?) Should we allow -1 to mean "keep all segments"? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote: > Tom Lane wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> > Tom Lane wrote: >> >> If you aren't archiving then there's no guarantee that you'll still have >> >> a continuous WAL series starting from the start of the backup. >> >> > I wasn't really thinking of this use case, but you could set >> > wal_keep_segments "high enough". >> >> Ah. Okay, that seems like a workable approach, at least for people with >> reasonably predictable WAL loads. We could certainly improve on it >> later to make it more bulletproof, but it's usable now --- if we relax >> the error checks. >> >> (wal_keep_segments can be changed without restarting, right?) > > Should we allow -1 to mean "keep all segments"? If that's what you want to do, use archive_mode. ...Robert
Robert Haas wrote: > On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Tom Lane wrote: > >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > >> > Tom Lane wrote: > >> >> If you aren't archiving then there's no guarantee that you'll still have > >> >> a continuous WAL series starting from the start of the backup. > >> > >> > I wasn't really thinking of this use case, but you could set > >> > wal_keep_segments "high enough". > >> > >> Ah. Okay, that seems like a workable approach, at least for people with > >> reasonably predictable WAL loads. We could certainly improve on it > >> later to make it more bulletproof, but it's usable now --- if we relax > >> the error checks. > >> > >> (wal_keep_segments can be changed without restarting, right?) > > > > Should we allow -1 to mean "keep all segments"? > > If that's what you want to do, use archive_mode. Uh, I assume that will require me to store the WAL files somewhere else, rather than keeping them in /pg_xlog, which I thought was the goal. Am I missing something? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote: > > > > (wal_keep_segments can be changed without restarting, right?) > > Should we allow -1 to mean "keep all segments"? Why is that not called "max_wal_segments"? wal_keep_segments sounds like its been through Google translate. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote: > > > > > > (wal_keep_segments can be changed without restarting, right?) > > > > Should we allow -1 to mean "keep all segments"? > > Why is that not called "max_wal_segments"? wal_keep_segments sounds like > its been through Google translate. LOL, good one. I assume it was done so it would start with 'wal', but I see 'max_wal_senders', which doesn't start with 'wal' and would match your suggestion exactly. I think we should either rename 'wal_keep_segments' or 'max_wal_senders'. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote: >> > >> > (wal_keep_segments can be changed without restarting, right?) >> >> Should we allow -1 to mean "keep all segments"? > > Why is that not called "max_wal_segments"? wal_keep_segments sounds like > its been through Google translate. Because it's not a maximum? ...Robert
On Fri, Apr 30, 2010 at 1:39 PM, Bruce Momjian <bruce@momjian.us> wrote: > Robert Haas wrote: >> On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote: >> > Tom Lane wrote: >> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >> >> > Tom Lane wrote: >> >> >> If you aren't archiving then there's no guarantee that you'll still have >> >> >> a continuous WAL series starting from the start of the backup. >> >> >> >> > I wasn't really thinking of this use case, but you could set >> >> > wal_keep_segments "high enough". >> >> >> >> Ah. Okay, that seems like a workable approach, at least for people with >> >> reasonably predictable WAL loads. We could certainly improve on it >> >> later to make it more bulletproof, but it's usable now --- if we relax >> >> the error checks. >> >> >> >> (wal_keep_segments can be changed without restarting, right?) >> > >> > Should we allow -1 to mean "keep all segments"? >> >> If that's what you want to do, use archive_mode. > > Uh, I assume that will require me to store the WAL files somewhere else, > rather than keeping them in /pg_xlog, which I thought was the goal. Am > I missing something? Well, one of us is. Why would you want to retain all of your WAL logs in pg_xlog forever? ...Robert
On 04/30/2010 01:53 PM, Robert Haas wrote: > > Well, one of us is. Why would you want to retain all of your WAL logs > in pg_xlog forever? > > ...Robert > To create or re-synchronize SR slaves, one could change wal_keep_segments to -1, run a backup, wait for the slaves to catch up, and change it back to the default. This way no segments would be deleted until the system has reached a stable state. -- m. tharp
Robert Haas wrote: > On Fri, Apr 30, 2010 at 1:39 PM, Bruce Momjian <bruce@momjian.us> wrote: > > Robert Haas wrote: > >> On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote: > >> > Tom Lane wrote: > >> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > >> >> > Tom Lane wrote: > >> >> >> If you aren't archiving then there's no guarantee that you'll still have > >> >> >> a continuous WAL series starting from the start of the backup. > >> >> > >> >> > I wasn't really thinking of this use case, but you could set > >> >> > wal_keep_segments "high enough". > >> >> > >> >> Ah. Okay, that seems like a workable approach, at least for people with > >> >> reasonably predictable WAL loads. We could certainly improve on it > >> >> later to make it more bulletproof, but it's usable now --- if we relax > >> >> the error checks. > >> >> > >> >> (wal_keep_segments can be changed without restarting, right?) > >> > > >> > Should we allow -1 to mean "keep all segments"? > >> > >> If that's what you want to do, use archive_mode. > > > > Uh, I assume that will require me to store the WAL files somewhere else, > > rather than keeping them in /pg_xlog, which I thought was the goal. Am > > I missing something? > > Well, one of us is. Why would you want to retain all of your WAL logs > in pg_xlog forever? Well, this email thread mentioned a case where you needed to increase wal_keep_segments to a sufficiently-high value, and of course figuring out such a value is harder than just having a way of turning off recycling with -1. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Fri, 2010-04-30 at 13:52 -0400, Robert Haas wrote: > On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote: > >> > > >> > (wal_keep_segments can be changed without restarting, right?) > >> > >> Should we allow -1 to mean "keep all segments"? > > > > Why is that not called "max_wal_segments"? wal_keep_segments sounds like > > its been through Google translate. > > Because it's not a maximum? I see the thinking, but why would you ever set it to be something that is *less* than the existing numbers? That would be pointless and indeed, does nothing. The only time you touch it at all is when you set it to be a value higher than the number of files that would normally be kept, and when that is the case it *will* be the maximum. So I say, max_wal_segments = 0 (default) meaning no limit, we just rotate as needed. We put a comment in the docs to say that if a value is selected less than 2*checkpoint_segments+1 then the value is overridden. -- Simon Riggs www.2ndQuadrant.com
On Fri, 2010-04-30 at 13:58 -0400, Bruce Momjian wrote: > Robert Haas wrote: > > On Fri, Apr 30, 2010 at 1:39 PM, Bruce Momjian <bruce@momjian.us> wrote: > > > Robert Haas wrote: > > >> On Fri, Apr 30, 2010 at 12:22 PM, Bruce Momjian <bruce@momjian.us> wrote: > > >> > Tom Lane wrote: > > >> >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > >> >> > Tom Lane wrote: > > >> >> >> If you aren't archiving then there's no guarantee that you'll still have > > >> >> >> a continuous WAL series starting from the start of the backup. > > >> >> > > >> >> > I wasn't really thinking of this use case, but you could set > > >> >> > wal_keep_segments "high enough". > > >> >> > > >> >> Ah. Okay, that seems like a workable approach, at least for people with > > >> >> reasonably predictable WAL loads. We could certainly improve on it > > >> >> later to make it more bulletproof, but it's usable now --- if we relax > > >> >> the error checks. > > >> >> > > >> >> (wal_keep_segments can be changed without restarting, right?) > > >> > > > >> > Should we allow -1 to mean "keep all segments"? > > >> > > >> If that's what you want to do, use archive_mode. > > > > > > Uh, I assume that will require me to store the WAL files somewhere else, > > > rather than keeping them in /pg_xlog, which I thought was the goal. Am > > > I missing something? > > > > Well, one of us is. Why would you want to retain all of your WAL logs > > in pg_xlog forever? > > Well, this email thread mentioned a case where you needed to increase > wal_keep_segments to a sufficiently-high value, and of course figuring > out such a value is harder than just having a way of turning off > recycling with -1. I think the only sensible setting is "as big as my (available) disk space". Any higher and you're going to crash, any lower and you'll invalidate your backup for no reason. -1 emulates current behaviour, BTW Still think we should rename it, in which case 0 is same as "no maximum".
-- Simon Riggs www.2ndQuadrant.com
Robert Haas wrote: > On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote: >>>> (wal_keep_segments can be changed without restarting, right?) >>> Should we allow -1 to mean "keep all segments"? >> Why is that not called "max_wal_segments"? wal_keep_segments sounds like >> its been through Google translate. > > Because it's not a maximum? Yeah, min_wal_segments or something would make sense. It sounds about as good or bad as wal_keep_segments to me. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Bruce Momjian wrote: > Tom Lane wrote: >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: >>> Tom Lane wrote: >>>> If you aren't archiving then there's no guarantee that you'll still have >>>> a continuous WAL series starting from the start of the backup. >>> I wasn't really thinking of this use case, but you could set >>> wal_keep_segments "high enough". >> Ah. Okay, that seems like a workable approach, at least for people with >> reasonably predictable WAL loads. We could certainly improve on it >> later to make it more bulletproof, but it's usable now --- if we relax >> the error checks. >> >> (wal_keep_segments can be changed without restarting, right?) > > Should we allow -1 to mean "keep all segments"? Umm, you can't keep all segments around forever, can you? Surely you have to recycle them sooner or later or you will run out of disk space. I guess you could move that responsibility to a user-written script, but we haven't traditionally encouraged or supported people to mess with the contents of pg_xlog. That would require some more thinking IMHO, not 9.0 material. In practice, you can just set wal_keep_segments to some ridiculously high value to achieve the same result. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Yeah, min_wal_segments or something would make sense. Surely it would confuse people to see they have fewer than min_wal_segments WAL segments. -Kevin
Heikki Linnakangas wrote: > Robert Haas wrote: > > On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote: > >>>> (wal_keep_segments can be changed without restarting, right?) > >>> Should we allow -1 to mean "keep all segments"? > >> Why is that not called "max_wal_segments"? wal_keep_segments sounds like > >> its been through Google translate. > > > > Because it's not a maximum? > > Yeah, min_wal_segments or something would make sense. It sounds about as > good or bad as wal_keep_segments to me. I admit I never liked "keep" but couldn't think of better wording. I do like the proposed wording better. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> Why is that not called "max_wal_segments"? wal_keep_segments sounds like
>> its been through Google translate.
> Because it's not a maximum?
Indeed.  It would really be more like min_wal_segments, if we wanted to
name it that way.
        regards, tom lane
			
		Heikki Linnakangas wrote: > Bruce Momjian wrote: > > Tom Lane wrote: > >> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > >>> Tom Lane wrote: > >>>> If you aren't archiving then there's no guarantee that you'll still have > >>>> a continuous WAL series starting from the start of the backup. > >>> I wasn't really thinking of this use case, but you could set > >>> wal_keep_segments "high enough". > >> Ah. Okay, that seems like a workable approach, at least for people with > >> reasonably predictable WAL loads. We could certainly improve on it > >> later to make it more bulletproof, but it's usable now --- if we relax > >> the error checks. > >> > >> (wal_keep_segments can be changed without restarting, right?) > > > > Should we allow -1 to mean "keep all segments"? > > Umm, you can't keep all segments around forever, can you? Surely you > have to recycle them sooner or later or you will run out of disk space. > > I guess you could move that responsibility to a user-written script, but > we haven't traditionally encouraged or supported people to mess with the > contents of pg_xlog. That would require some more thinking IMHO, not 9.0 > material. > > In practice, you can just set wal_keep_segments to some ridiculously > high value to achieve the same result. Which is where my 'wal_keep_segments = -1' idea came from. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
On Fri, Apr 30, 2010 at 2:08 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Fri, 2010-04-30 at 13:52 -0400, Robert Haas wrote: >> On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote: >> >> > >> >> > (wal_keep_segments can be changed without restarting, right?) >> >> >> >> Should we allow -1 to mean "keep all segments"? >> > >> > Why is that not called "max_wal_segments"? wal_keep_segments sounds like >> > its been through Google translate. >> >> Because it's not a maximum? > > I see the thinking, but why would you ever set it to be something that > is *less* than the existing numbers? That would be pointless and indeed, > does nothing. The only time you touch it at all is when you set it to be > a value higher than the number of files that would normally be kept, and > when that is the case it *will* be the maximum. > > So I say, max_wal_segments = 0 (default) meaning no limit, we just > rotate as needed. We put a comment in the docs to say that if a value is > selected less than 2*checkpoint_segments+1 then the value is overridden. As you were quick to point out to me earlier this week, I am not an expert on our write-ahead logging system; however, I think you are mistaken. Perhaps Heikki could speak to the point more definitively, but I believe that the number of segments that the system retains for WAL archiving or crash recovery is variable. The purpose of this variable is to put a floor under the number of segments that are retained so that SR slaves can catch up if they fall behind. Of course, if archiving is configured, they can do that anyway using restore_command, but you might be running SR without archiving, or you might just want to set this to a small value so that the slaves don't have to keep switching between SR and archive recovery if segments get archived or checkpointed away at inconvenient times. 
It doesn't make a whole lot of sense to set the floor on the number of segments retained to positive infinity, except in one specific case: archiving is disabled, and you're trying to hang on to enough segments in pg_xlog to take a hot backup. As Tom said, it would be nice to have a more elegant solution to that problem, but we can do that in a future release; it's not really the primary purpose of wal_keep_segments, anyway. It certainly would not be a good idea to make the default configuration "retain all WAL forever". If you did that, a user who sets up PostgreSQL and is not using SR or HS or hot backups will eventually and inevitably fill up their hard disk. ...Robert
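The "floor" reading of wal_keep_segments described above can be sketched as follows. This is a simplified model under assumptions: the function names and the single needed_for_recovery count are invented for illustration, and the real retention logic in the server depends on more than these two numbers.

```c
/*
 * Hypothetical model of the floor semantics.
 * needed_for_recovery: segments the server would retain anyway
 *                      (crash recovery, archiving); this varies over time.
 * wal_keep_segments:   the configured floor.
 */
int segments_to_retain(int needed_for_recovery, int wal_keep_segments)
{
    return needed_for_recovery > wal_keep_segments
        ? needed_for_recovery
        : wal_keep_segments;
}

/* How many of the existing segments could be removed or recycled. */
int removable_segments(int existing, int needed_for_recovery,
                       int wal_keep_segments)
{
    int floor = segments_to_retain(needed_for_recovery, wal_keep_segments);

    /* Fewer segments than the floor may exist; then nothing is removed. */
    return existing > floor ? existing - floor : 0;
}
```

Under this model the setting never caps the number of segments; it only stops removal from going below the configured count, which is why it behaves like a minimum rather than a maximum.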
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Bruce Momjian wrote:
>> Should we allow -1 to mean "keep all segments"?
> Umm, you can't keep all segments around forever, can you? Surely you
> have to recycle them sooner or later or you will run out of disk space.
You couldn't use that as a permanent setting, but it can make sense
as a transient setting, rather than having to guess how much WAL you'll
need to keep while setting up a new standby.
> In practice, you can just set wal_keep_segments to some ridiculously
> high value to achieve the same result.
True.
        regards, tom lane
			
		Bruce Momjian escribió: > Which is where my 'wal_keep_segments = -1' idea came from. Are you suggesting that -1 should mean "keep all segments that fit on disk, but if creating a new segment fails with ENOSPC, recycle the oldest one"? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Michael Tharp wrote: > On 04/30/2010 01:53 PM, Robert Haas wrote: >> >> Well, one of us is. Why would you want to retain all of your WAL logs >> in pg_xlog forever? > > To create or re-synchronize SR slaves, one could change > wal_keep_segments to -1, run a backup, wait for the slaves to catch up, > and change it back to the default. This way no segments would be deleted > until the system has reached a stable state. A slave can fall behind at any time, though. You would have to know to set wal_keep_segments to -1 before that happens. I've been thinking that in the future (read 9.1 or above), we would have a system for registering slaves in the primary server. The primary would keep track of how far each slave is, and refrain from removing WAL segments that it knows to be still needed by a slave. On the flip-side, the master wouldn't need to keep WAL around that it knows is no longer needed by any slaves. If someone has the energy, it would be possible to write a stand-alone application to do that too. It could serve old WAL files from the archive and relay recent ones from the real master. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
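The slave-registration idea above amounts to taking a minimum over what each registered standby still needs. A minimal sketch, assuming a hypothetical interface in which every standby reports the number of the oldest WAL segment it has not yet received; nothing like this exists in the server as of this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical: standby_needs[i] is the oldest segment number that
 * registered standby i still needs. Returns the first segment that must
 * be kept; segments with lower numbers could be removed or recycled.
 * With no standbys registered, nothing is held back and the current
 * segment number is returned.
 */
uint64_t oldest_segment_to_keep(const uint64_t *standby_needs, size_t n,
                                uint64_t current_segment)
{
    uint64_t oldest = current_segment;

    for (size_t i = 0; i < n; i++)
        if (standby_needs[i] < oldest)
            oldest = standby_needs[i];
    return oldest;
}
```

Compared with a fixed wal_keep_segments value, a scheme like this would hold back exactly as much WAL as the slowest registered standby requires and no more.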
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Bruce Momjian escribió:
>> Which is where my 'wal_keep_segments = -1' idea came from.
> Are you suggesting that -1 should mean "keep all segments that fit on
> disk, but if creating a new segment fails with ENOSPC, recycle the
> oldest one"?
No, keep means keep.  Even if there were some arguable use for "keep if
you can", a scheme like that would render the machine unusable ---
everything else on the same filesystem would be falling over.
        regards, tom lane
			
		On Fri, 2010-04-30 at 14:42 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Fri, Apr 30, 2010 at 1:44 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> Why is that not called "max_wal_segments"? wal_keep_segments sounds like > >> its been through Google translate. > > > Because it's not a maximum? > > Indeed. It would really be more like min_wal_segments, if we wanted to > name it that way. Yeh, agreed: min_wal_segments. I realised while having dinner it was the opposite, so I'm pleased everybody else got there at same time. -- Simon Riggs www.2ndQuadrant.com
Kevin Grittner wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > >> Yeah, min_wal_segments or something would make sense. > > Surely it would confuse people to see they have fewer than > min_wal_segments WAL segments. Umm, they wouldn't see that, that's the point of the setting. The segments are not removed/recycled until there are min_wal_segments segments in pg_xlog. Except in the beginning when you set or increase the setting, when there aren't that many segments generated yet. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > Kevin Grittner wrote: >> Surely it would confuse people to see they have fewer than >> min_wal_segments WAL segments. > > they wouldn't see that, that's the point of the setting. I was thinking, in particular, about beginners poking around to see how things look after an initdb. Perhaps that state is too transient to matter, but it struck me that you'd have fewer than the minimum at the precise time a beginner might be likely to take a look. Unless on startup (and reload?) we created min_wal_segments WAL segments if they didn't already exist. -Kevin
On Fri, 2010-04-30 at 13:41 -0500, Kevin Grittner wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > > > Yeah, min_wal_segments or something would make sense. > > Surely it would confuse people to see they have fewer than > min_wal_segments WAL segments. That does sound like a reasonable argument, though it also applies to wal_keep_segments, so isn't an argument either way. The user will be equally confused to see fewer WAL files than they have asked to "keep". min_wal_segments is much clearer, IMHO. -- Simon Riggs www.2ndQuadrant.com
Simon Riggs <simon@2ndQuadrant.com> wrote: > On Fri, 2010-04-30 at 13:41 -0500, Kevin Grittner wrote: >> Surely it would confuse people to see they have fewer than >> min_wal_segments WAL segments. > > That does sound like a reasonable argument, though it also applies > to wal_keep_segments, so isn't an argument either way. The user > will be equally confused to see fewer WAL files than they have > asked to "keep". The definitions of "keep" in my dictionary include "to restrain from removal" and "to retain in one's possession". It defines "minimum" as "the least quantity assignable, admissible, or possible". If I'm understanding the semantics of this GUC (which I'll grant is not a sure thing), "keep" does a better job of conveying the meaning, since fewer than that are initially possible, but at least that many will be *kept* once they exist. I'm sure I'll figure it out at need, but the assertion that "minimum" more clearly defines the purpose is shaking *my* confidence that I understand what the GUC is for. -Kevin
On Mon, May 3, 2010 at 2:54 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Simon Riggs <simon@2ndQuadrant.com> wrote: >> On Fri, 2010-04-30 at 13:41 -0500, Kevin Grittner wrote: > >>> Surely it would confuse people to see they have fewer than >>> min_wal_segments WAL segments. >> >> That does sound like a reasonable argument, though it also applies >> to wal_keep_segments, so isn't an argument either way. The user >> will be equally confused to see fewer WAL files than they have >> asked to "keep". > > The definitions of "keep" in my dictionary include "to restrain from > removal" and "to retain in one's possession". It defines "minimum" > as "the least quantity assignable, admissible, or possible". It's really both of those things, so we could call it wal_min_keep_segments, but I think an even better name would be bikeshed_segments. ...Robert
> It's really both of those things, so we could call it
> wal_min_keep_segments, but I think an even better name would be
> bikeshed_segments.
Speaking from my UI perspective, I don't think users will care what we
call it.
--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Bruce Momjian wrote: > Simon Riggs wrote: > > On Fri, 2010-04-30 at 12:22 -0400, Bruce Momjian wrote: > > > > > > > > (wal_keep_segments can be changed without restarting, right?) > > > > > > Should we allow -1 to mean "keep all segments"? > > > > Why is that not called "max_wal_segments"? wal_keep_segments sounds like > > it's been through Google translate. > > LOL, good one. > > I assume it was done so it would start with 'wal', but I see > 'max_wal_senders', which doesn't start with 'wal' and would match your > suggestion exactly. I think we should either rename 'wal_keep_segments' > or 'max_wal_senders'. Uh, did we decide that 'wal_keep_segments' was the best name for this GUC setting? I know we shipped beta1 using that name. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com
Bruce Momjian <bruce@momjian.us> writes:
> Uh, did we decide that 'wal_keep_segments' was the best name for this
> GUC setting?  I know we shipped beta1 using that name.
I thought min_wal_segments was a reasonable proposal, but it wasn't
clear if there was consensus or not.
        regards, tom lane
On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Bruce Momjian <bruce@momjian.us> writes: >> Uh, did we decide that 'wal_keep_segments' was the best name for this >> GUC setting? I know we shipped beta1 using that name. > > I thought min_wal_segments was a reasonable proposal, but it wasn't > clear if there was consensus or not. I think most people thought it was another reasonable choice, but I think the consensus position is probably something like "it's about the same" rather than "it's definitely better". We had one or two people with stronger opinions than that on either side, I believe. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sat, 2010-05-08 at 23:55 -0400, Robert Haas wrote: > On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bruce Momjian <bruce@momjian.us> writes: > >> Uh, did we decide that 'wal_keep_segments' was the best name for this > >> GUC setting? I know we shipped beta1 using that name. > > > > I thought min_wal_segments was a reasonable proposal, but it wasn't > > clear if there was consensus or not. > > I think most people thought it was another reasonable choice, but I > think the consensus position is probably something like "it's about > the same" rather than "it's definitely better". We had one or two > people with stronger opinions than that on either side, I believe. It's only a name and not worth a long discussion on. -- Simon Riggs www.2ndQuadrant.com
Robert Haas wrote: > On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bruce Momjian <bruce@momjian.us> writes: > >> Uh, did we decide that 'wal_keep_segments' was the best name for this > >> GUC setting? I know we shipped beta1 using that name. > > > > I thought min_wal_segments was a reasonable proposal, but it wasn't > > clear if there was consensus or not. > > I think most people thought it was another reasonable choice, but I > think the consensus position is probably something like "it's about > the same" rather than "it's definitely better". We had one or two > people with stronger opinions than that on either side, I believe. Agreed the current name seems OK. However, was there agreement that wal_keep_segments = -1 should keep all WAL segments? I can see that as useful for cases where you are doing a dump to be transferred to the slave, and not using archive_command. This avoids the need for the "set a huge value" solution. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Bruce Momjian wrote: > Robert Haas wrote: > > On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > >> Uh, did we decide that 'wal_keep_segments' was the best name for this > > >> GUC setting? I know we shipped beta1 using that name. > > > > > > I thought min_wal_segments was a reasonable proposal, but it wasn't > > > clear if there was consensus or not. > > > > I think most people thought it was another reasonable choice, but I > > think the consensus position is probably something like "it's about > > the same" rather than "it's definitely better". We had one or two > > people with stronger opinions than that on either side, I believe. > > Agreed the current name seems OK. However, was there agreement that > wal_keep_segments = -1 should keep all WAL segments? I can see that as > useful for cases where you are doing a dump to be transferred to the > slave, and not using archive_command. This avoids the need for the "set > a huge value" solution. The attached patch allows wal_keep_segments = -1 to keep all segments; this is particularly useful for taking a base backup, where you need all the WAL files during startup of the standby. I have documented this usage in the patch as well. I am thinking of applying this after 9.0 beta2 if there is no objection. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.280
diff -c -c -r1.280 config.sgml
*** doc/src/sgml/config.sgml 31 May 2010 15:50:48 -0000 1.280
--- doc/src/sgml/config.sgml 2 Jun 2010 19:19:18 -0000
***************
*** 1887,1893 ****
  Specifies the number of past log file segments kept in the
  <filename>pg_xlog</> directory, in case a standby server needs to
  fetch them for streaming
! replication. Each segment is normally 16 megabytes. If a standby
  server connected to the primary falls behind by more than
  <varname>wal_keep_segments</> segments, the primary might remove
  a WAL segment still needed by the standby, in which case the
--- 1887,1893 ----
  Specifies the number of past log file segments kept in the
  <filename>pg_xlog</> directory, in case a standby server needs to
  fetch them for streaming
! replication. Each segment is normally 16 megabytes. If a standby
  server connected to the primary falls behind by more than
  <varname>wal_keep_segments</> segments, the primary might remove
  a WAL segment still needed by the standby, in which case the
***************
*** 1901,1908 ****
  is zero (the default), the system doesn't keep any extra segments
  for standby purposes, and the number of old WAL segments available
  for standbys is determined based only on the location of the previous
! checkpoint and status of WAL archiving.
! This parameter can only be set in the <filename>postgresql.conf</>
  file or on the server command line.
  </para>
  </listitem>
--- 1901,1909 ----
  is zero (the default), the system doesn't keep any extra segments
  for standby purposes, and the number of old WAL segments available
  for standbys is determined based only on the location of the previous
! checkpoint and status of WAL archiving. If <literal>-1</> is
! specified, log file segments are kept indefinitely. This
! parameter can only be set in the <filename>postgresql.conf</>
  file or on the server command line.
  </para>
  </listitem>
Index: doc/src/sgml/high-availability.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/high-availability.sgml,v
retrieving revision 1.70
diff -c -c -r1.70 high-availability.sgml
*** doc/src/sgml/high-availability.sgml 29 May 2010 09:01:10 -0000 1.70
--- doc/src/sgml/high-availability.sgml 2 Jun 2010 19:19:19 -0000
***************
*** 750,756 ****
  If you use streaming replication without file-based continuous
  archiving, you have to set <varname>wal_keep_segments</> in the master
  to a value high enough to ensure that old WAL segments are not recycled
! too early, while the standby might still need them to catch up. If
  the standby falls behind too much, it needs to be reinitialized from
  a new base backup. If you set up a WAL archive that's accessible from
  the standby, wal_keep_segments is not required as the standby can always
--- 750,760 ----
  If you use streaming replication without file-based continuous
  archiving, you have to set <varname>wal_keep_segments</> in the master
  to a value high enough to ensure that old WAL segments are not recycled
! too early, while the standby might still need them to catch up. This
! is particularly important when performing a base backup because the
! standby will need all WAL segments generated since the start of the
! backup; consider setting <varname>wal_keep_segments</> to
! <literal>-1</> temporarily in such cases. If
  the standby falls behind too much, it needs to be reinitialized from
  a new base backup. If you set up a WAL archive that's accessible from
  the standby, wal_keep_segments is not required as the standby can always
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.414
diff -c -c -r1.414 xlog.c
*** src/backend/access/transam/xlog.c 27 May 2010 00:38:39 -0000 1.414
--- src/backend/access/transam/xlog.c 2 Jun 2010 19:19:20 -0000
***************
*** 7339,7345 ****
   * Delete old log files (those no longer needed even for previous
   * checkpoint or the standbys in XLOG streaming).
   */
! if (_logId || _logSeg)
  {
  /*
   * Calculate the last segment that we need to retain because of
--- 7339,7345 ----
   * Delete old log files (those no longer needed even for previous
   * checkpoint or the standbys in XLOG streaming).
   */
! if ((_logId || _logSeg) && wal_keep_segments != -1)
  {
  /*
   * Calculate the last segment that we need to retain because of
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.554
diff -c -c -r1.554 guc.c
*** src/backend/utils/misc/guc.c 2 May 2010 02:10:33 -0000 1.554
--- src/backend/utils/misc/guc.c 2 Jun 2010 19:19:22 -0000
***************
*** 1661,1667 ****
  NULL
  },
  &wal_keep_segments,
! 0, 0, INT_MAX, NULL, NULL
  },

  {
--- 1661,1667 ----
  NULL
  },
  &wal_keep_segments,
! 0, -1, INT_MAX, NULL, NULL
  },

  {
On Wed, Jun 2, 2010 at 3:20 PM, Bruce Momjian <bruce@momjian.us> wrote: > Bruce Momjian wrote: >> Robert Haas wrote: >> > On Sat, May 8, 2010 at 10:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> > > Bruce Momjian <bruce@momjian.us> writes: >> > >> Uh, did we decide that 'wal_keep_segments' was the best name for this >> > >> GUC setting? I know we shipped beta1 using that name. >> > > >> > > I thought min_wal_segments was a reasonable proposal, but it wasn't >> > > clear if there was consensus or not. >> > >> > I think most people thought it was another reasonable choice, but I >> > think the consensus position is probably something like "it's about >> > the same" rather than "it's definitely better". We had one or two >> > people with stronger opinions than that on either side, I believe. >> >> Agreed the current name seems OK. However, was there agreement that >> wal_keep_segments = -1 should keep all WAL segments? I can see that as >> useful for cases where you are doing a dump to be transferred to the >> slave, and not using archive_command. This avoids the need for the "set >> a huge value" solution. > > The attached patch allows wal_keep_segments = -1 to keep all segments; > this is particularly useful for taking a base backup, where you need all > the WAL files during startup of the standby. I have documented this > usage in the patch as well. > > I am thinking of applying this after 9.0 beta2 if there is no objection. +1 for the patch, but why wait until after beta2? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Wed, 2010-06-02 at 15:20 -0400, Bruce Momjian wrote: > The attached patch allows wal_keep_segments = -1 to keep all segments; > this is particularly useful for taking a base backup, where you need all > the WAL files during startup of the standby. I have documented this > usage in the patch as well. > > I am thinking of applying this after 9.0 beta2 if there is no objection. It's not clear to me why "keep all files until server breaks" is a good setting. Surely you would set this parameter to the size of your disk. Why allow it to go higher? -- Simon Riggs www.2ndQuadrant.com
Simon Riggs wrote: > On Wed, 2010-06-02 at 15:20 -0400, Bruce Momjian wrote: > > > The attached patch allows wal_keep_segments = -1 to keep all segments; > > this is particularly useful for taking a base backup, where you need all > > the WAL files during startup of the standby. I have documented this > > usage in the patch as well. > > > > I am thinking of applying this after 9.0 beta2 if there is no objection. > > It's not clear to me why "keep all files until server breaks" is a good > setting. Surely you would set this parameter to the size of your disk. > Why allow it to go higher? Well, the -1 allows them to set it temporarily without having to compute their free disk space. Frankly, because the disk space varies, it is impossible to know exactly how large the disk is at the time it would fill up. I think the normal computation would be:
1) How long is my file system backup and restore to standby going to take
2) How often do I generate a 16MB WAL file
You would do some computation to figure that out, then maybe multiply it by 10x and set that for wal_keep_segments. I figured allowing a simple -1 would be easier. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
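The back-of-the-envelope computation described in the message above can be sketched in C as follows. This is only an illustration: the function name, its parameters, and the choice of whole seconds as the unit are assumptions introduced here, not part of the patch or of PostgreSQL.

```c
/*
 * Hypothetical helper (not part of PostgreSQL): estimate a value for
 * wal_keep_segments from the two questions in the message.
 *
 * backup_seconds      - expected duration of the file system backup plus
 *                       the restore on the standby
 * seconds_per_segment - how often the primary fills one 16MB WAL segment
 * safety_factor       - headroom multiplier (the message suggests 10x)
 */
long
estimate_wal_keep_segments(long backup_seconds,
                           long seconds_per_segment,
                           long safety_factor)
{
    long segments;

    /* Reject inputs for which the estimate is meaningless. */
    if (backup_seconds <= 0 || seconds_per_segment <= 0 || safety_factor <= 0)
        return 0;

    /* Round up: a partially filled segment still has to be kept. */
    segments = (backup_seconds + seconds_per_segment - 1) / seconds_per_segment;

    return segments * safety_factor;
}
```

For example, a two-hour backup and restore (7200 seconds) on a primary that fills one segment per minute gives 120 segments, or 1200 with the 10x margin, which is roughly 19GB of pg_xlog at 16MB per segment.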
Robert Haas wrote: > > The attached patch allows wal_keep_segments = -1 to keep all segments; > > this is particularly useful for taking a base backup, where you need all > > the WAL files during startup of the standby. I have documented this > > usage in the patch as well. > > > > I am thinking of applying this after 9.0 beta2 if there is no objection. > > +1 for the patch, but why wait until after beta2? I wanted to give people enough time to review/discuss it. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
On Wed, 2010-06-02 at 20:28 -0400, Bruce Momjian wrote: > Simon Riggs wrote: > > On Wed, 2010-06-02 at 15:20 -0400, Bruce Momjian wrote: > > > > > The attached patch allows wal_keep_segments = -1 to keep all segments; > > > this is particularly useful for taking a base backup, where you need all > > > the WAL files during startup of the standby. I have documented this > > > usage in the patch as well. > > > > > > I am thinking of applying this after 9.0 beta2 if there is no objection. > > > > It's not clear to me why "keep all files until server breaks" is a good > > setting. Surely you would set this parameter to the size of your disk. > > Why allow it to go higher? > > Well, the -1 allows them to set it temporarily without having to compute > their free disk space. Frankly, because the disk space varies, it is > impossible to know exactly how large the disk is at the time it would > fill up. > > I think the normal computation would be: > > 1) How long is my file system backup and restore to standby > going to take > 2) How often do I generate a 16MB WAL file > > You would do some computation to figure that out, then maybe multiply it > by 10x and set that for wal_keep_segments. I figured allowing a simple > -1 would be easier. I think it's much easier to find out your free disk space than it is to calculate how much WAL might be generated during backup. Disk space doesn't vary significantly on a production database. If we encourage that laziness then we will get reports that replication doesn't work and Postgres crashes. -- Simon Riggs www.2ndQuadrant.com
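The alternative argued for here, deriving an upper bound from the disk rather than from the expected WAL volume, might look like the sketch below. The function name and the idea of holding back a reserve for everything else on the volume are assumptions made for illustration; only the 16MB segment size comes from the thread.

```c
#include <stdint.h>

/* Default WAL segment size mentioned in the thread: 16 megabytes. */
#define WAL_SEGMENT_BYTES ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical helper (not part of PostgreSQL): the largest
 * wal_keep_segments value that still fits on the volume holding pg_xlog.
 *
 * free_bytes    - free space currently available on that volume
 * reserve_bytes - space to leave untouched for data growth, temp files, etc.
 */
uint64_t
max_wal_keep_segments_for_disk(uint64_t free_bytes, uint64_t reserve_bytes)
{
    /* Nothing can be spared if the reserve already exceeds free space. */
    if (free_bytes <= reserve_bytes)
        return 0;

    /* Integer division rounds down, so the result never overshoots. */
    return (free_bytes - reserve_bytes) / WAL_SEGMENT_BYTES;
}
```

With 100GiB free and a 20GiB reserve this yields 5120 segments, a finite ceiling that plays the role of "a large positive setting" without letting pg_xlog grow until the disk is full.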
Simon Riggs wrote: > On Wed, 2010-06-02 at 20:28 -0400, Bruce Momjian wrote: > > Simon Riggs wrote: > > > On Wed, 2010-06-02 at 15:20 -0400, Bruce Momjian wrote: > > > > > > > The attached patch allows wal_keep_segments = -1 to keep all segments; > > > > this is particularly useful for taking a base backup, where you need all > > > > the WAL files during startup of the standby. I have documented this > > > > usage in the patch as well. > > > > > > > > I am thinking of applying this after 9.0 beta2 if there is no objection. > > > > > > It's not clear to me why "keep all files until server breaks" is a good > > > setting. Surely you would set this parameter to the size of your disk. > > > Why allow it to go higher? > > > > Well, the -1 allows them to set it temporarily without having to compute > > their free disk space. Frankly, because the disk space varies, it is > > impossible to know exactly how large the disk is at the time it would > > fill up. > > > > I think the normal computation would be: > > > > 1) How long is my file system backup and restore to standby > > going to take > > 2) How often do I generate a 16MB WAL file > > > > You would do some computation to figure that out, then maybe multiply it > > by 10x and set that for wal_keep_segments. I figured allowing a simple > > -1 would be easier. > > I think it's much easier to find out your free disk space than it is to > calculate how much WAL might be generated during backup. Disk space > doesn't vary significantly on a production database. > > If we encourage that laziness then we will get reports that replication > doesn't work and Postgres crashes. Well, we don't clean out the archive directory so I don't see this as anything new. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
On 03/06/10 15:15, Bruce Momjian wrote: > Simon Riggs wrote: >> I think it's much easier to find out your free disk space than it is to >> calculate how much WAL might be generated during backup. Disk space >> doesn't vary significantly on a production database. >> >> If we encourage that laziness then we will get reports that replication >> doesn't work and Postgres crashes. > > Well, we don't clean out the archive directory so I don't see this as > anything new. We leave that up to the DBA to clean out one way or another. We provide restartpoint_command and the %r option in restore_command to help with that. Surely we don't expect DBAs to delete old files in pg_xlog? I agree with Simon here, I think it would be better to not provide -1 as an option here. At least you better document well that you should only do that temporarily or you will eventually run out of disk space. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > On 03/06/10 15:15, Bruce Momjian wrote: > > Simon Riggs wrote: > >> I think its much easier to find out your free disk space than it is to > >> calculate how much WAL might be generated during backup. Disk space > >> doesn't vary significantly on a production database. > >> > >> If we encourage that laziness then we will get reports that replication > >> doesn't work and Postgres crashes. > > > > Well, we don't clean out the archive directory so I don't see this as > > anything new. > > We leave that up to the DBA to clean out one way or another. We provide > restartpoint_command and the %r option in restore_command to help with that. > > Surely we don't expect DBAs to delete old files in pg_xlog? I agree with > Simon here, I think it would be better to not provide -1 as an option > here. At least you better document well that you should only do that > temporarily or you will eventually run out of disk space. Using this only temporarily is mentioned in the doc patch. Do I need more? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Heikki Linnakangas wrote: > On 03/06/10 15:15, Bruce Momjian wrote: > > Simon Riggs wrote: > >> I think it's much easier to find out your free disk space than it is to > >> calculate how much WAL might be generated during backup. Disk space > >> doesn't vary significantly on a production database. > >> > >> If we encourage that laziness then we will get reports that replication > >> doesn't work and Postgres crashes. > > > > Well, we don't clean out the archive directory so I don't see this as > > anything new. > > We leave that up to the DBA to clean out one way or another. We provide > restartpoint_command and the %r option in restore_command to help with that. > > Surely we don't expect DBAs to delete old files in pg_xlog? I agree with > Simon here, I think it would be better to not provide -1 as an option > here. At least you better document well that you should only do that > temporarily or you will eventually run out of disk space. I have updated the doc text to mention "temporarily" everywhere '-1' is mentioned. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. +
Index: doc/src/sgml/config.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/config.sgml,v
retrieving revision 1.280
diff -c -c -r1.280 config.sgml
*** doc/src/sgml/config.sgml 31 May 2010 15:50:48 -0000 1.280
--- doc/src/sgml/config.sgml 3 Jun 2010 14:05:21 -0000
***************
*** 1901,1908 ****
  is zero (the default), the system doesn't keep any extra segments
  for standby purposes, and the number of old WAL segments available
  for standbys is determined based only on the location of the previous
! checkpoint and status of WAL archiving.
! This parameter can only be set in the <filename>postgresql.conf</>
  file or on the server command line.
  </para>
  </listitem>
--- 1901,1909 ----
  is zero (the default), the system doesn't keep any extra segments
  for standby purposes, and the number of old WAL segments available
  for standbys is determined based only on the location of the previous
! checkpoint and status of WAL archiving. To temporarily keep
! all log file segments, use the value <literal>-1</>. This
! parameter can only be set in the <filename>postgresql.conf</>
  file or on the server command line.
  </para>
  </listitem>
Index: doc/src/sgml/high-availability.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/high-availability.sgml,v
retrieving revision 1.70
diff -c -c -r1.70 high-availability.sgml
*** doc/src/sgml/high-availability.sgml 29 May 2010 09:01:10 -0000 1.70
--- doc/src/sgml/high-availability.sgml 3 Jun 2010 14:05:21 -0000
***************
*** 750,756 ****
  If you use streaming replication without file-based continuous
  archiving, you have to set <varname>wal_keep_segments</> in the master
  to a value high enough to ensure that old WAL segments are not recycled
! too early, while the standby might still need them to catch up. If
  the standby falls behind too much, it needs to be reinitialized from
  a new base backup. If you set up a WAL archive that's accessible from
  the standby, wal_keep_segments is not required as the standby can always
--- 750,760 ----
  If you use streaming replication without file-based continuous
  archiving, you have to set <varname>wal_keep_segments</> in the master
  to a value high enough to ensure that old WAL segments are not recycled
! too early, while the standby might still need them to catch up. This
! is particularly important when performing a base backup because the
! standby will need all WAL segments generated since the start of the
! backup; consider setting <varname>wal_keep_segments</> to
! <literal>-1</> temporarily in such cases. If
  the standby falls behind too much, it needs to be reinitialized from
  a new base backup. If you set up a WAL archive that's accessible from
  the standby, wal_keep_segments is not required as the standby can always
Index: src/backend/access/transam/xlog.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
retrieving revision 1.415
diff -c -c -r1.415 xlog.c
*** src/backend/access/transam/xlog.c 2 Jun 2010 09:28:44 -0000 1.415
--- src/backend/access/transam/xlog.c 3 Jun 2010 14:05:22 -0000
***************
*** 7337,7343 ****
   * Delete old log files (those no longer needed even for previous
   * checkpoint or the standbys in XLOG streaming).
   */
! if (_logId || _logSeg)
  {
  /*
   * Calculate the last segment that we need to retain because of
--- 7337,7343 ----
   * Delete old log files (those no longer needed even for previous
   * checkpoint or the standbys in XLOG streaming).
   */
! if ((_logId || _logSeg) && wal_keep_segments != -1)
  {
  /*
   * Calculate the last segment that we need to retain because of
Index: src/backend/utils/misc/guc.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/misc/guc.c,v
retrieving revision 1.554
diff -c -c -r1.554 guc.c
*** src/backend/utils/misc/guc.c 2 May 2010 02:10:33 -0000 1.554
--- src/backend/utils/misc/guc.c 3 Jun 2010 14:05:25 -0000
***************
*** 1661,1667 ****
  NULL
  },
  &wal_keep_segments,
! 0, 0, INT_MAX, NULL, NULL
  },

  {
--- 1661,1667 ----
  NULL
  },
  &wal_keep_segments,
! 0, -1, INT_MAX, NULL, NULL
  },

  {
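The behaviour the xlog.c hunk is after can be illustrated with a much simplified model. The sketch below is not PostgreSQL source: it numbers segments with a single flat counter (the code being patched tracks a _logId/_logSeg pair), and the function name is made up for this example. It only shows how a -1 setting would switch removal off entirely, while 0 or a positive value still lets old segments go.

```c
#include <stdint.h>

/*
 * Hypothetical model (not PostgreSQL source) of the retention decision.
 * Returns the number of the oldest segment that must be kept; every
 * segment with a smaller number may be removed or recycled.
 *
 * checkpoint_seg    - oldest segment still needed by the previous checkpoint
 * current_seg       - segment currently being written
 * wal_keep_segments - the GUC value: -1, 0, or a positive count
 */
uint64_t
first_segment_to_keep(uint64_t checkpoint_seg, uint64_t current_seg,
                      int wal_keep_segments)
{
    uint64_t keep_from;

    /* Proposed -1 semantics: skip removal altogether, keep everything. */
    if (wal_keep_segments == -1)
        return 0;

    /* Segments needed by the previous checkpoint are always retained. */
    keep_from = checkpoint_seg;

    /* A positive setting can only move the cutoff further back. */
    if (wal_keep_segments > 0)
    {
        uint64_t n = (uint64_t) wal_keep_segments;
        uint64_t by_keep = (current_seg > n) ? current_seg - n : 0;

        if (by_keep < keep_from)
            keep_from = by_keep;
    }

    return keep_from;
}
```

In this model a sufficiently large positive value gives the same answer as -1 (a cutoff of 0) for as long as fewer segments than that have been written; past that point the positive value starts letting old segments go, whereas -1 never does.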
Bruce Momjian <bruce@momjian.us> writes:
> Heikki Linnakangas wrote:
>> Surely we don't expect DBAs to delete old files in pg_xlog? I agree with 
>> Simon here, I think it would be better to not provide -1 as an option 
>> here. At least you better document well that you should only do that 
>> temporarily or you will eventually run out of disk space.
> I have updated the doc text to mention "temporarily" everywhere '-1' is
> mentioned.
FWIW, I've come to agree with Simon.  Allowing -1 doesn't do anything
that you can't do with a large positive setting, and what it does do
is to encourage people to set the variable to an unsafe value as a
substitute for thinking.
        regards, tom lane
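
To put a number on the "large positive setting" alternative: the disk space that wal_keep_segments holds back is bounded and easy to estimate, which is exactly what -1 would give up. The sketch below is only an illustration, assuming the default 16 MB WAL segment size; the function name is hypothetical and the result ignores whatever else lives in pg_xlog.

```python
# Assumption: the server uses the default 16 MB WAL segment size.
SEGMENT_BYTES = 16 * 1024 * 1024


def retained_wal_bytes(wal_keep_segments):
    """Bytes of WAL held back by wal_keep_segments alone (hypothetical helper)."""
    if wal_keep_segments < 0:
        # -1 was the proposed "keep everything" value; it has no finite bound.
        raise ValueError("no finite bound for a negative setting")
    return wal_keep_segments * SEGMENT_BYTES


if __name__ == "__main__":
    # Print the reserved space for a few candidate settings, in GiB.
    for setting in (100, 1000, 10000):
        gib = retained_wal_bytes(setting) / float(2 ** 30)
        print("wal_keep_segments = %d reserves about %.1f GiB" % (setting, gib))
```

A DBA who wants "keep a lot" can pick the largest value whose footprint still fits the pg_xlog volume, instead of an unbounded setting.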
			
Excerpts from Bruce Momjian's message of Thu Jun 03 08:36:28 -0400 2010:

> Using this only temporarily is mentioned in the doc patch. Do I need
> more?

Yeah, it's far too easy to miss. Besides, I think the wording you used
is ambiguous -- it can be read as "the server will temporarily keep all
segments if you set it to -1", which is not the same thing at all. If
you can't add a 20-point-font red blinking warning with a pink dancing
elephant in a tutu, maybe it's best to not offer the dangerous setting
in the first place.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of Thu Jun 03 08:36:28 -0400 2010:
>
> > Using this only temporarily is mentioned in the doc patch. Do I need
> > more?
>
> Yeah, it's far too easy to miss. Besides, I think the wording you used
> is ambiguous -- it can be read as "the server will temporarily keep all
> segments if you set it to -1", which is not the same thing at all. If
> you can't add a 20-point-font red blinking warning with a pink dancing
> elephant in a tutu, maybe it's best to not offer the dangerous setting
> in the first place.

Well, it seems enough people don't want this feature that I am not
going to add it. If we decide we want it later, we can add it.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + None of us is going to be here forever. +
Heikki Linnakangas wrote:
>
> We leave that up to the DBA to clean out one way or another. We
> provide restartpoint_command and the %r option in restore_command to
> help with that.
>

I was in fact just looking into this, and I see that there is no example
restartpoint_command script given in the docs, nor in the wiki. A sample
of such a command would be useful. This is all going to feel a bit
strange to lots of users, and the more we can hold their hands the
better off we and they will be.

cheers

andrew
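
For illustration, a cleanup script of the kind asked for here might look roughly like the following. It is a sketch under stated assumptions, not a script from the docs or the wiki: the function names are hypothetical, WAL segment files are assumed to be named with 24 uppercase hexadecimal digits, the second argument is assumed to be the file name that %r expands to (the oldest segment still needed for a restart), and the timeline part of the name is deliberately ignored when comparing.

```python
import os
import re

# Assumption: a WAL segment file name is 24 uppercase hex digits,
# 8 for the timeline followed by 16 for the log and segment numbers.
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def removable_segments(names, oldest_kept):
    """Return, sorted, the segment names that are older than oldest_kept.

    Files that are not plain segment names (history files, backup labels)
    are never returned. The first 8 digits (timeline) are ignored in the
    comparison; that is a simplification made by this sketch.
    """
    if not SEGMENT_NAME.fullmatch(oldest_kept):
        raise ValueError("not a WAL segment name: %r" % (oldest_kept,))
    return sorted(
        name for name in names
        if SEGMENT_NAME.fullmatch(name) and name[8:] < oldest_kept[8:]
    )


def clean_archive(archive_dir, oldest_kept):
    """Delete archived segments in archive_dir that precede oldest_kept."""
    removed = removable_segments(os.listdir(archive_dir), oldest_kept)
    for name in removed:
        os.remove(os.path.join(archive_dir, name))
    return removed
```

A small wrapper that passes the archive directory and the %r value to clean_archive could then be named in restartpoint_command; the exact command line is left out because it depends on the installation. Keeping the selection logic in removable_segments, separate from the deletion, makes it possible to dry-run the script against a directory listing before letting it remove anything.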