Обсуждение: Postgres 8.4: archive_timeout vs. checkpoint_timeout
Hi folks, I want to refer to a question Rob did back in 2008 at http://archives.postgresql.org/pgsql-general/2008-07/msg01167.php as we are currently running into a similar question: We are using warm standby via PITR using a shared drive between master and slave node. Our setup currently is set to archive_timeout = 60s and checkpoint_timeout = 600s. We expected that now every minute a WAL-file is written to the share, but somehow we might misunderstood some part of the documentation as in periods with low traffic on database the interval between WAL files is >1min up to ten minutes. However, the goal was to have a WAL file every minute so disaster recovering can be done fast with a minimum of lost data. Question is: What did we miss? Do we need to put checkpoint_timeout also to 60s and does this makes sense at all? Cheers, Frank
On Thu, Oct 6, 2011 at 3:47 AM, Frank Lanitz <frank@frank.uvena.de> wrote:
The 8.4 docs lack this detail, but the 9.0 docs explain this. I don't believe it's a behavior change; I think it's just more clarification in the documents ( http://www.postgresql.org/docs/9.0/interactive/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT )
" When this parameter is greater than zero, the server will switch to a new segment file whenever this many seconds have elapsed since the last segment file switch, ***and there has been any database activity, including a single checkpoint.***" (emphasis mine)
Tom said something similar in the thread you referenced:
http://archives.postgresql.org/pgsql-general/2008-07/msg01166.php
"One possible connection is that an xlog file switch will not actually happen unless some xlog output has been generated since the last switch.
If you were watching an otherwise-idle system then maybe the checkpoint records are needed to make it look like a switch is needed. OTOH if
it's *that* idle then the checkpoints should be no-ops too."
 
If there was any data, it's existence in the transaction log would trigger the archive_timeout behavior. With no database activity, you aren't missing anything.
  
You are getting what you need (maximum 60s between data and the corresponding data being sent through archive_command), just not exactly what you thought you asked for.
Hi folks,
I want to refer to a question Rob did back in 2008 at
http://archives.postgresql.org/pgsql-general/2008-07/msg01167.php as we
are currently running into a similar question:
We are using warm standby via PITR using a shared drive between master
and slave node.
Our setup currently is set to archive_timeout = 60s and
checkpoint_timeout = 600s.
We expected that now every minute a WAL-file is written to the share,
but somehow we might misunderstood some part of the documentation as in
periods with low traffic on database the interval between WAL files is
>1min up to ten minutes.
The 8.4 docs lack this detail, but the 9.0 docs explain this. I don't believe it's a behavior change; I think it's just more clarification in the documents ( http://www.postgresql.org/docs/9.0/interactive/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT )
" When this parameter is greater than zero, the server will switch to a new segment file whenever this many seconds have elapsed since the last segment file switch, ***and there has been any database activity, including a single checkpoint.***" (emphasis mine)
Tom said something similar in the thread you referenced:
http://archives.postgresql.org/pgsql-general/2008-07/msg01166.php
"One possible connection is that an xlog file switch will not actually happen unless some xlog output has been generated since the last switch.
If you were watching an otherwise-idle system then maybe the checkpoint records are needed to make it look like a switch is needed. OTOH if
it's *that* idle then the checkpoints should be no-ops too."
However, the goal was to have a WAL file every minute so disaster
recovering can be done fast with a minimum of lost data.
If there was any data, it's existence in the transaction log would trigger the archive_timeout behavior. With no database activity, you aren't missing anything.
Question is: What did we miss? Do we need to put checkpoint_timeout also
to 60s and does this makes sense at all?
You are getting what you need (maximum 60s between data and the corresponding data being sent through archive_command), just not exactly what you thought you asked for.
If you absolutely must have a file every in order to sleep well, you can lower checkpoint_timeout. Keep in mind the cost of checkpoints.
Derrick
Hi, Thanks for your response. Am 07.10.2011 22:05, schrieb Derrick Rice: > On Thu, Oct 6, 2011 at 3:47 AM, Frank Lanitz <frank@frank.uvena.de > <mailto:frank@frank.uvena.de>> wrote: > > Hi folks, > > I want to refer to a question Rob did back in 2008 at > http://archives.postgresql.org/pgsql-general/2008-07/msg01167.php as we > are currently running into a similar question: > We are using warm standby via PITR using a shared drive between master > and slave node. > > Our setup currently is set to archive_timeout = 60s and > checkpoint_timeout = 600s. > > We expected that now every minute a WAL-file is written to the share, > but somehow we might misunderstood some part of the documentation as in > periods with low traffic on database the interval between WAL files is > >1min up to ten minutes. > > > The 8.4 docs lack this detail, but the 9.0 docs explain this. I don't > believe it's a behavior change; I think it's just more clarification in > the documents ( > http://www.postgresql.org/docs/9.0/interactive/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT > ) > > " When this parameter is greater than zero, the server will switch to a > new segment file whenever this many seconds have elapsed since the last > segment file switch, ***and there has been any database activity, > including a single checkpoint.***" (emphasis mine) > > Tom said something similar in the thread you referenced: > > http://archives.postgresql.org/pgsql-general/2008-07/msg01166.php > > "One possible connection is that an xlog file switch will not actually > happen unless some xlog output has been generated since the last switch. > If you were watching an otherwise-idle system then maybe the checkpoint > records are needed to make it look like a switch is needed. OTOH if > it's *that* idle then the checkpoints should be no-ops too." We are recognizing import failures on slave after we lower the archive_timeout below the checkpoint_timeout. Did I understand it correctly that these errors might get caused by this? > However, the goal was to have a WAL file every minute so disaster > recovering can be done fast with a minimum of lost data. > > > > If there was any data, it's existence in the transaction log would > trigger the archive_timeout behavior. With no database activity, you > aren't missing anything. > > > Question is: What did we miss? Do we need to put checkpoint_timeout also > to 60s and does this makes sense at all? > > > You are getting what you need (maximum 60s between data and the > corresponding data being sent through archive_command), just not exactly > what you thought you asked for. > > If you absolutely must have a file every in order to sleep well, you can > lower checkpoint_timeout. Keep in mind the cost of checkpoints. We will have to think about this. Cheers, Frank