Discussion: Re: [BUGS] Incomplete docs for restore_command for hot standby

Re: [BUGS] Incomplete docs for restore_command for hot standby

From
"Markus Bertheau"
Date:
2008/2/22, Simon Riggs <simon@2ndquadrant.com>:
> On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
>  >
>  > Section 24.3.3.1 states about restore_command:
>  >
>  > "The command will be asked for file names that are not present in the
>  > archive; it must return nonzero when so asked."
>  >
>  > Section 24.4.1 further states:
>  >
>  > "The magic that makes the two loosely coupled servers work together is
>  > simply a restore_command used on the standby that waits for the next
>  > WAL file to become available from the primary."
>  >
>  > It is not clear from the first paragraph, whether the non-existing
>  > file that restore_command is being asked for is a not-yet-generated
>  > WAL file or something different. If it was a not-yet-generated WAL
>  > file, restore_command for replication would have to wait for it to
>  > appear. If it was something different, restore_command for replication
>  > would have to return an error right away. (Because else it would hang
>  > indefinitely, waiting for a file that is not going to appear). Yet I
>  > couldn't find hints in the documentation as to how these two cases can
>  > be detected by restore_command, i.e. how restore_command should tell a
>  > request for a WAL file from a request for a non-WAL file.
>
>
> The two sentences aren't mutually exclusive, especially when you
>  consider they are discussing two different use cases. Why not read up on
>  pg_standby anyway?

I have read about pg_standby, but my point is not to solve a particular
problem; it is that information is missing from the docs.

>  > Practice (http://archives.postgresql.org/sydpug/2006-10/msg00001.php)
>  > shows that this is a problem, and people use unproved heuristics
>  > ('history' substring in the requested file name).
>
>
> Old email written during beta. Read at your own peril.

The email may be old, but the problem at hand is still relevant.

>  > Additionally, 24.3.3 contains slightly misleading information:
>  >
>  > "It is important that the command return nonzero exit status on
>  > failure. The command will be asked for log files that are not present
>  > in the archive; it must return nonzero when so asked. This is not an
>  > error condition."
>  >
>  > This suggests that all non-existing files that restore_command will be
>  > asked for are log files. One could therefore reasonably assume that
>  > restore_command for replication should wait on all non-existing files.
>  > 24.3.3.1 later corrects this by stating that not only log files may be
>  > requested, but nevertheless.
>
>
> If you have some suggested changes, I'd be happy to hear them.
>
>  Probably additions are better than just changes though.

What about this:

*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***************
*** 1001,1011 **** restore_command = 'cp /mnt/server/archivedir/%f %p'

     <para>
      It is important that the command return nonzero exit status on failure.
!     The command <emphasis>will</> be asked for log files that are not present
!     in the archive; it must return nonzero when so asked.  This is not an
!     error condition.  Be aware also that the base name of the <literal>%p</>
!     path will be different from <literal>%f</>; do not expect them to be
!     interchangeable.
     </para>

     <para>
--- 1001,1011 ----

     <para>
      It is important that the command return nonzero exit status on failure.
!     The command <emphasis>will</> be asked for log and other files that are
!     not present in the archive; it must return nonzero when so asked.  This is
!     not an error condition.  Be aware also that the base name of the
!     <literal>%p</> path will be different from <literal>%f</>; do not expect
!     them to be interchangeable.
     </para>

     <para>
***************
*** 1576,1594 **** archive_command = 'local_backup_script.sh'

     <para>
      The magic that makes the two loosely coupled servers work together is
!     simply a <varname>restore_command</> used on the standby that waits
!     for the next WAL file to become available from the primary. The
!     <varname>restore_command</> is specified in the
      <filename>recovery.conf</> file on the standby server. Normal recovery
      processing would request a file from the WAL archive, reporting failure
      if the file was unavailable.  For standby processing it is normal for
!     the next file to be unavailable, so we must be patient and wait for
!     it to appear. A waiting <varname>restore_command</> can be written as
!     a custom script that loops after polling for the existence of the next
!     WAL file. There must also be some way to trigger failover, which should
!     interrupt the <varname>restore_command</>, break the loop and return
!     a file-not-found error to the standby server. This ends recovery and
!     the standby will then come up as a normal server.
     </para>

     <para>
--- 1576,1596 ----

     <para>
      The magic that makes the two loosely coupled servers work together is
!     simply a <varname>restore_command</> used on the standby that, when asked
!     for a WAL file, waits for it to become available from the primary.
!     The <varname>restore_command</> is specified in the
      <filename>recovery.conf</> file on the standby server. Normal recovery
      processing would request a file from the WAL archive, reporting failure
      if the file was unavailable.  For standby processing it is normal for
!     the next WAL file to be unavailable, so we must be patient and wait for
!     it to appear. For non-WAL files though the script must still report
!     failure. WAL files can be distinguished from non-WAL files by FIXME. A
!     waiting <varname>restore_command</> can be written as a custom script that
!     loops after polling for the existence of the next WAL file. There must
!     also be some way to trigger failover, which should interrupt the
!     <varname>restore_command</>, break the loop and return a file-not-found
!     error to the standby server. This ends recovery and the standby will then
!     come up as a normal server.
     </para>

     <para>
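
As an illustration of the waiting restore_command described in the second
hunk, here is a minimal Python sketch. It is not taken from the patch or from
pg_standby; the function name restore_file, the trigger-file path and the
should_wait flag are hypothetical. How the caller decides should_wait for a
given file name is exactly the open question in this thread.

```python
import os
import shutil
import time


def restore_file(archive_dir, fname, dest, trigger_path, should_wait,
                 poll_interval=1.0):
    """Copy fname from the archive to dest.

    Returns 0 on success and 1 when the file is unavailable, mirroring the
    zero/nonzero exit status a restore_command is expected to report.
    All parameter names are illustrative assumptions.
    """
    src = os.path.join(archive_dir, fname)
    while True:
        if os.path.exists(src):
            # Note: this sketch does not guard against a file that is still
            # being written into the archive.
            shutil.copy(src, dest)
            return 0
        if not should_wait:
            # Files that will never appear must fail right away.
            return 1
        if os.path.exists(trigger_path):
            # Failover requested: stop waiting and report file-not-found,
            # which ends recovery on the standby.
            return 1
        time.sleep(poll_interval)
```

A wrapper script would pass %f and %p from restore_command as fname and dest
and exit with the returned value.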

The FIXME of course needs replacement by someone in the know.
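
For illustration only, one candidate for the FIXME is to go by the requested
file name. The sketch below assumes the usual PostgreSQL naming: a WAL segment
name is exactly 24 uppercase hexadecimal digits, timeline history files end in
.history, and backup history files end in .backup. That convention is an
assumption here, not something this thread states, and the helper names are
hypothetical; it should be checked against the documentation for the server
version in use.

```python
import re

# Assumption: a WAL segment file name is exactly 24 uppercase hex digits.
WAL_SEGMENT_RE = re.compile(r"[0-9A-F]{24}")


def classify(fname):
    """Classify a file name passed to restore_command as %f.

    Returns "wal", "history", "backup" or "other". The categories follow the
    assumed naming convention described above.
    """
    if WAL_SEGMENT_RE.fullmatch(fname):
        return "wal"
    if fname.endswith(".history"):
        return "history"
    if fname.endswith(".backup"):
        return "backup"
    return "other"
```

Under that assumption, a waiting restore_command would wait only when
classify() returns "wal" and return nonzero immediately for everything else.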

Markus Bertheau
Blog: http://www.bluetwanger.de/blog/

Re: [BUGS] Incomplete docs for restore_command for hot standby

From
Bruce Momjian
Date:
Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------


Markus Bertheau wrote:
> 2008/2/22, Simon Riggs <simon@2ndquadrant.com>:
> > On Thu, 2008-02-21 at 08:01 +0600, Markus Bertheau wrote:
> >  >
> >  > Section 24.3.3.1 states about restore_command:
> >  >
> >  > "The command will be asked for file names that are not present in the
> >  > archive; it must return nonzero when so asked."
> >  >
> >  > Section 24.4.1 further states:
> >  >
> >  > "The magic that makes the two loosely coupled servers work together is
> >  > simply a restore_command used on the standby that waits for the next
> >  > WAL file to become available from the primary."
> >  >
> >  > It is not clear from the first paragraph, whether the non-existing
> >  > file that restore_command is being asked for is a not-yet-generated
> >  > WAL file or something different. If it was a not-yet-generated WAL
> >  > file, restore_command for replication would have to wait for it to
> >  > appear. If it was something different, restore_command for replication
> >  > would have to return an error right away. (Because else it would hang
> >  > indefinitely, waiting for a file that is not going to appear). Yet I
> >  > couldn't find hints in the documentation as to how these two cases can
> >  > be detected by restore_command, i.e. how restore_command should tell a
> >  > request for a WAL file from a request for a non-WAL file.
> >
> >
> > The two sentences aren't mutually exclusive, especially when you
> >  consider they are discussing two different use cases. Why not read up on
> >  pg_standby anyway?
>
> I have read about pg_standby, but my point is not to solve a particular
> problem; it is that information is missing from the docs.
>
> >  > Practice (http://archives.postgresql.org/sydpug/2006-10/msg00001.php)
> >  > shows that this is a problem, and people use unproved heuristics
> >  > ('history' substring in the requested file name).
> >
> >
> > Old email written during beta. Read at your own peril.
>
> The email may be old, but the problem at hand is still relevant.
>
> >  > Additionally, 24.3.3 contains slightly misleading information:
> >  >
> >  > "It is important that the command return nonzero exit status on
> >  > failure. The command will be asked for log files that are not present
> >  > in the archive; it must return nonzero when so asked. This is not an
> >  > error condition."
> >  >
> >  > This suggests that all non-existing files that restore_command will be
> >  > asked for are log files. One could therefore reasonably assume that
> >  > restore_command for replication should wait on all non-existing files.
> >  > 24.3.3.1 later corrects this by stating that not only log files may be
> >  > requested, but nevertheless.
> >
> >
> > If you have some suggested changes, I'd be happy to hear them.
> >
> >  Probably additions are better than just changes though.
>
> What about this:
>
> *** a/doc/src/sgml/backup.sgml
> --- b/doc/src/sgml/backup.sgml
> ***************
> *** 1001,1011 **** restore_command = 'cp /mnt/server/archivedir/%f %p'
>
>      <para>
>       It is important that the command return nonzero exit status on failure.
> !     The command <emphasis>will</> be asked for log files that are not present
> !     in the archive; it must return nonzero when so asked.  This is not an
> !     error condition.  Be aware also that the base name of the <literal>%p</>
> !     path will be different from <literal>%f</>; do not expect them to be
> !     interchangeable.
>      </para>
>
>      <para>
> --- 1001,1011 ----
>
>      <para>
>       It is important that the command return nonzero exit status on failure.
> !     The command <emphasis>will</> be asked for log and other files that are
> !     not present in the archive; it must return nonzero when so asked.  This is
> !     not an error condition.  Be aware also that the base name of the
> !     <literal>%p</> path will be different from <literal>%f</>; do not expect
> !     them to be interchangeable.
>      </para>
>
>      <para>
> ***************
> *** 1576,1594 **** archive_command = 'local_backup_script.sh'
>
>      <para>
>       The magic that makes the two loosely coupled servers work together is
> !     simply a <varname>restore_command</> used on the standby that waits
> !     for the next WAL file to become available from the primary. The
> !     <varname>restore_command</> is specified in the
>       <filename>recovery.conf</> file on the standby server. Normal recovery
>       processing would request a file from the WAL archive, reporting failure
>       if the file was unavailable.  For standby processing it is normal for
> !     the next file to be unavailable, so we must be patient and wait for
> !     it to appear. A waiting <varname>restore_command</> can be written as
> !     a custom script that loops after polling for the existence of the next
> !     WAL file. There must also be some way to trigger failover, which should
> !     interrupt the <varname>restore_command</>, break the loop and return
> !     a file-not-found error to the standby server. This ends recovery and
> !     the standby will then come up as a normal server.
>      </para>
>
>      <para>
> --- 1576,1596 ----
>
>      <para>
>       The magic that makes the two loosely coupled servers work together is
> !     simply a <varname>restore_command</> used on the standby that, when asked
> !     for a WAL file, waits for it to become available from the primary.
> !     The <varname>restore_command</> is specified in the
>       <filename>recovery.conf</> file on the standby server. Normal recovery
>       processing would request a file from the WAL archive, reporting failure
>       if the file was unavailable.  For standby processing it is normal for
> !     the next WAL file to be unavailable, so we must be patient and wait for
> !     it to appear. For non-WAL files though the script must still report
> !     failure. WAL files can be distinguished from non-WAL files by FIXME. A
> !     waiting <varname>restore_command</> can be written as a custom script that
> !     loops after polling for the existence of the next WAL file. There must
> !     also be some way to trigger failover, which should interrupt the
> !     <varname>restore_command</>, break the loop and return a file-not-found
> !     error to the standby server. This ends recovery and the standby will then
> !     come up as a normal server.
>      </para>
>
>      <para>
>
> The FIXME of course needs replacement by someone in the know.
>
> Markus Bertheau
> Blog: http://www.bluetwanger.de/blog/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: [BUGS] Incomplete docs for restore_command for hot standby

From
Simon Riggs
Date:
On Mon, 2008-02-25 at 17:56 +0600, Markus Bertheau wrote:
> 2008/2/22, Simon Riggs <simon@2ndquadrant.com>:

> > If you have some suggested changes, I'd be happy to hear them.
> >
> >  Probably additions are better than just changes though.
>
> What about this:
>
> *** a/doc/src/sgml/backup.sgml
> --- b/doc/src/sgml/backup.sgml
> ***************

...

> The FIXME of course needs replacement by someone in the know.

Doc patch edited to include all of Markus' points, tidy up some related
text and fix typos.

Good to apply to HEAD.

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

  PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

Attachments

Re: [BUGS] Incomplete docs for restore_command for hot standby

From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> On Mon, 2008-02-25 at 17:56 +0600, Markus Bertheau wrote:
>> The FIXME of course needs replacement by someone in the know.
>
> Doc patch edited to include all of Markus' points, tidy up some related
> text and fix typos.
>
> Good to apply to HEAD.

Committed to HEAD with minor fixes.

What's our policy wrt. back-patching doc changes? This seems applicable
to older versions as well, but do we do that?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: [BUGS] Incomplete docs for restore_command for hot standby

From
Bruce Momjian
Date:
Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2008-02-25 at 17:56 +0600, Markus Bertheau wrote:
> >> The FIXME of course needs replacement by someone in the know.
> >
> > Doc patch edited to include all of Markus' points, tidy up some related
> > text and fix typos.
> >
> > Good to apply to HEAD.
>
> Committed to HEAD with minor fixes.
>
> What's our policy wrt. back-patching doc changes? This seems applicable
> to older versions as well, but do we do that?

I backpatch doc changes if the change is serious.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +