Thread: Usability improvements for pg_stop_backup()
Hackers,
Since Gabrielle has improved archiving with pg_stat_archiver in 9.4, I'd
like to go further and improve the usability of pg_stop_backup().
However, based on my IRC discussion with Vik, there might not be
consensus on what the right behavior *should* be.  This is for 9.5, of
course.
Currently, if archive_command is failing, pg_stop_backup() will hang
forever.  The only way to figure out what's wrong with pg_stop_backup()
is to tail the PostgreSQL logs.  This is difficult for users to
troubleshoot, and strongly resists any kind of automation.
Yes, we can work around this by setting statement_timeout, but that has
two issues: (a) the user has to remember to do it before the problem
occurs, and (b) it won't differentiate between archive failure and other
reasons it might time out.
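
For reference, the workaround looks roughly like the Python sketch below.
It assumes a DB-API connection such as one from psycopg2; the function
name and the default timeout are made up for illustration.  SQLSTATE 57014
(query_canceled) is what a statement_timeout produces, and it is all the
client gets to see.

```python
QUERY_CANCELED = "57014"  # SQLSTATE raised when statement_timeout fires


def stop_backup_with_timeout(conn, timeout_ms=60000):
    """Run pg_stop_backup() under a statement_timeout (hypothetical helper).

    `conn` is assumed to be a DB-API connection, e.g. from psycopg2.
    Returns (stop_location, None) on success or (None, reason) on timeout.
    """
    cur = conn.cursor()
    try:
        # Issue (a): this must be set before pg_stop_backup() is called.
        cur.execute("SET statement_timeout = %d" % int(timeout_ms))
        cur.execute("SELECT pg_stop_backup()")
        return cur.fetchone()[0], None
    except Exception as exc:
        if getattr(exc, "pgcode", None) == QUERY_CANCELED:
            # Issue (b): we only learn that the statement was cancelled,
            # not whether archive_command was the reason.
            conn.rollback()
            return None, "timed out; cause unknown, check the server log"
        raise
    finally:
        cur.close()
```

Note that the caller still has to go read the server log to learn whether
archiving was the culprit.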
As such, I propose that pg_stop_backup() should error with an
appropriate error message ("Could not archive WAL segments") after three
archiving attempts.  We could also add an optional parameter to raise
the number of attempts from the default of three.
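
To make the intended semantics concrete, here is a minimal Python sketch.
It is not PostgreSQL code: ArchiveFailed, wait_for_archive and try_archive
are hypothetical names, and try_archive simply stands in for one archiving
attempt.

```python
class ArchiveFailed(Exception):
    """Hypothetical stand-in for the proposed error."""


def wait_for_archive(try_archive, max_attempts=3):
    """Call try_archive() until it succeeds or max_attempts is used up.

    try_archive is a hypothetical callable returning True on success.
    Returns the number of attempts that were needed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        if try_archive():
            return attempt
    # Every attempt failed: raise an error instead of waiting forever.
    raise ArchiveFailed(
        "Could not archive WAL segments after %d attempts" % max_attempts
    )
```

The optional parameter maps to max_attempts here; the point is only that
the caller gets a distinct, named error once the attempts are exhausted.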
An alternative, if we were doing this from scratch, would be for
pg_stop_backup to return false or -1 or something if it couldn't
archive; there are reasons why a user might not care that
archive_command was failing (shared storage comes to mind).  However,
that would be a surprising break with backwards compatibility, since
currently users don't check the result value of pg_stop_backup().
Thoughts?
-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
			
Josh Berkus <josh@agliodbs.com> wrote:
> Currently, if archive_command is failing, pg_stop_backup() will hang
> forever.  The only way to figure out what's wrong with pg_stop_backup()
> is to tail the PostgreSQL logs.  This is difficult for users to
> troubleshoot, and strongly resists any kind of automation.
That is bad.
> Yes, we can work around this by setting statement_timeout, but that has
> two issues: (a) the user has to remember to do it before the problem
> occurs, and (b) it won't differentiate between archive failure and other
> reasons it might time out.
Clearly not a long-term solution.
> As such, I propose that pg_stop_backup() should error with an
> appropriate error message ("Could not archive WAL segments") after three
> archiving attempts.  We could also add an optional parameter to raise
> the number of attempts from the default of three.
That sounds sane to me.
> An alternative, if we were doing this from scratch, would be for
> pg_stop_backup to return false or -1 or something if it couldn't
> archive; there are reasons why a user might not care that
> archive_command was failing (shared storage comes to mind).  However,
> that would be a surprising break with backwards compatibility, since
> currently users don't check the result value of pg_stop_backup().
Some might, which is a stronger argument against changing what gets
returned.  Even in a green field though, I would argue that
pg_stop_backup() should return information about the minimum range
of WAL files needed to perform a consistent recovery -- or possibly
duplicate everything in the backup history file.  An error seems
much more appropriate to indicate that the user does not have a
valid backup.
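
As a rough illustration of what "the minimum range of WAL files" could
look like, here is a Python sketch that derives the segment file names
from a start and stop location.  It assumes the default 16MB segment
size, a single timeline, and the 9.3-and-later file naming; the function
names are made up for this example.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: default segment size
SEGMENTS_PER_LOGID = 0x100000000 // WAL_SEGMENT_SIZE


def parse_lsn(text):
    """Convert an 'X/Y' WAL location into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline, segno):
    """Build the 24-hex-digit WAL file name for a segment number."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_LOGID,
        segno % SEGMENTS_PER_LOGID,
    )


def required_wal_files(timeline, start_lsn, stop_lsn):
    """List the WAL files covering start_lsn up to stop_lsn."""
    first = parse_lsn(start_lsn) // WAL_SEGMENT_SIZE
    # Treat the stop location as an end pointer: a stop exactly on a
    # segment boundary still belongs to the previous segment.
    last = (parse_lsn(stop_lsn) - 1) // WAL_SEGMENT_SIZE
    return [wal_file_name(timeline, s) for s in range(first, last + 1)]
```

For example, required_wal_files(1, "0/2000028", "0/30000F8") yields
000000010000000000000002 and 000000010000000000000003.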
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company