Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standbyserver

Поиск
Список
Период
Сортировка
От David Steele
Тема Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standbyserver
Дата
Msg-id bdd9cf10-879b-7066-dbc7-1f3983627bf2@pgmasters.net
обсуждение исходный текст
Ответ на Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standbyserver  (Stephen Frost <sfrost@snowman.net>)
Ответы Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standbyserver  (David Steele <david@pgmasters.net>)
Список pgsql-hackers
On 7/24/17 12:28 PM, Stephen Frost wrote:
> 
> * David Steele (david@pgmasters.net) wrote:
> 
>> While this patch brings pg_stop_backup() in line with the
>> documentation, it also introduces a behavioral change compared to
>> 9.6.  Currently, the default 9.6 behavior on a standby is to return
>> immediately from pg_stop_backup(), but this patch will cause it to
>> block by default instead.  Since action on the master may be
>> required to unblock the process, I see this as a pretty significant
>> change.  I still think we should fix and backpatch, but I wanted to
>> point this out.
> 
> This will need to be highlighted in the release notes for 9.6.4 also,
> assuming there is agreement to back-patch this, and we'll need to update
> the documentation in 9.6 also to be clearer about what happens on a
> standby.

Agreed.

>> "If the WAL segments required for this backup have not been archived
>> then it might be necessary to force a segment switch on the
>> primary."
> 
> I'm a bit on the fence regarding the distinction here for the
> backup-from-standby and this errhint().  The issue is that the WAL for
> the backup hasn't been archived yet and that may be because of either:
> 
> archive_command is failing
> OR
> No WAL is getting generated
> 
> In either case, both of these apply to the primary and the standby.  As
> such, I'm inclined to try to mention both in the original errhint()
> instead of making two different ones.  I'm open to suggestions on this,
> of course, but my thinking would be more like:
> 
> -----
> Either archive_command is failing or not enough WAL has been generated
> to require a segment switch.  Run pg_switch_wal() to request a WAL
> switch and monitor your logs to check that your archive_command is
> executing properly.
> -----

Yes, that seems more concise.  I like the idea of not having to maintain 
two separate hints.

> And then change pg_switch_wal()'s errhint for when it's run on a replica
> to be:
> 
> ----
> HINT: WAL control functions cannont be executed during recovery; they
> should be executed on the primary instead.
> ----

Looks good to me.  This explanation is useful in general.

>> 2) At backup.sgml L1015 it says that pg_stop_backup() will
>> automatically switch the WAL segment.  There should be a caveat here
>> for standby backups, like:
>>
>> When called on a master it terminates the backup mode and performs
>> an automatic switch to the next WAL segment.  The reason for the
>> switch is to arrange for the last WAL segment written during the
>> backup interval to be ready to archive.  When called on a standby
>> the WAL segment switch must be performed manually on the master if
>> it does not happen due to normal write activity.
> 
> s/master/primary/g

Yes.

>> 3) The fact that this fix requires "archive_mode = always" seems
>> like a pretty big caveat, thought I don't have any ideas on how to
>> make it better without big code changes.  Maybe it would be help to
>> change:
>>
>> + the backup is taken on a standby, <function>pg_stop_backup</> waits
>> + for WAL to be archived when <varname>archive_mode</>
>>
>> To:
>>
>> + the backup is taken on a standby, <function>pg_stop_backup</> waits
>> + for WAL to be archived *only* when <varname>archive_mode</>
> 
> I'm thinking of rewording that a bit to say "When archive_mode = always"
> instead, as I think that might be a little clearer.

Sure.

>> Perhaps this should be noted in the pg_basebackup docs as well?
> 
> That seems like it's probably a good idea too, as people might wonder
> why pg_basebackup hasn't finished yet if it's waiting for WAL to be
> archived.

Yes, and this is another behavioral change to consider -- one that is 
more likely to impact users than the change to pg_stop_backup().  If 
pg_basebackup is not currently waiting for WAL on a standby (but does on 
a primary) then that's pretty serious.

Any thoughts on this, Magnus?

-- 
-David
david@pgmasters.net



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: [HACKERS] Buildfarm failure and dubious coding in predicate.c
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server