Re: Replication failure, slave requesting old segments

From: Adrian Klaver
Subject: Re: Replication failure, slave requesting old segments
Date:
Msg-id: 708ff81b-9f5f-9726-fec7-59e90243947d@aklaver.com
In reply to: Re: Replication failure, slave requesting old segments (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Replication failure, slave requesting old segments (Stephen Frost <sfrost@snowman.net>)
List: pgsql-general
On 08/13/2018 05:39 AM, Stephen Frost wrote:
> Greetings,
> 
> * Phil Endecott (spam_from_pgsql_lists@chezphil.org) wrote:
>> Adrian Klaver wrote:
>>> On 08/12/2018 02:56 PM, Phil Endecott wrote:
>>>> Anyway.  Do others agree that my issue was the result of
>>>> wal_keep_segments=0 ?
>>>
>>> Only as a sub-issue of the slave losing contact with the master. The basic
>>> problem is keeping two separate operations, archiving and streaming,
>>> in sync. If either of them, or both, lose synchronization, then it is
>>> anyone's guess what is appropriate for wal_keep_segments.
> 
> Uh, no, having an archive_command and a restore_command configured
> correctly should remove the need to worry about what wal_keep_segments is
> set to because anything not on the primary really should be available
> through what's been archived and PG shouldn't have any trouble figuring
> that out and working with it.
> 
> If all you've got is streaming replication then, sure, you have no idea
> what to set wal_keep_segments to because the replica could be offline
> for an indeterminate amount of time, but as long as you're keeping track
> of all the WAL through archive_command, that shouldn't be an issue.

Therein lies the rub. As I stated previously, the bigger issue is keeping
two different operations, archiving and streaming, in sync. The OP got
caught short assuming that archiving would handle the situation where
streaming was down for a period. In his particular setup, and for this
particular situation, a wal_keep_segments of 1 would have helped. I do
not see that as a default value, though, as it depends on too many
variables outside the reach of the database, most notably the success
of the archive command. First, is the command even valid? Second, is
the network link reliable? Third, is there even a network link, and is
there more than one? Fourth, is the restore command valid? That is just
off the top of my head; with more caffeine I could come up with more.
Saying that archiving, streaming and wal_keep_segments=1 have you
covered is misleading. I don't see it as detrimental to performance,
but I do see more posts down the road from folks who are surprised when
it does not cover their case. Personally, I think it is better to be up
front that this requires more thought, or a third-party solution that
has done the thinking.
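A rough way to picture why archiving and streaming have to stay in sync: treat WAL as numbered segments and ask whether every segment the replica still needs is available either from the primary (streaming) or from the archive (restore_command). This is a simplified, hypothetical model, not how PostgreSQL actually decides what to retain: it assumes the primary holds only the current segment plus wal_keep_segments older ones, and it ignores checkpoint retention and replication slots. The function name missing_segments and its parameters are made up for illustration; `archived` stands for the segments restore_command can really fetch on the replica, which is exactly what a bad command or a flaky network link would break.

```python
def missing_segments(replica_next, primary_current, wal_keep_segments, archived):
    """Return WAL segment numbers the replica needs but can fetch from nowhere.

    Hypothetical, simplified model:
    - segments are plain integers;
    - the primary still holds primary_current and the wal_keep_segments
      segments before it (checkpoint retention and replication slots ignored);
    - `archived` holds the segments restore_command can actually fetch.
    """
    oldest_on_primary = max(0, primary_current - wal_keep_segments)
    missing = []
    for segno in range(replica_next, primary_current + 1):
        streamable = segno >= oldest_on_primary   # still in the primary's pg_wal
        restorable = segno in archived            # reachable via restore_command
        if not (streamable or restorable):
            missing.append(segno)
    return missing


# Replica stopped at segment 10, primary is now writing segment 20.
everything = set(range(20))                              # archive holds 0..19
print(missing_segments(10, 20, 0, everything))           # []
print(missing_segments(10, 20, 0, everything - {12}))    # [12]: one gap stalls recovery
print(missing_segments(10, 20, 15, everything - {12}))   # []: primary still has 5..20
```

In this model a single hole in what restore_command can reach is enough to stall the replica, and a larger wal_keep_segments only papers over it when the hole happens to fall inside the range the primary still holds.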

> 
>> Really?  I thought the intention was that the system should be
>> able to recover reliably when the slave reconnects after a
>> period of downtime, subject only to there being sufficient
>> network/CPU/disk bandwidth etc. for it to eventually catch up.
> 
> Yes, that's correct, the replica should always be able to catch back up
> presuming there are no gaps in the WAL between when the replica failed and
> where the primary is at.
> 
> Thanks!
> 
> Stephen
> 

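To check whether a particular segment the replica is asking for actually made it into the archive, it helps to turn segment numbers into the 24-hex-digit file names used in pg_wal and passed to archive_command as %f. A minimal sketch, assuming the default 16 MB segment size; wal_file_name is a hypothetical helper that follows the timeline / log / segment layout of those names.

```python
# Assumption: default 16 MB WAL segments (segment size not changed at initdb).
DEFAULT_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_file_name(timeline, segno, segment_size=DEFAULT_SEGMENT_SIZE):
    """Build the 24-hex-digit WAL file name for a segment number (hypothetical helper).

    The name is the timeline, then the "log" number, then the segment within
    that log, each as 8 uppercase hex digits; one log spans 4 GiB of WAL.
    """
    segments_per_log = 0x100000000 // segment_size
    log, seg = divmod(segno, segments_per_log)
    return "%08X%08X%08X" % (timeline, log, seg)


print(wal_file_name(1, 10))     # 00000001000000000000000A
print(wal_file_name(1, 0x1A3))  # 0000000100000001000000A3
```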

-- 
Adrian Klaver
adrian.klaver@aklaver.com

