Re: Replication failure, slave requesting old segments

From: Phil Endecott
Subject: Re: Replication failure, slave requesting old segments
Date:
Msg-id: 1534110971810@dmwebmail.dmwebmail.chezphil.org
In reply to: Re: Replication failure, slave requesting old segments (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Replication failure, slave requesting old segments
List: pgsql-general
Stephen Frost wrote:
> * Phil Endecott (spam_from_pgsql_lists@chezphil.org) wrote:
>> Stephen Frost wrote:
>> >* Phil Endecott (spam_from_pgsql_lists@chezphil.org) wrote:
>> >>2018-08-11 00:12:15.536 UTC [7954] LOG:  restored log file "0000000100000007000000D0" from archive
>> >>2018-08-11 00:12:15.797 UTC [7954] LOG:  redo starts at 7/D0F956C0
>> >>2018-08-11 00:12:16.068 UTC [7954] LOG:  consistent recovery state reached at 7/D0FFF088
>> 
>> Are the last two log lines above telling us anything useful?  Is that
>> saying that, of the 16 MB (0x1000000 byte) WAL file it restored only as
>> far as byte 0xf956c0 or 0xfff088?  Is that what we would expect?  Is it
>> trying to use the streaming connection to get the missing bytes from
>> FFFF088 to FFFFFFFF?  Is that just an empty gap at the end of the file
>> due to the next record being too big to fit?
>
> The short answer is that, yes, the next record was likely too big to
> fit, but that's what the replica was trying to figure out and couldn't
> because D0 was gone from the primary already.  One thing which should
> help this would be to use physical replication slots on the primary,
> which would keep it from throwing away WAL files until it knows the
> replica has them, but that runs the risk of ending up with lots of extra
> WAL on the primary if the replica is gone for a while.  If you'd prefer
> to avoid that then having wal_keep_segments set to '1' would avoid this
> particular issue as well, I'd expect.
>
> I do wonder if perhaps we should just default to having it as '1' to
> avoid exactly this case, as it seems like perhaps PG archived D0 and
> then flipped to D1 and got rid of D0, which is all pretty reasonable,
> except that a replica trying to catch up is going to end up asking for
> D0 from the primary because it didn't know if there was anything else
> that should have been in D0..

OK.  I think this is perhaps a documentation bug, maybe a missing
warning when the master reads its configuration, and maybe (as you say)
a bad default value.

Specifically, section 26.2.5 of the docs says:

"If you use streaming replication without file-based continuous archiving,
the server might recycle old WAL segments before the standby has received
them. If this occurs, the standby will need to be reinitialized from a new
base backup. You can avoid this by setting wal_keep_segments to a value
large enough to ensure that WAL segments are not recycled too early, or by
configuring a replication slot for the standby. If you set up a WAL archive
that's accessible from the standby, these solutions are not required, since
the standby can always use the archive to catch up provided it retains enough
segments."
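
For the record, here is my rough understanding of those two options (the
slot name "standby1" below is just an example):

    -- on the primary: reserve WAL until the standby has consumed it
    SELECT pg_create_physical_replication_slot('standby1');

    # on the standby, in recovery.conf, point at that slot
    primary_slot_name = 'standby1'

    # or, alternatively, in postgresql.conf on the primary
    wal_keep_segments = 1    # always keep at least one old segment around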

Or maybe the WAL reader that processes the files that restore_command fetches
could be smart enough to realise that it can skip over the gap at the end?

Anyway.  Do others agree that my issue was the result of
wal_keep_segments=0?
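
For anyone wanting to check their own setup, both settings are visible from
psql on the primary:

    SHOW wal_keep_segments;              -- 0 is the default
    SELECT * FROM pg_replication_slots;  -- lists any replication slots in use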


Regards, Phil.
