Re: Standby trying "restore_command" before local WAL

From David Steele
Subject Re: Standby trying "restore_command" before local WAL
Msg-id 010562f7-b0f4-d41e-2343-e93b72f1f4d6@pgmasters.net
In reply to Re: Standby trying "restore_command" before local WAL  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers

On 8/7/18 11:42 AM, Stephen Frost wrote:
> 
>>> CRC's are per WAL record, and possibly some WAL records might not be ok
>>> to replay, or at least we need to make sure that we replay the right set
>>> of WAL in the right order even when there are partial WAL files being
>>> given to PG (that aren't named that way...).  The more I think about
>>> this, I think we really need to avoid partial WAL files entirely- what
>>> are we going to do when we get to the end of one?  We'd need to request
>>> the full one from the restore command anyway, so seems like we should
>>> just go ahead and get it from the archive, the question is if there's an
>>> easy/cheap way to detect partial WAL files in pg_wal.
>>
>> As explained above, I don't think this is actually a problem. The checksums
>> do cover the whole file thanks to chaining, and there are ways to detect
>> partial segments. IMHO it's fine if we replay a segment and then find out it
>> was partial and that we need to fetch it from archive anyway and re-apply it
>> - it should not be very common case, except when the user does something
>> silly.
> 
> As long as we *do* go off and try to fetch that WAL file and replay it,
> and don't assume that the end of that partial WAL file means the end of
> WAL replay, then I think you may be right and that it'd be fine, but it
> does seem a bit risky to me.

This assumes that the local partial is a subset of the archived full WAL
segment, which should be true in most cases but I don't think we can
discount the possibility that it isn't.  Split-brain is certainly a way
to get to differing partials, though in that case things are already
pretty bad.

I've seen some pretty messed up situations and usually it is best to
treat the WAL archive as the ground truth.  If the archive_command is
smart enough not to overwrite WAL segments that already exist with
different versions then it should be a reliable record that all servers
can be replayed from (split-brains aside).  I think it's best to treat
the local WAL with some suspicion unless it is known to be good, i.e.
just restored from archive.
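
To make the "don't overwrite with a different version" idea concrete, here is a minimal sketch of such an archive step in Python. It is an illustration under stated assumptions, not how any particular archiving tool works: the function name archive_segment, the ".tmp" suffix, and the flat archive directory are all hypothetical, and the existence check is not safe against two archivers racing on the same segment.

```python
import filecmp
import os
import shutil


def archive_segment(src_path: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir without ever overwriting.

    Returns 0 on success and non-zero on failure, mirroring the exit
    status convention that archive_command expects.
    (Hypothetical helper, for illustration only.)
    """
    dest = os.path.join(archive_dir, os.path.basename(src_path))

    if os.path.exists(dest):
        # Same name, same bytes: a harmless retry, report success.
        if filecmp.cmp(src_path, dest, shallow=False):
            return 0
        # Same name, different bytes: refuse, so the archive keeps
        # exactly one version of this segment.
        return 1

    # Write to a temporary name first so a crash never leaves a
    # truncated file under the final segment name.
    tmp = dest + ".tmp"
    with open(src_path, "rb") as src, open(tmp, "wb") as out:
        shutil.copyfileobj(src, out)
        out.flush()
        os.fsync(out.fileno())
    os.rename(tmp, dest)
    return 0
```

A small wrapper script that calls this function and exits with its return value could then be named in archive_command, with %p supplying the path of the segment to archive.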

I do agree that most inconsistencies could be detected and an error
thrown, but only if the WAL in the repository is examined, which means
making a round-trip there anyway.
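
As a rough sketch of what "examining the WAL in the repository" might look like, the Python below checks whether the valid part of a local segment is a byte-for-byte prefix of the archived copy. Everything here is an assumption for illustration: the function name is hypothetical, and the caller is assumed to already know how many bytes of the local file hold valid WAL (valid_bytes), since anything after that point may be zeros or leftover data. Note that it has to read the archived segment, which is exactly the round-trip mentioned above.

```python
def local_is_prefix_of_archive(local_path: str, archive_path: str,
                               valid_bytes: int,
                               chunk_size: int = 1 << 20) -> bool:
    """Return True if the first valid_bytes of the local segment match
    the archived segment exactly. (Hypothetical helper.)"""
    with open(local_path, "rb") as local, open(archive_path, "rb") as arch:
        remaining = valid_bytes
        while remaining > 0:
            n = min(chunk_size, remaining)
            local_chunk = local.read(n)
            arch_chunk = arch.read(n)
            # A short read on the local side, or any mismatch (including
            # a shorter archive file), means the local WAL diverges.
            if len(local_chunk) < n or local_chunk != arch_chunk:
                return False
            remaining -= n
    return True
```

A False result is the case where the local partial is not a subset of the archived segment, and the archive copy should be the one that gets replayed.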

At the very least, it seems that simply enabling "read from pg_wal
first" is not a good idea without making other changes to ensure it is
done correctly.

Regards,
-- 
-David
david@pgmasters.net

