On 12/13/18 11:53 AM, David Steele wrote:
> Hackers,
>
> The alphabetical ordering of pgarch_readyXlog() means that on promotion
> 000000010000000100000001.partial will be archived before 00000002.history.
>
> This appears harmless, but the .history files are what other potential
> primaries use to decide what timeline they should pick. The additional
> latency of compressing/transferring the much larger partial file means
> that archiving of the .history file is delayed and greatly increases the
> chance that another primary will promote to the same timeline.
>
> Teach pgarch_readyXlog() to return .history files first (and in order)
> to reduce the window where this can happen. This won't prevent all
> conflicts, but it is a simple change and should greatly reduce
> real-world occurrences.
>
> I also think we should consider back-patching this change. It's hard to
> imagine that archive commands would have trouble with this reordering
> and the current ordering causes real pain in HA clusters.

Some gcc versions wanted more parentheses, so I have updated the attached
patch accordingly.
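
For anyone skimming the thread, here is a rough standalone sketch of the
ordering rule described above. It is not the patch itself: the names
is_history_file() and pick_next_ready() are made up for illustration, and
it assumes the caller has already collected the candidate file names with
the .ready suffix stripped.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the name ends in ".history". */
static bool
is_history_file(const char *name)
{
    size_t      len = strlen(name);
    size_t      suffix_len = strlen(".history");

    return len > suffix_len &&
        strcmp(name + len - suffix_len, ".history") == 0;
}

/*
 * Hypothetical stand-in for the selection loop: return the name that
 * should be archived next, or NULL if there are no candidates.
 *
 * Any .history file beats any non-history file; within the same class
 * the alphabetically lowest name wins, which keeps the existing
 * oldest-first behavior for WAL segments.
 */
static const char *
pick_next_ready(const char *const *names, size_t n)
{
    const char *best = NULL;
    bool        best_is_history = false;
    size_t      i;

    for (i = 0; i < n; i++)
    {
        bool        is_history = is_history_file(names[i]);

        if (best == NULL ||
            (is_history && !best_is_history) ||
            ((is_history == best_is_history) &&
             strcmp(names[i], best) < 0))
        {
            best = names[i];
            best_is_history = is_history;
        }
    }

    return best;
}
```

With the promotion example from the original message, this picks
00000002.history ahead of 000000010000000100000001.partial, whereas a
purely alphabetical comparison would pick the .partial file first.
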
--
-David
david@pgmasters.net