On 9/18/19 8:58 PM, David Steele wrote:
> On 9/18/19 9:40 PM, Ron wrote:
>> I'm concerned with one pgbackrest process stepping over another one and
>> the restore (or the "pg_ctl start" recovery phase) accidentally
>> corrupting the production database by writing WAL files to the original
>> cluster.
> This is not an issue unless you seriously game the system. When a
> cluster is promoted it selects a new timeline and all WAL will be
> archived to the repo on that new timeline. It's possible to promote a
> cluster without a timeline switch by tricking it but this is obviously a
> bad idea.
What's a timeline switchover?
> So, if you promote the new cluster and forget to disable archive_command
> there will be no conflict because the clusters will be generating WAL on
> separate timelines.
No cluster promotion even contemplated.
The point of the exercise would be to create an older copy of the cluster --
while the production cluster is still running, while production jobs are
still pumping data into the production database -- from before the time of
the data loss, and query it in an attempt to recover the records which were
deleted.
> In the case of a future failover a higher timeline will be selected so
> there still won't be a conflict.
>
> Unfortunately, that dead WAL from the rogue cluster will persist in the
> repo until a PostgreSQL upgrade because expire doesn't know when it can
> be removed since it has no context. We're not quite sure how to handle
> this but it seems a relatively minor issue, at least as far as
> consistency is concerned.
>
> If you do have a split-brain situation where two primaries are archiving
> on the same timeline then first-in wins. WAL from the losing primary
> will be rejected.
>
> Regards,
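
The two rules quoted above (WAL on separate timelines does not conflict, and on the same timeline first-in wins) can be illustrated with a toy model. This is only a sketch, not pgbackrest's implementation: `ToyRepo`, `WalConflict` and `timeline_of` are hypothetical names made up for the example, and treating a repeat push of identical content as a harmless no-op is an assumption of the sketch. The one fact it relies on is that a PostgreSQL WAL segment name is 24 hex digits, of which the first 8 are the timeline ID.

```python
import hashlib


class WalConflict(Exception):
    """A different segment with the same name is already in the toy repo."""


def timeline_of(segment_name: str) -> int:
    # WAL segment names are 24 hex digits; the first 8 encode the timeline ID.
    if len(segment_name) != 24:
        raise ValueError(f"not a WAL segment name: {segment_name!r}")
    return int(segment_name[:8], 16)


class ToyRepo:
    """Hypothetical stand-in for an archive repo; not pgbackrest's real logic."""

    def __init__(self) -> None:
        self._segments: dict[str, str] = {}  # segment name -> content digest

    def push(self, segment_name: str, data: bytes) -> bool:
        timeline_of(segment_name)  # validates the name
        digest = hashlib.sha256(data).hexdigest()
        existing = self._segments.get(segment_name)
        if existing is None:
            self._segments[segment_name] = digest  # first in wins
            return True
        if existing == digest:
            return False  # assumption: identical repeat push is a no-op
        raise WalConflict(f"{segment_name} already archived with other content")


if __name__ == "__main__":
    repo = ToyRepo()
    repo.push("000000010000000000000005", b"production WAL")
    # Same segment number on timeline 2: a different name, so no clash.
    repo.push("000000020000000000000005", b"WAL from promoted copy")
    try:
        # Second primary archiving on timeline 1: rejected.
        repo.push("000000010000000000000005", b"rogue primary WAL")
    except WalConflict as exc:
        print("rejected:", exc)
```

Because the timeline is part of the segment name, the promoted copy's WAL is stored under a different key than production's. Only a second writer on the same timeline hits the conflict path.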
--
Angular momentum makes the world go 'round.