Hi, Alexey!
I expect that it would be a good idea to allow pg_rewind to look for a restore_command
+1
Normally you do not expect huge progress on a failed master. But you can still get a lot of WAL if you have a network partition and scheduled tasks like pg_repack.
I'm not actually aware of what kind of problems led to these, but we were considering some automation to fetch WAL for a failed master to improve the ratio of rewound nodes to nodes that had to be set up again from scratch.
I prepared a proof-of-concept patch (please find attached), which does exactly what I described above. I played with it a little and it seems to work; the tests were updated accordingly to verify this archive retrieval functionality too.
The patch is relatively simple except for one part: if we want to parse recovery.conf (with all possible includes, etc.) and get restore_command, then we should use the guc-file.l parser, which is heavily tied to the backend, e.g. in the error-reporting part. So I copied it and made a frontend-safe version, guc-file-fe.l. Personally, I don't think it's a good idea, but nothing else came to mind. It is also possible to keep just one option: passing restore_command as a command-line argument.
I think it is better to load restore_command from recovery.conf.
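To make the trade-off concrete, here is a rough sketch in Python of what a much simpler lookup could do. It is not taken from the patch and is not how guc-file.l works: it assumes a single recovery.conf with no include directives, a lowercase parameter name, and a single-quoted value. The function names find_restore_command and expand_restore_command are made up for this illustration. It only handles the %f, %p and %% placeholders and leaves anything else (such as %r) untouched.

```python
import re

# Hypothetical helper, for illustration only.
# Matches lines like: restore_command = 'value'
# The "=" is optional and '' inside the value is an escaped single quote.
_RESTORE_RE = re.compile(
    r"^\s*restore_command\s*=?\s*'((?:[^']|'')*)'",
    re.MULTILINE,
)


def find_restore_command(conf_text):
    """Return the last restore_command set in conf_text, or None.

    Assumption: no include directives are followed, unlike the real
    configuration file parser.
    """
    matches = _RESTORE_RE.findall(conf_text)
    if not matches:
        return None
    # A later setting overrides an earlier one, so take the last match.
    return matches[-1].replace("''", "'")


def expand_restore_command(command, wal_file, dest_path):
    """Substitute %f (WAL file name), %p (destination path) and %% (literal %).

    A single regex pass is used so that the output of one substitution
    is never re-scanned for placeholders.
    """
    subst = {"%f": wal_file, "%p": dest_path, "%%": "%"}
    return re.sub(r"%[fp%]", lambda m: subst[m.group(0)], command)


if __name__ == "__main__":
    conf = (
        "# restore_command = 'old'\n"
        "restore_command = 'cp /archive/%f \"%p\"'\n"
    )
    cmd = find_restore_command(conf)
    wal = "000000010000000000000003"
    print(expand_restore_command(cmd, wal, "pg_wal/" + wal))
```

Anything beyond this (includes, unquoted values, case-insensitive names) is where a real parser becomes necessary, which is the hard part discussed above.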
I haven't actually tried the patch yet, but the idea seems interesting. Will you add it to the commitfest?
Best regards, Andrey Borodin.