>> Maybe we should just bite the bullet and change the WAL format for
>> heap_freeze (inventing an all-new record type, not repurposing the old
>> one, and allowing WAL replay to continue to accept the old one). The
>> implication for users would be that they'd have to update slave servers
>> before the master when installing the update; which is unpleasant, but
>> better than living with a known data corruption case.
> Agreed. It may suck, but it sucks less.
> How badly will it break if they do the upgrade in the wrong order though.
> Will the slaves just stop (I assume this?) or is there a risk of a
> wrong-order upgrade causing extra breakage?
I assume what would happen is the slave would PANIC upon seeing a WAL record code it didn't recognize. Installing the updated version should allow it to resume functioning. Would be good to test this, but if it doesn't work like that, that'd be another bug to fix IMO. We've always foreseen the possible need to do something like this, so it ought to work reasonably cleanly.
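To make the expected behavior concrete, here is a rough sketch of it as a toy model in Python. This is not PostgreSQL code: the record codes, `ReplayPanic`, `known_codes` and `replay` are all names made up for illustration, and the model assumes that a stopped slave can pick up again at the record it failed on once the new binary is installed, which is exactly the part that still needs testing.

```python
# Toy model of WAL redo dispatch; this is not PostgreSQL source code.
# Both record codes are hypothetical placeholders.
OLD_HEAP_FREEZE = 0x01
NEW_HEAP_FREEZE = 0x02  # understood only by updated binaries


class ReplayPanic(Exception):
    """Stands in for a PANIC during recovery; pos is the failing record."""

    def __init__(self, code, pos):
        super().__init__(f"unrecognized WAL record code {code:#x} at position {pos}")
        self.pos = pos


def known_codes(updated):
    """Record codes a binary can replay. Updated binaries keep the old one."""
    codes = {OLD_HEAP_FREEZE}
    if updated:
        codes.add(NEW_HEAP_FREEZE)
    return codes


def replay(wal, updated, start=0):
    """Replay wal[start:]; return the position after the last record applied."""
    codes = known_codes(updated)
    for pos in range(start, len(wal)):
        if wal[pos] not in codes:
            raise ReplayPanic(wal[pos], pos)
    return len(wal)
```

In this model, a not-yet-updated slave fed a stream containing the new record raises `ReplayPanic` at that record, and calling `replay` again with `updated=True` and `start` set to the failing position runs through to the end of the stream.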
I wonder if, for the future, we should have walreceiver send its version to the master as part of the replication protocol: in START_REPLICATION, in IDENTIFY_SYSTEM (which would probably make more sense), or even in a new command like IDENTIFY_CLIENT. That way the walsender could identify a walreceiver that's too old and disconnect it right away, with a much nicer error message than a PANIC. Right now, walreceiver knows the version of the walsender (through PQserverVersion()), but AFAICT there is no way for the walsender to know which version of the receiver is connected.
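A rough sketch of the check I have in mind, again as a toy model in Python rather than actual walsender code. IDENTIFY_CLIENT does not exist today; `handle_identify_client`, `format_version` and the `MIN_RECEIVER_VERSION` value are made up for illustration. It assumes the receiver reports its version in the same integer form that PQserverVersion() returns (for example 90105 for 9.1.5) and that both ends are on the same major version.

```python
# Toy model of a version handshake; IDENTIFY_CLIENT is a hypothetical
# command, not part of the existing replication protocol.
MIN_RECEIVER_VERSION = 90106  # hypothetical: first release that knows the new record


def format_version(num):
    """Render an integer such as 90105 as '9.1.5' (three-part numbering)."""
    return f"{num // 10000}.{num // 100 % 100}.{num % 100}"


def handle_identify_client(receiver_version, min_version=MIN_RECEIVER_VERSION):
    """Return (accepted, message) the way a walsender-like server might."""
    if receiver_version < min_version:
        # Reject up front instead of letting the standby fail during replay.
        return False, (
            f"walreceiver version {format_version(receiver_version)} is too old; "
            f"at least {format_version(min_version)} is required"
        )
    return True, "OK"
```

With these placeholder numbers, a 9.1.5 receiver is refused with a message naming both versions, while a 9.1.6 receiver is accepted.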