On 1/22/15 7:54 PM, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
>> On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
>>> Or do you - as the text edited in your patch, but not the quote above -
>>> mean to run pg_upgrade just on the primary and then rsync?
>>
>> No, I was going to run it on both, then rsync.
> I'm pretty sure this is all a lot easier than you believe it to be. If
> you want to recreate what pg_upgrade does to a cluster then the simplest
> thing to do is rsync before removing any of the hard links. rsync will
> simply recreate the same hard link tree that pg_upgrade created when it
> ran, and update files which were actually changed (the catalog tables).
>
> The problem, as mentioned elsewhere, is that you have to checksum all
> the files because the timestamps will differ. You can actually get
> around that with rsync if you really want though- tell it to only look
> at file sizes instead of size+time by passing in --size-only.
What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hard links that had
the same timestamp as the original file?
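(For reference, a quick throwaway check of what link timestamps currently look like; a hard link is just a second name for the same inode, so stat reports one mtime for both names. File names here are invented, and GNU stat is assumed:)

```shell
#!/bin/sh
# Throwaway files; names invented. A hard link shares the inode with
# the original, so both names report the same mtime.
set -e
f=$(mktemp)
touch -t 201501230119 "$f"          # back-date the "original"
ln "$f" "${f}_link"
stat -c '%n %Y' "$f" "${f}_link"    # same mtime printed for both names
rm -f "$f" "${f}_link"
```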
That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not
that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww.
How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell
script instead of directly modifying things itself? Or perhaps some custom "command file" that could then be replayed by
pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters for that to
work...
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com