Обсуждение: pg_upgrade and rsync

Поиск
Список
Период
Сортировка

pg_upgrade and rsync

От
Bruce Momjian
Дата:
It is possible to upgrade on pg_upgrade on streaming standby servers by
making them master servers, running pg_upgrade on them, then shuting
down all servers and using rsync to make the standby servers match the
real master.

However, Josh Berkus reported that he found that hint bits set on the
master server but not on the standby servers made rsync too slow.

The attached pg_upgrade doc patch recommends using wal_log_hints to make
hint bits on the standby match the master.  One question I have is
whether hint bits are set by read-only transactions on standby servers.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Вложения

Re: pg_upgrade and rsync

От
Heikki Linnakangas
Дата:
On 01/22/2015 09:20 PM, Bruce Momjian wrote:
> One question I have is whether hint bits are set by read-only
> transactions on standby servers.

No. See comments in MarkBufferDirtyHint:

>         /*
>          * If we need to protect hint bit updates from torn writes, WAL-log a
>          * full page image of the page. This full page image is only necessary
>          * if the hint bit update is the first change to the page since the
>          * last checkpoint.
>          *
>          * We don't check full_page_writes here because that logic is included
>          * when we call XLogInsert() since the value changes dynamically.
>          */
>         if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
>         {
>             /*
>              * If we're in recovery we cannot dirty a page because of a hint.
>              * We can set the hint, just not dirty the page as a result so the
>              * hint is lost when we evict the page or shutdown.
>              *
>              * See src/backend/storage/page/README for longer discussion.
>              */
>             if (RecoveryInProgress())
>                 return;


- Heikki



Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/22/15 2:19 PM, Heikki Linnakangas wrote:
> On 01/22/2015 09:20 PM, Bruce Momjian wrote:
>> One question I have is whether hint bits are set by read-only
>> transactions on standby servers.
>
> No. See comments in MarkBufferDirtyHint:
>
>>         /*
>>          * If we need to protect hint bit updates from torn writes, WAL-log a
>>          * full page image of the page. This full page image is only necessary
>>          * if the hint bit update is the first change to the page since the
>>          * last checkpoint.
>>          *
>>          * We don't check full_page_writes here because that logic is included
>>          * when we call XLogInsert() since the value changes dynamically.
>>          */
>>         if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
>>         {
>>             /*
>>              * If we're in recovery we cannot dirty a page because of a hint.
>>              * We can set the hint, just not dirty the page as a result so the
>>              * hint is lost when we evict the page or shutdown.
>>              *
>>              * See src/backend/storage/page/README for longer discussion.
>>              */
>>             if (RecoveryInProgress())
>>                 return;

What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
Heikki Linnakangas
Дата:
On 01/22/2015 10:34 PM, Jim Nasby wrote:
> On 1/22/15 2:19 PM, Heikki Linnakangas wrote:
>> On 01/22/2015 09:20 PM, Bruce Momjian wrote:
>>> One question I have is whether hint bits are set by read-only
>>> transactions on standby servers.
>>
>> No. See comments in MarkBufferDirtyHint:
>>
>>>          /*
>>>           * If we need to protect hint bit updates from torn writes, WAL-log a
>>>           * full page image of the page. This full page image is only necessary
>>>           * if the hint bit update is the first change to the page since the
>>>           * last checkpoint.
>>>           *
>>>           * We don't check full_page_writes here because that logic is included
>>>           * when we call XLogInsert() since the value changes dynamically.
>>>           */
>>>          if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
>>>          {
>>>              /*
>>>               * If we're in recovery we cannot dirty a page because of a hint.
>>>               * We can set the hint, just not dirty the page as a result so the
>>>               * hint is lost when we evict the page or shutdown.
>>>               *
>>>               * See src/backend/storage/page/README for longer discussion.
>>>               */
>>>              if (RecoveryInProgress())
>>>                  return;
>
> What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.

Then the page will be updated without writing a WAL record. Just like in 
the master, if wal_log_hints is off. wal_log_hints works the same on the 
master or the standby.

- Heikki



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Thu, Jan 22, 2015 at 10:48:37PM +0200, Heikki Linnakangas wrote:
> >>>          * If we need to protect hint bit updates from torn writes, WAL-log a
> >>>          * full page image of the page. This full page image is only necessary
> >>>          * if the hint bit update is the first change to the page since the
> >>>          * last checkpoint.
> >>>          *
> >>>          * We don't check full_page_writes here because that logic is included
> >>>          * when we call XLogInsert() since the value changes dynamically.
> >>>          */
> >>>         if (XLogHintBitIsNeeded() && (bufHdr->flags & BM_PERMANENT))
> >>>         {
> >>>             /*
> >>>              * If we're in recovery we cannot dirty a page because of a hint.
> >>>              * We can set the hint, just not dirty the page as a result so the
> >>>              * hint is lost when we evict the page or shutdown.
> >>>              *
> >>>              * See src/backend/storage/page/README for longer discussion.
> >>>              */
> >>>             if (RecoveryInProgress())
> >>>                 return;
> >
> >What if XLogHintBitIsNeeded is false? That would be the case if we're not wall logging hints *on the standby*.
> 
> Then the page will be updated without writing a WAL record. Just
> like in the master, if wal_log_hints is off. wal_log_hints works the
> same on the master or the standby.

[ see below for why this entire idea might not work ]

OK, I was confused by your previous "no".  It means we do update hint
bits on read-only slave queries --- we just don't WAL log them.  In
fact, we can't update hint bits on the standby if we are wal logging
them ---- is that right?

My text was saying:
these differences can be reduced by using a fresh standby and byenabling <xref linkend="guc-wal-log-hints">.
(While<varname>wal_log_hints</>transfers hint bits from the primary tostandbys, additional hint bits are still set on
thestandbys byread-only queries.)
 

meaning if you don't run any read-only queries on the standby, the files
will be same on master/standby because the hint bits will be the same,
and rsync will not copy the files.

This brings up the other problem that the mod times of the files are
likely to be different between master and slave --- should I recommend
to only use rsync --checksum?

I would really like to get a way to pg_upgrade the standbys but we have
never really be able to get a solution.  Ideally we would update just
the system table files, and if the order of pg_upgrade file renames is
exactly the same, everything else would match, but I can't imagine what
such an API would look like.  Have pg_upgrade spit out a list of files
to be copied?

In fact, these are the relfilenodes pg_upgrade preserves:
*  While pg_class.oid and pg_class.relfilenode are initially the same*  in a cluster, they can diverge due to CLUSTER,
REINDEX,or VACUUM*  FULL.  In the new cluster, pg_class.oid and pg_class.relfilenode will*  be the same and will match
theold pg_class.oid value.  Because of*  this, old/new pg_class.relfilenode values will not match if CLUSTER,*
REINDEX,or VACUUM FULL have been performed in the old cluster.**  We control all assignments of pg_type.oid because
theseoids are stored*  in user composite type values.**  We control all assignments of pg_enum.oid because these oids
arestored*  in user tables as enum values.**  We control all assignments of pg_authid.oid because these oids are
stored* in pg_largeobject_metadata.
 

so if the table/index relfilenodes no longer match the oid on the old
cluster, due to CLUSTER, REINDEX, or VACUUM FULL, the file name will not
match on the new cluster and rsync will copy the entire file.  In fact,
rsync is going to copy it to the wrong file name, and delete the right
file.  

I am going to now conclude that rsync is never going to work for this,
unless we have pg_upgrade preserve relfilenodes as well.  However, I am
not even sure that is possible due to conflicts with system table
relfilenodes created in the new cluster.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/22/15 5:43 PM, Bruce Momjian wrote:
> This brings up the other problem that the mod times of the files are
> likely to be different between master and slave --- should I recommend
> to only use rsync --checksum?

I don't think so. AIUI if the timestamps are different the very next thing it does is run the checksum (which is
expensive).So --checksum is just going to hurt.
 

> I am going to now conclude that rsync is never going to work for this,
> unless we have pg_upgrade preserve relfilenodes as well.  However, I am
> not even sure that is possible due to conflicts with system table
> relfilenodes created in the new cluster.

We've previously talked about required steps before an upgrade; perhaps we need a way to force an OID/relfilenode
changeon the old server prior to upgrade.
 

Or, thinking outside the box here... could this type of stuff be done in postgres itself so we could generate wal
that'sshipped to standby's? That would allow doing this as part of the formal upgrade process without the need for
preliminarysteps.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
Andres Freund
Дата:
On 2015-01-22 14:20:51 -0500, Bruce Momjian wrote:
> It is possible to upgrade on pg_upgrade on streaming standby servers by
> making them master servers, running pg_upgrade on them, then shuting
> down all servers and using rsync to make the standby servers match the
> real master.

Isn't that a pretty crazy procedure? If you need to shut down all
servers anyway, you can just rsync after having run pg_upgrade on the
master, no? Rsync won't really transfer less just because you ran a
similar thing on the standby.

Even if this would allow to avoid some traffic for fsync: There's
absolutely no guarantee that the standby's pg_upgrade results in a all
that similar data directory. Far from everything in postgres is
deterministic - it's easy to hit timing differences that result in
noticeable differences.

Or do you - as the text edited in your patch, but not the quote above -
mean to run pg_upgrade just on the primary and then rsync?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Thu, Jan 22, 2015 at 06:04:24PM -0600, Jim Nasby wrote:
> On 1/22/15 5:43 PM, Bruce Momjian wrote:
> >This brings up the other problem that the mod times of the files
> >are likely to be different between master and slave --- should I
> >recommend to only use rsync --checksum?
>
> I don't think so. AIUI if the timestamps are different the very next
> thing it does is run the checksum (which is expensive). So --checksum
> is just going to hurt.

Oh, OK, good.

> >I am going to now conclude that rsync is never going to work for
> >this, unless we have pg_upgrade preserve relfilenodes as well.
> >However, I am not even sure that is possible due to conflicts with
> >system table relfilenodes created in the new cluster.
>
> We've previously talked about required steps before an upgrade;
> perhaps we need a way to force an OID/relfilenode change on the old
> server prior to upgrade.

Actually, the idea I had forgotten is that we are not rsyncing between
old and new clusters here, but between two servers who are both new
after running pg_upgrade.  Their relfilenodes match their oid, and the
oids match, so we should be fine.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> On 2015-01-22 14:20:51 -0500, Bruce Momjian wrote:
> > It is possible to upgrade on pg_upgrade on streaming standby servers by
> > making them master servers, running pg_upgrade on them, then shuting
> > down all servers and using rsync to make the standby servers match the
> > real master.
> 
> Isn't that a pretty crazy procedure? If you need to shut down all

Yes, it is crazy, but so is pg_upgrade.  ;-)

> servers anyway, you can just rsync after having run pg_upgrade on the
> master, no? Rsync won't really transfer less just because you ran a
> similar thing on the standby.

Uh, yeah, it will, because the files can get renamed as part of the
upgrade (relfilenode now matches oid), so by running the upgrade, file
names are going to match up.  I didn't think rsync could handle renaming
of files without recopying the entire file.

> Even if this would allow to avoid some traffic for fsync: There's
> absolutely no guarantee that the standby's pg_upgrade results in a all
> that similar data directory. Far from everything in postgres is
> deterministic - it's easy to hit timing differences that result in
> noticeable differences.

Right, some non-deterministic things would change, but I thought
runnning upgrade on the standby would help.  However, now that I think
of it, we don't preserver the database directory name and assume
dbs will will get the same oid and therefore same database directory
name on both, but if you use -j, things are going to happen in random
order.  Oops.

Oh well.

> Or do you - as the text edited in your patch, but not the quote above -
> mean to run pg_upgrade just on the primary and then rsync?

No, I was going to run it on both, then rsync.

I am thinking the fix for standys would be similar to what we recommand
for upgrades with link mode using a rsync-created copy, e.g. use rsync
while the master is running to create a copy of the standby, then shut
down the master and run rsync again.  However, at that point, you might
as well just take a base backup and be done with it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Bruce Momjian (bruce@momjian.us) wrote:
> On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> > Or do you - as the text edited in your patch, but not the quote above -
> > mean to run pg_upgrade just on the primary and then rsync?
>
> No, I was going to run it on both, then rsync.

I'm pretty sure this is all a lot easier than you believe it to be.  If
you want to recreate what pg_upgrade does to a cluster then the simplest
thing to do is rsync before removing any of the hard links.  rsync will
simply recreate the same hard link tree that pg_upgrade created when it
ran, and update files which were actually changed (the catalog tables).

The problem, as mentioned elsewhere, is that you have to checksum all
the files because the timestamps will differ.  You can actually get
around that with rsync if you really want though- tell it to only look
at file sizes instead of size+time by passing in --size-only.  I have to
admit that for *my* taste, at least, that's getting pretty darn
optimistic.  It *should* work, but I'd definitely recommend testing it
about a billion times in various ways before trusting it or recommending
it to anyone else.  I expect you'd need --inplace also, for cases where
the sizes are different and rsync wants to modify the file on the
destination to match the one on the source.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/22/15 8:54 PM, Stephen Frost wrote:
> The problem, as mentioned elsewhere, is that you have to checksum all
> the files because the timestamps will differ.  You can actually get
> around that with rsync if you really want though- tell it to only look
> at file sizes instead of size+time by passing in --size-only.  I have to
> admit that for *my* taste, at least, that's getting pretty darn
> optimistic.  It *should* work, but I'd definitely recommend testing it
> about a billion times in various ways before trusting it or recommending
> it to anyone else.  I expect you'd need --inplace also, for cases where
> the sizes are different and rsync wants to modify the file on the
> destination to match the one on the source.
>
I would definitely not feel comfortable using --size-only.

In addition, there is a possible race condition in rsync where a file
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used.  This is
fairly easy to prove and is shown here:

https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

That means the rsync hot, then rsync cold method of updating a standby
is not *guaranteed* to work unless checksums are used.  This may seem
like an edge case, but for a small, active database it looks like it
could be a real issue.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* David Steele (david@pgmasters.net) wrote:
> On 1/22/15 8:54 PM, Stephen Frost wrote:
> > The problem, as mentioned elsewhere, is that you have to checksum all
> > the files because the timestamps will differ.  You can actually get
> > around that with rsync if you really want though- tell it to only look
> > at file sizes instead of size+time by passing in --size-only.  I have to
> > admit that for *my* taste, at least, that's getting pretty darn
> > optimistic.  It *should* work, but I'd definitely recommend testing it
> > about a billion times in various ways before trusting it or recommending
> > it to anyone else.  I expect you'd need --inplace also, for cases where
> > the sizes are different and rsync wants to modify the file on the
> > destination to match the one on the source.
>
> I would definitely not feel comfortable using --size-only.

Yeah, it also occurs to me that if any of the catalog tables end up
being the same size between the master and the replica that they
wouldn't get copied and that'd make for one very interesting result, and
not a good one.

> In addition, there is a possible race condition in rsync where a file
> that is modified in the same second after rsync starts to copy will not
> be picked up in a subsequent rsync unless --checksum is used.  This is
> fairly easy to prove and is shown here:
>
> https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667

Right, though that isn't really an issue in this specific case- we're
talking about post-pg_upgrade but before the upgraded cluster has
actually been started, so nothing should be modifying these files.

> That means the rsync hot, then rsync cold method of updating a standby
> is not *guaranteed* to work unless checksums are used.  This may seem
> like an edge case, but for a small, active database it looks like it
> could be a real issue.

That's certainly a good point though.
Thanks!
    Stephen

Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/22/15 10:05 PM, Stephen Frost wrote:<br /><blockquote cite="mid:20150123030509.GJ3854@tamriel.snowman.net"
type="cite"><blockquotetype="cite"><pre wrap="">In addition, there is a possible race condition in rsync where a file 
that is modified in the same second after rsync starts to copy will not
be picked up in a subsequent rsync unless --checksum is used.  This is
fairly easy to prove and is shown here:

<a class="moz-txt-link-freetext"
href="https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667">https://github.com/pgmasters/backrest/blob/dev/test/lib/BackRestTest/BackupTest.pm#L1667</a>
</pre></blockquote><pre wrap="">
Right, though that isn't really an issue in this specific case- we're
talking about post-pg_upgrade but before the upgraded cluster has
actually been started, so nothing should be modifying these files.
</pre></blockquote><br /> Indeed.  This was really directed more at what Bruce said:<br /><blockquote><pre wrap="">I am
thinkingthe fix for standys would be similar to what we recommand 
for upgrades with link mode using a rsync-created copy, e.g. use rsync
while the master is running to create a copy of the standby, then shut
down the master and run rsync again.  However, at that point, you might
as well just take a base backup and be done with it.
</pre></blockquote><pre class="moz-signature" cols="72">--
- David Steele
<a class="moz-txt-link-abbreviated" href="mailto:david@pgmasters.net">david@pgmasters.net</a></pre>

Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/22/15 7:54 PM, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
>> >On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
>>> > >Or do you - as the text edited in your patch, but not the quote above -
>>> > >mean to run pg_upgrade just on the primary and then rsync?
>> >
>> >No, I was going to run it on both, then rsync.
> I'm pretty sure this is all a lot easier than you believe it to be.  If
> you want to recreate what pg_upgrade does to a cluster then the simplest
> thing to do is rsync before removing any of the hard links.  rsync will
> simply recreate the same hard link tree that pg_upgrade created when it
> ran, and update files which were actually changed (the catalog tables).
>
> The problem, as mentioned elsewhere, is that you have to checksum all
> the files because the timestamps will differ.  You can actually get
> around that with rsync if you really want though- tell it to only look
> at file sizes instead of size+time by passing in --size-only.

What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hardlinks that had
thesame timestamp as the original file?
 

That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's not
thatbig a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww.
 

How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a shell
scriptinstead of directly modifying things itself? Or perhaps some custom "command file" that could then be replayed by
pg_upgradeon another server? Of course, that's assuming that replicas are compatible enough with masters for that to
work...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:
> On 1/22/15 7:54 PM, Stephen Frost wrote:
> >* Bruce Momjian (bruce@momjian.us) wrote:
> >>>On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> >>>> >Or do you - as the text edited in your patch, but not the quote above -
> >>>> >mean to run pg_upgrade just on the primary and then rsync?
> >>>
> >>>No, I was going to run it on both, then rsync.
> >I'm pretty sure this is all a lot easier than you believe it to be.  If
> >you want to recreate what pg_upgrade does to a cluster then the simplest
> >thing to do is rsync before removing any of the hard links.  rsync will
> >simply recreate the same hard link tree that pg_upgrade created when it
> >ran, and update files which were actually changed (the catalog tables).
> >
> >The problem, as mentioned elsewhere, is that you have to checksum all
> >the files because the timestamps will differ.  You can actually get
> >around that with rsync if you really want though- tell it to only look
> >at file sizes instead of size+time by passing in --size-only.
>
> What if instead of trying to handle that on the rsync side, we changed pg_upgrade so that it created hardlinks that
hadthe same timestamp as the original file? 

So, two things, I chatted w/ Bruce and he was less concerned about the
lack of being able to match up the timestamps than I was.  He has a
point though- the catalog tables are going to get copied anyway since
they won't be hard links and checking that all the other files match in
size and that both the master and the standby are at the same xlog
position should give you a pretty good feeling that everything matches
up sufficiently.

Second, I don't follow what you mean by having pg_upgrade change the
hardlinks to have the same timestamp- for starters, the timestamp is in
the inode and not the actual hard link (two files hard linked together
won't have different timestamps..) and second, the problem isn't on the
master side- it's on the standby side.  The standby's files will have
timestamps different from the master and there really isn't much to be
done about that.

> That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's
notthat big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww. 

The race condition is a problem for pg_start/stop_backup and friends.
In this instance, everything will be shut down when the rsync is
running, so there isn't a timestamp race condition to worry about.

> How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a
shellscript instead of directly modifying things itself? Or perhaps some custom "command file" that could then be
replayedby pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters
forthat to work... 

Yeah, I had suggested that to Bruce also, but it's not clear why that
would be any different from an rsync --size-only in the end, presuming
everything went according to plan.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Andres Freund
Дата:
On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> > > Or do you - as the text edited in your patch, but not the quote above -
> > > mean to run pg_upgrade just on the primary and then rsync?
> > 
> > No, I was going to run it on both, then rsync.
> 
> I'm pretty sure this is all a lot easier than you believe it to be.  If
> you want to recreate what pg_upgrade does to a cluster then the simplest
> thing to do is rsync before removing any of the hard links.  rsync will
> simply recreate the same hard link tree that pg_upgrade created when it
> ran, and update files which were actually changed (the catalog tables).

I don't understand why that'd be better than simply fixing (yes, that's
imo the correct term) pg_upgrade to retain relfilenodes across the
upgrade. Afaics there's no conflict risk and it'd make the clusters much
more similar, which would be good; independent of rsyncing standbys.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> > > On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> > > > Or do you - as the text edited in your patch, but not the quote above -
> > > > mean to run pg_upgrade just on the primary and then rsync?
> > >
> > > No, I was going to run it on both, then rsync.
> >
> > I'm pretty sure this is all a lot easier than you believe it to be.  If
> > you want to recreate what pg_upgrade does to a cluster then the simplest
> > thing to do is rsync before removing any of the hard links.  rsync will
> > simply recreate the same hard link tree that pg_upgrade created when it
> > ran, and update files which were actually changed (the catalog tables).
>
> I don't understand why that'd be better than simply fixing (yes, that's
> imo the correct term) pg_upgrade to retain relfilenodes across the
> upgrade. Afaics there's no conflict risk and it'd make the clusters much
> more similar, which would be good; independent of rsyncing standbys.

That's an entirely orthogonal discussion from the original one though,
no?  That wouldn't actually help with what Bruce is trying to do, which
is to duplicate the results of the pg_upgrade from the master over to
the standby.  Even if the relfilenodes were the same across the upgrade,
I don't think it'd be a good idea to run pg_upgrade on the standby and
hope the results match close enough to the master that you can trust
updates to the catalog tables on the standby from the master going
forward to work..

Trying to pg_upgrade both the master and the standby, to me at least,
seems like an even *worse* approach than trusting rsync with -H and
--size-only..
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Andres Freund
Дата:
On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > > > On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
> > > > > Or do you - as the text edited in your patch, but not the quote above -
> > > > > mean to run pg_upgrade just on the primary and then rsync?
> > > > 
> > > > No, I was going to run it on both, then rsync.
> > > 
> > > I'm pretty sure this is all a lot easier than you believe it to be.  If
> > > you want to recreate what pg_upgrade does to a cluster then the simplest
> > > thing to do is rsync before removing any of the hard links.  rsync will
> > > simply recreate the same hard link tree that pg_upgrade created when it
> > > ran, and update files which were actually changed (the catalog tables).
> > 
> > I don't understand why that'd be better than simply fixing (yes, that's
> > imo the correct term) pg_upgrade to retain relfilenodes across the
> > upgrade. Afaics there's no conflict risk and it'd make the clusters much
> > more similar, which would be good; independent of rsyncing standbys.
> 
> That's an entirely orthogonal discussion from the original one though,
> no?

Don't think so.

> That wouldn't actually help with what Bruce is trying to do, which
> is to duplicate the results of the pg_upgrade from the master over to
> the standby.

Well, it'd pretty much obliviate the need to run pg_upgrade on the
standby. As there's no renamed files you don't need to muck around with
leaving hardlinks in place and such just so that rsync recognizes
unchanged files.

> Trying to pg_upgrade both the master and the standby, to me at least,
> seems like an even *worse* approach than trusting rsync with -H and
> --size-only..

I think running pg_upgrade on the standby is a dangerous folly.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
> > That wouldn't actually help with what Bruce is trying to do, which
> > is to duplicate the results of the pg_upgrade from the master over to
> > the standby.
>
> Well, it'd pretty much obliviate the need to run pg_upgrade on the
> standby. As there's no renamed files you don't need to muck around with
> leaving hardlinks in place and such just so that rsync recognizes
> unchanged files.

Uh, pg_upgrade always either creates a hard link tree or copies
everything over.  If I follow what you're suggesting, pg_upgrade would
need a new 'in-place' mode that removes all of the catalog tables from
the old cluster and puts the new catalog tables into place and leaves
everything else alone.

I don't really think I'd want to go there either..
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Andres Freund
Дата:
On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
> > > That wouldn't actually help with what Bruce is trying to do, which
> > > is to duplicate the results of the pg_upgrade from the master over to
> > > the standby.
> > 
> > Well, it'd pretty much obliviate the need to run pg_upgrade on the
> > standby. As there's no renamed files you don't need to muck around with
> > leaving hardlinks in place and such just so that rsync recognizes
> > unchanged files.
> 
> Uh, pg_upgrade always either creates a hard link tree or copies
> everything over.

Yes. The problem is that the filenames after pg_upgrade aren't the same
as before. Which means that a simple rsync call won't be able to save
anything because the standby's filenames differ.  What you can do is
rsync both cluster directories (i.e. the old and the post pg_upgrade
ones) and use rsync -H, right? Without transferring both -H won't detect
the hardlinks as they need to be in the synced set. That's pretty
cumbersome/complicated, and far from cheap.

> If I follow what you're suggesting, pg_upgrade would
> need a new 'in-place' mode that removes all of the catalog tables from
> the old cluster and puts the new catalog tables into place and leaves
> everything else alone.

No. Except that it'd preserve the relfilenodes (i.e. the filenames of
relations) it'd work exactly the same as today. The standby is simply
updated by rsyncing the new data directory of the primary to the
standby.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
> > * Andres Freund (andres@2ndquadrant.com) wrote:
> > > On 2015-01-23 13:52:54 -0500, Stephen Frost wrote:
> > > > That wouldn't actually help with what Bruce is trying to do, which
> > > > is to duplicate the results of the pg_upgrade from the master over to
> > > > the standby.
> > >
> > > Well, it'd pretty much obliviate the need to run pg_upgrade on the
> > > standby. As there's no renamed files you don't need to muck around with
> > > leaving hardlinks in place and such just so that rsync recognizes
> > > unchanged files.
> >
> > Uh, pg_upgrade always either creates a hard link tree or copies
> > everything over.
>
> Yes. The problem is that the filenames after pg_upgrade aren't the same
> as before. Which means that a simple rsync call won't be able to save
> anything because the standby's filenames differ.  What you can do is
> rsync both cluster directories (i.e. the old and the post pg_upgrade
> ones) and use rsync -H, right? Without transferring both -H won't detect
> the hardlinks as they need to be in the synced set. That's pretty
> cumbersome/complicated, and far from cheap.

The filenames don't need to be the same for rsync -H to work.  You
specifically do *not* want to independently rsync the old and new
clusters- you need to run a single rsync (and one for each tablespace)
with -H and then it'll realize that the old cluster on both systems is
identical and will just recreate the hard links, and copy the completely
new files (the catalog tables).

> > If I follow what you're suggesting, pg_upgrade would
> > need a new 'in-place' mode that removes all of the catalog tables from
> > the old cluster and puts the new catalog tables into place and leaves
> > everything else alone.
>
> No. Except that it'd preserve the relfilenodes (i.e. the filenames of
> relations) it'd work exactly the same as today. The standby is simply
> updated by rsyncing the new data directory of the primary to the
> standby.

You'd have to replace the existing data directory on the master to do
that, which pg_upgrade was designed specifically to not do, in case
things went poorly.  You'd still have to deal with the tablespace
directories being renamed also, since we include the major version and
catalog build in the directory name..

This whole process really isn't all that complicated in the end..

my_data_dir/old_cluster
my_data_dir/new_cluster

pg_upgrade
rsync -H --size-only my_data_dir/ standby:/path/to/my_data_dir
start the clusters
remove the old cluster on the master and standby.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Andres Freund
Дата:
On 2015-01-23 14:27:51 -0500, Stephen Frost wrote:
> * Andres Freund (andres@2ndquadrant.com) wrote:
> > On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
> > > If I follow what you're suggesting, pg_upgrade would
> > > need a new 'in-place' mode that removes all of the catalog tables from
> > > the old cluster and puts the new catalog tables into place and leaves
> > > everything else alone.
> > 
> > No. Except that it'd preserve the relfilenodes (i.e. the filenames of
> > relations) it'd work exactly the same as today. The standby is simply
> > updated by rsyncing the new data directory of the primary to the
> > standby.
> 
> You'd have to replace the existing data directory on the master to do
> that, which pg_upgrade was designed specifically to not do, in case
> things went poorly.

Why? Just rsync the new data directory onto the old directory on the
standbys. That's fine and simple.

> You'd still have to deal with the tablespace directories being renamed
> also, since we include the major version and catalog build in the
> directory name..

True.


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Andres Freund (andres@2ndquadrant.com) wrote:
> On 2015-01-23 14:27:51 -0500, Stephen Frost wrote:
> > * Andres Freund (andres@2ndquadrant.com) wrote:
> > > On 2015-01-23 14:05:10 -0500, Stephen Frost wrote:
> > > > If I follow what you're suggesting, pg_upgrade would
> > > > need a new 'in-place' mode that removes all of the catalog tables from
> > > > the old cluster and puts the new catalog tables into place and leaves
> > > > everything else alone.
> > >
> > > No. Except that it'd preserve the relfilenodes (i.e. the filenames of
> > > relations) it'd work exactly the same as today. The standby is simply
> > > updated by rsyncing the new data directory of the primary to the
> > > standby.
> >
> > You'd have to replace the existing data directory on the master to do
> > that, which pg_upgrade was designed specifically to not do, in case
> > things went poorly.
>
> Why? Just rsync the new data directory onto the old directory on the
> standbys. That's fine and simple.

That still doesn't address the need to use --size-only, it would just
mean that you don't need to use -H.  If anything the -H part is the
aspect which worries me the least about this approach.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
> > Why? Just rsync the new data directory onto the old directory on the
> > standbys. That's fine and simple.
> 
> That still doesn't address the need to use --size-only, it would just
> mean that you don't need to use -H.  If anything the -H part is the
> aspect which worries me the least about this approach.

It took me a while to understand what Stephen was saying, so let me
explain the details so everyone can get on the same page.

First, let's look at the downsides of using non-hardlink rsync against a
slave cluster, whether we run pg_upgrade on the slave or not:
o  must preserve db directory and relfilenodes (4 new things for   pg_upgrade to preserve)o  must checksum files
becausethere is no way to distinguish   user tables/indexes (which don't need to be copied) from system
tables/indexes(which must be copied so it is in sync with   the master)o  must use log_wal_hints when the slave is
installedso the   checksums match
 

So, even if if all the checksums work, it will be slow/expensive.

Stephen's idea is quite interesting.  You run pg_upgrade on the master,
then, before you start the new server, you run rsync with special flags
and sync the old _and_ new clusters on the master with just the old
cluster on the standby (yeah, odd).  Yes, this is very odd, and where I
got lost too.

First, this only works when pg_upgrade is run in --link mode.  What
rsync --hard-links --size-only is going to do is record which files have
hard links, remember their inode numbers, and cross-reference the
hard-linked files.  When doing the rsync remote comparisons, the
master's old relfilenodes will match the standby's old relfilenodes, and
because we are using --size-only, they will be considered identical and
not copied, or even checksumed.  When it goes to do the standby's new
cluster, none of the directories will exist, so they will all be copied
along with the system objects (they are small), but the user
tables/indexes will be identified as already existing in the slave's old
cluster so it will hard-link to those standby's old cluster files. Once
rsync is complete, you can delete the old cluster on master and standby.
This is effectively duplicating the way pg_upgrade works.

What is interesting is that this will work on any version of pg_upgrade,
with no modifications, as long as link mode is used.  You _cannot_ run
initdb on the standby, as this will create system files that would
prevent the master's system files from being copied.  This is also going
to remove your recovery.conf on the standby, and replace your
postgresql.conf with the master's, so any modifications you made to the
standby will have to be saved and restored in to the new cluster before
starting.

I plan to run some tests soon to verify this method works, and if so, I
can document it in the pg_upgrade manual page.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
> > > You'd have to replace the existing data directory on the master to do
> > > that, which pg_upgrade was designed specifically to not do, in case
> > > things went poorly.
> >
> > Why? Just rsync the new data directory onto the old directory on the
> > standbys. That's fine and simple.
>
> That still doesn't address the need to use --size-only, it would just
> mean that you don't need to use -H.  If anything the -H part is the
> aspect which worries me the least about this approach.

I can now confirm that it works, just as Stephen said.  I was able to
upgrade a standby cluster that contained the regression database, and
the pg_dump output was perfect.

I am attaching doc instruction that I will add to all branches as soon
as someone else confirms my results.  You will need to use rsync
--itemize-changes to see the hard links being created, e.g.:

   hf+++++++++ pgsql/data/base/16415/28188 => pgsql.old/data/base/16384/28188

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Вложения

Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/23/15 12:40 PM, Stephen Frost wrote:
>> >That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe it's
notthat big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww.
 
> The race condition is a problem for pg_start/stop_backup and friends.
> In this instance, everything will be shut down when the rsync is
> running, so there isn't a timestamp race condition to worry about.

Yeah, I'm more concerned about people that use rsync to take base backups. Do we need to explicitly advise against
that?Is there a way to work around this with a sleep after pg_start_backup to make sure all timestamps must be
different?(Admittedly I haven't fully wrapped my head around this yet.)
 

>> >How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a
shellscript instead of directly modifying things itself? Or perhaps some custom "command file" that could then be
replayedby pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters
forthat to work...
 
> Yeah, I had suggested that to Bruce also, but it's not clear why that
> would be any different from an rsync --size-only in the end, presuming
> everything went according to plan.

Yeah, if everything is shut down maybe we're OK.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
Jim,

* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:
> On 1/23/15 12:40 PM, Stephen Frost wrote:
> >>>That said, the whole timestamp race condition in rsync gives me the heebie-jeebies. For normal workloads maybe
it'snot that big a deal, but when dealing with fixed-size data (ie: Postgres blocks)? Eww. 
> >The race condition is a problem for pg_start/stop_backup and friends.
> >In this instance, everything will be shut down when the rsync is
> >running, so there isn't a timestamp race condition to worry about.
>
> Yeah, I'm more concerned about people that use rsync to take base backups. Do we need to explicitly advise against
that?Is there a way to work around this with a sleep after pg_start_backup to make sure all timestamps must be
different?(Admittedly I haven't fully wrapped my head around this yet.) 

I've thought about it a fair bit actually and I agree that there is some
risk to using rsync for *incremental* base backups.  That is, you have
a setup where you loop with:

pg_start_backup
rsync -> dest
pg_stop_backup

without using -I, changing what 'dest' is, or making sure it's empty
every time.  The problem is the 1s-level granularity used on the
timestamp.  A possible set of operations, all within 1s, is:

file changed
rsync starts copying the file
file changed again (somewhere prior to where rsync is at)
rsync finishes the file copy

Now, this isn't actually a problem for the first time that file is
backed up- the issue is if that file isn't changed again.  rsync won't
re-copy it, but that change that rsync missed won't be in the WAL
history for the *second* backup that's done (only the first), leading to
a case where that file would end up corrupted.

This is a pretty darn narrow situation and one that I doubt many people
will hit, but I do think it's possible.

A way to address this would be to grab all timestamps for all files
at the start of the backup and re-copy any files whose times are changed
after that point (or which were being changed at the time the check was
done, or perhaps simply any file which has a timestamp after the
starting timestamp of the backup).

> >>>How horribly difficult would it be to allow pg_upgrade to operate on multiple servers? Could we have it create a
shellscript instead of directly modifying things itself? Or perhaps some custom "command file" that could then be
replayedby pg_upgrade on another server? Of course, that's assuming that replicas are compatible enough with masters
forthat to work... 
> >Yeah, I had suggested that to Bruce also, but it's not clear why that
> >would be any different from an rsync --size-only in the end, presuming
> >everything went according to plan.
>
> Yeah, if everything is shut down maybe we're OK.

Regarding this, yes, I think it 'should' work, but it would definitely
be good to test it quite a bit before relying on it..
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/26/15 5:11 PM, Jim Nasby wrote:
>> The race condition is a problem for pg_start/stop_backup and friends.
>> In this instance, everything will be shut down when the rsync is
>> running, so there isn't a timestamp race condition to worry about.
>
> Yeah, I'm more concerned about people that use rsync to take base
> backups. Do we need to explicitly advise against that? Is there a way
> to work around this with a sleep after pg_start_backup to make sure
> all timestamps must be different? (Admittedly I haven't fully wrapped
> my head around this yet.)
A sleep in pg_start_backup() won't work.  The race condition is in rsync
if the file is modified in the same second after it is copied.  Waiting
until the beginning of the next second in pg_start_backup() would
actually make a bigger window where the issue can occur.

I solved this problem in PgBackRest (an alternative to barman, etc.) by
waiting the remainder of the second after the manifest is built before
copying.  That way, if a file is modified in the second after the
manifest is built that later version will still be copied.  Any mods
after that will be copied in the next backup (as they should be).
PgBackRest does not use rsync, tar, etc.) so I was able to code around
the issue.

The interesting thing about this race condition is that it does not
affect the backup where it occurs.  It affects the next backup when the
modified file does not get copied because the timestamp is the same as
the previous backup.  Of course using checksums will solve the problem
in rsync but that's expensive.

Thus my comment earlier that the hot rsync / cold rsync method is not
absolutely safe.  If you do checksums on the cold rsync then you might
as well just use them the first time - you'll have the same downtime
either way.

I've written tests to show the rsync vulnerability and another to show
that this can affect a running database.  However, to reproduce it
reliably you need to force a checkpoint or have them happening pretty
close together.

--
- David Steele
david@pgmasters.ne



Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/26/15 5:08 PM, David Steele wrote:
> I've written tests to show the rsync vulnerability and another to show
> that this can affect a running database.  However, to reproduce it
> reliably you need to force a checkpoint or have them happening pretty
> close together.

Related to this and Stephen's comment about testing... ISTM it would be very useful to have a published suite of tests
forPITR backups, perhaps even utilizing special techniques in Postgres to expose potential failure conditions.
Similarly,it'd also be nice to have a suite of tests you could run to validate a backup that you've restored.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
Robert Haas
Дата:
On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2015-01-22 20:54:47 -0500, Stephen Frost wrote:
>> * Bruce Momjian (bruce@momjian.us) wrote:
>> > On Fri, Jan 23, 2015 at 01:19:33AM +0100, Andres Freund wrote:
>> > > Or do you - as the text edited in your patch, but not the quote above -
>> > > mean to run pg_upgrade just on the primary and then rsync?
>> >
>> > No, I was going to run it on both, then rsync.
>>
>> I'm pretty sure this is all a lot easier than you believe it to be.  If
>> you want to recreate what pg_upgrade does to a cluster then the simplest
>> thing to do is rsync before removing any of the hard links.  rsync will
>> simply recreate the same hard link tree that pg_upgrade created when it
>> ran, and update files which were actually changed (the catalog tables).
>
> I don't understand why that'd be better than simply fixing (yes, that's
> imo the correct term) pg_upgrade to retain relfilenodes across the
> upgrade. Afaics there's no conflict risk and it'd make the clusters much
> more similar, which would be good; independent of rsyncing standbys.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_upgrade and rsync

От
Robert Haas
Дата:
On Sat, Jan 24, 2015 at 10:04 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
>> > > You'd have to replace the existing data directory on the master to do
>> > > that, which pg_upgrade was designed specifically to not do, in case
>> > > things went poorly.
>> >
>> > Why? Just rsync the new data directory onto the old directory on the
>> > standbys. That's fine and simple.
>>
>> That still doesn't address the need to use --size-only, it would just
>> mean that you don't need to use -H.  If anything the -H part is the
>> aspect which worries me the least about this approach.
>
> I can now confirm that it works, just as Stephen said.  I was able to
> upgrade a standby cluster that contained the regression database, and
> the pg_dump output was perfect.
>
> I am attaching doc instruction that I will add to all branches as soon
> as someone else confirms my results.  You will need to use rsync
> --itemize-changes to see the hard links being created, e.g.:
>
>    hf+++++++++ pgsql/data/base/16415/28188 => pgsql.old/data/base/16384/28188

My rsync manual page (on two different systems) mentions nothing about
remote_dir, so I'd be quite unable to follow your proposed directions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Sat, Jan 24, 2015 at 10:04 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Fri, Jan 23, 2015 at 02:34:36PM -0500, Stephen Frost wrote:
> >> > > You'd have to replace the existing data directory on the master to do
> >> > > that, which pg_upgrade was designed specifically to not do, in case
> >> > > things went poorly.
> >> >
> >> > Why? Just rsync the new data directory onto the old directory on the
> >> > standbys. That's fine and simple.
> >>
> >> That still doesn't address the need to use --size-only, it would just
> >> mean that you don't need to use -H.  If anything the -H part is the
> >> aspect which worries me the least about this approach.
> >
> > I can now confirm that it works, just as Stephen said.  I was able to
> > upgrade a standby cluster that contained the regression database, and
> > the pg_dump output was perfect.
> >
> > I am attaching doc instruction that I will add to all branches as soon
> > as someone else confirms my results.  You will need to use rsync
> > --itemize-changes to see the hard links being created, e.g.:
> >
> >    hf+++++++++ pgsql/data/base/16415/28188 => pgsql.old/data/base/16384/28188
>
> My rsync manual page (on two different systems) mentions nothing about
> remote_dir, so I'd be quite unable to follow your proposed directions.

The example listed works, but only when it's a local rsync:

rsync --archive --hard-links --size-only old_dir new_dir remote_dir

Perhaps a better example (or additional one) would be with a remote
rsync, including clarification of old and new dir, like so:

(run in /var/lib/postgresql)
rsync --archive --hard-links --size-only \ 9.3/main \ 9.4/main \ server:/var/lib/postgresql/

Note that 9.3/main and 9.4/main are two source directories for rsync to
copy over, while server:/var/lib/postgresql/ is a remote destination
directory.  The above directories match a default Debian/Ubuntu install.
Thanks!
    Stephen

Re: pg_upgrade and rsync

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I don't understand why that'd be better than simply fixing (yes, that's
>> imo the correct term) pg_upgrade to retain relfilenodes across the
>> upgrade. Afaics there's no conflict risk and it'd make the clusters much
>> more similar, which would be good; independent of rsyncing standbys.

> +1.

That's certainly impossible for the system catalogs, which means you
have to be able to deal with relfilenode discrepancies for them, which
means that maintaining the same relfilenodes for user tables is of
dubious value.
        regards, tom lane



Re: pg_upgrade and rsync

От
Robert Haas
Дата:
On Tue, Jan 27, 2015 at 9:36 AM, Stephen Frost <sfrost@snowman.net> wrote:
> The example listed works, but only when it's a local rsync:
>
> rsync --archive --hard-links --size-only old_dir new_dir remote_dir
>
> Perhaps a better example (or additional one) would be with a remote
> rsync, including clarification of old and new dir, like so:
>
> (run in /var/lib/postgresql)
> rsync --archive --hard-links --size-only \
>   9.3/main \
>   9.4/main \
>   server:/var/lib/postgresql/
>
> Note that 9.3/main and 9.4/main are two source directories for rsync to
> copy over, while server:/var/lib/postgresql/ is a remote destination
> directory.  The above directories match a default Debian/Ubuntu install.

My point is that Bruce's patch suggests looking for "remote_dir" in
the rsync documentation, but no such term appears there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_upgrade and rsync

От
Robert Haas
Дата:
On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> I don't understand why that'd be better than simply fixing (yes, that's
>>> imo the correct term) pg_upgrade to retain relfilenodes across the
>>> upgrade. Afaics there's no conflict risk and it'd make the clusters much
>>> more similar, which would be good; independent of rsyncing standbys.
>
>> +1.
>
> That's certainly impossible for the system catalogs, which means you
> have to be able to deal with relfilenode discrepancies for them, which
> means that maintaining the same relfilenodes for user tables is of
> dubious value.

Why is that impossible for the system catalogs?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pg_upgrade and rsync

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> That's certainly impossible for the system catalogs, which means you
>> have to be able to deal with relfilenode discrepancies for them, which
>> means that maintaining the same relfilenodes for user tables is of
>> dubious value.

> Why is that impossible for the system catalogs?

New versions aren't guaranteed to have the same system catalogs, let alone
the same relfilenodes for them.
        regards, tom lane



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Jan 27, 2015 at 9:36 AM, Stephen Frost <sfrost@snowman.net> wrote:
> > The example listed works, but only when it's a local rsync:
> >
> > rsync --archive --hard-links --size-only old_dir new_dir remote_dir
> >
> > Perhaps a better example (or additional one) would be with a remote
> > rsync, including clarification of old and new dir, like so:
> >
> > (run in /var/lib/postgresql)
> > rsync --archive --hard-links --size-only \
> >   9.3/main \
> >   9.4/main \
> >   server:/var/lib/postgresql/
> >
> > Note that 9.3/main and 9.4/main are two source directories for rsync to
> > copy over, while server:/var/lib/postgresql/ is a remote destination
> > directory.  The above directories match a default Debian/Ubuntu install.
>
> My point is that Bruce's patch suggests looking for "remote_dir" in
> the rsync documentation, but no such term appears there.

Ah, well, perhaps we could simply add a bit of clarification to this:

for details on specifying <option>remote_dir</>

like so:

for details on specifying the destination <option>remote_dir</>

?

On my system, the rsync man page has '[DEST]' in the synopsis, but it
doesn't actually go on to specifically define what 'DEST' is, rather
referring to it later as 'destination' or 'remote directory'.

I'm sure other suggestions would be welcome if they'd help clarify.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Andres Freund
Дата:
On 2015-01-27 10:20:48 -0500, Robert Haas wrote:
> On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Fri, Jan 23, 2015 at 1:48 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >>> I don't understand why that'd be better than simply fixing (yes, that's
> >>> imo the correct term) pg_upgrade to retain relfilenodes across the
> >>> upgrade. Afaics there's no conflict risk and it'd make the clusters much
> >>> more similar, which would be good; independent of rsyncing standbys.
> >
> >> +1.
> >
> > That's certainly impossible for the system catalogs, which means you
> > have to be able to deal with relfilenode discrepancies for them, which
> > means that maintaining the same relfilenodes for user tables is of
> > dubious value.
> 
> Why is that impossible for the system catalogs?

Maybe it's not impossible for existing catalogs, but it's certainly
complicated. But I don't think it's all that desirable anyway - they're
not the same relation after the pg_upgrade anyway (initdb/pg_dump
filled them). That's different for the user defined relations.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Tue, Jan 27, 2015 at 9:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> That's certainly impossible for the system catalogs, which means you
> >> have to be able to deal with relfilenode discrepancies for them, which
> >> means that maintaining the same relfilenodes for user tables is of
> >> dubious value.
>
> > Why is that impossible for the system catalogs?
>
> New versions aren't guaranteed to have the same system catalogs, let alone
> the same relfilenodes for them.

Indeed, new versions almost certainly have wholly new system catalogs.

While there might be a reason to keep the relfilenodes the same, it
doesn't actually help with the pg_upgrade use-case we're currently
discussing (at least, not without additional help).  The problem is that
we certainly must transfer all the new catalogs, but how would rsync
know that those catalog files have to be transferred but not the user
relations?  Using --size-only would mean that system catalogs whose
sizes happen to match after the upgrade wouldn't be transferred and that
would certainly lead to a corrupt situation.

Andres proposed a helper script which would go through the entire tree
on the remote side and set all the timestamps on the remote side to
match those on the local side (prior to the pg_upgrade).  If all the
relfilenodes remained the same and the timestamps on the catalog tables
all changed then it might work to do (without using --size-only):

stop-cluster
set-timestamp-script
pg_upgrade
rsync new_data_dir -> remote:existing_cluster

This would mean that any other files which happened to be changed by
pg_upgrade beyond the catalog tables would also get copied across.  The
issue that I see with that is that if the pg_upgrade process does touch
anything outside of the system catalogs, then its documented revert
mechanism (rename the control file and start the old cluster back up,
prior to having started the new cluster) wouldn't be valid.  Requiring
an extra script which runs around changing timestamps on files is a bit
awkward too, though I suppose possible, and then we'd also have to
document that this process only works with $version of pg_upgrade that
does the preservation of the relfilenodes.

I suppose there's also technically a race condition to consider, if the
whole thing is scripted and pg_upgrade manages to change an existing
file in the same second that the old cluster did then that file wouldn't
be recognized by the rsync as having been updated.  That's not too hard
to address though- just wait a second somewhere in there.  Still, I'm
not really sure that this approach really gains us much over the
approach that Bruce is proposing.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/27/15 9:29 AM, Stephen Frost wrote:
>> My point is that Bruce's patch suggests looking for "remote_dir" in
>> >the rsync documentation, but no such term appears there.
> Ah, well, perhaps we could simply add a bit of clarification to this:
>
> for details on specifying <option>remote_dir</>

The whole remote_dir discussion made me think of something... would --link-dest be any help here?
       --link-dest=DIR              This option behaves like --copy-dest, but unchanged files are hard linked from DIR
tothe des-              tination  directory.   The  files  must be identical in all preserved attributes (e.g. permis-
           sions, possibly ownership) in order for the files to be linked together.  An example:
 
                rsync -av --link-dest=$PWD/prior_dir host:src_dir/ new_dir/

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/27/15 6:09 PM, Jim Nasby wrote:
> The whole remote_dir discussion made me think of something... would
> --link-dest be any help here?

I'm pretty sure --link-dest would not be effective in this case.  The
problem exists on the source side and --link-dest only operates on the
destination.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Mon, Jan 26, 2015 at 05:41:59PM -0500, Stephen Frost wrote:
> I've thought about it a fair bit actually and I agree that there is some
> risk to using rsync for *incremental* base backups.  That is, you have
> a setup where you loop with:
> 
> pg_start_backup
> rsync -> dest
> pg_stop_backup
> 
> without using -I, changing what 'dest' is, or making sure it's empty
> every time.  The problem is the 1s-level granularity used on the
> timestamp.  A possible set of operations, all within 1s, is:
> 
> file changed
> rsync starts copying the file
> file changed again (somewhere prior to where rsync is at)
> rsync finishes the file copy
> 
> Now, this isn't actually a problem for the first time that file is
> backed up- the issue is if that file isn't changed again.  rsync won't
> re-copy it, but that change that rsync missed won't be in the WAL
> history for the *second* backup that's done (only the first), leading to
> a case where that file would end up corrupted.

Interesting problem, but doesn't rsync use sub-second accuracy?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Tue, Jan 27, 2015 at 09:36:58AM -0500, Stephen Frost wrote:
> The example listed works, but only when it's a local rsync:
>
> rsync --archive --hard-links --size-only old_dir new_dir remote_dir
>
> Perhaps a better example (or additional one) would be with a remote
> rsync, including clarification of old and new dir, like so:
>
> (run in /var/lib/postgresql)
> rsync --archive --hard-links --size-only \
>   9.3/main \
>   9.4/main \
>   server:/var/lib/postgresql/
>
> Note that 9.3/main and 9.4/main are two source directories for rsync to
> copy over, while server:/var/lib/postgresql/ is a remote destination
> directory.  The above directories match a default Debian/Ubuntu install.

OK, sorry everyone was confused by 'remote_dir'.  Does this new patch
help?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Вложения

Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/27/15 9:32 PM, Bruce Momjian wrote
> Now, this isn't actually a problem for the first time that file is
> backed up- the issue is if that file isn't changed again.  rsync won't
> re-copy it, but that change that rsync missed won't be in the WAL
> history for the *second* backup that's done (only the first), leading to
> a case where that file would end up corrupted.
> Interesting problem, but doesn't rsync use sub-second accuracy?
>
According to my empirical testing on Linux and OSX the answer is no:
rsync does not use sub-second accuracy.  This seems to be true even on
file systems like ext4 that support millisecond mod times, at least it
was true on Ubuntu 12.04 running ext4.

Even on my laptop there is a full half-second of vulnerability for
rsync.  Faster systems may have a larger window.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Tue, Jan 27, 2015 at 09:44:51PM -0500, David Steele wrote:
> On 1/27/15 9:32 PM, Bruce Momjian wrote
> > Now, this isn't actually a problem for the first time that file is
> > backed up- the issue is if that file isn't changed again.  rsync won't
> > re-copy it, but that change that rsync missed won't be in the WAL
> > history for the *second* backup that's done (only the first), leading to
> > a case where that file would end up corrupted.
> > Interesting problem, but doesn't rsync use sub-second accuracy?
> >
> According to my empirical testing on Linux and OSX the answer is no:
> rsync does not use sub-second accuracy.  This seems to be true even on
> file systems like ext4 that support millisecond mod times, at least it
> was true on Ubuntu 12.04 running ext4.
> 
> Even on my laptop there is a full half-second of vulnerability for
> rsync.  Faster systems may have a larger window.

OK, bummer.  Well, I don't think we ever recommend to run rsync without
checksums, but the big problem is that rsync doesn't do checksums by
default.  :-(

pg_upgrade recommends using two rsyncs:
  To make a valid copy of the old cluster, use <command>rsync</> to create  a dirty copy of the old cluster while the
serveris running, then shut  down the old server and run <command>rsync</> again to update the copy  with any changes
tomake it consistent.  You might want to exclude some
 

I am afraid that will not work as it could miss changes, right?  When
would the default mod-time checking every be safe?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/27/15 9:51 PM, Bruce Momjian wrote:
>> According to my empirical testing on Linux and OSX the answer is no:
>> rsync does not use sub-second accuracy.  This seems to be true even on
>> file systems like ext4 that support millisecond mod times, at least it
>> was true on Ubuntu 12.04 running ext4.
>>
>> Even on my laptop there is a full half-second of vulnerability for
>> rsync.  Faster systems may have a larger window.
> OK, bummer.  Well, I don't think we ever recommend to run rsync without
> checksums, but the big problem is that rsync doesn't do checksums by
> default.  :-(
>
> pg_upgrade recommends using two rsyncs:
>
>    To make a valid copy of the old cluster, use <command>rsync</> to create
>    a dirty copy of the old cluster while the server is running, then shut
>    down the old server and run <command>rsync</> again to update the copy
>    with any changes to make it consistent.  You might want to exclude some
>
> I am afraid that will not work as it could miss changes, right?  When
> would the default mod-time checking every be safe?
>
According to my testing the default mod-time checking is never
completely safe in rsync.  I've worked around this in PgBackRest by
building the manifest and then waiting until the start of the next
second before starting to copy.  It was the only way I could make the
incremental backups reliable without requiring checksums (which are
optional as in rsync for performance).  Of course, you still have to
trust the clock for this to work.

This is definitely an edge case.  Not only does the file have to be
modified in the same second *after* rsync has done the copy, but the
file also has to not be modified in *any other subsequent second* before
the next incremental backup.  If the file is busy enough to have a
collision with rsync in that second, then it is very likely to be
modified before the next incremental backup which is generally a day or
so later.  And, of course, the backup where the issue occurs is fine -
it's the next backup that is invalid.

However, the hot/cold backup scheme as documented does make the race
condition more likely since the two backups are done in close proximity
temporally.  Ultimately, the most reliable method is to use checksums.

For me the biggest issue is that there is no way to discover if a db in
consistent no matter how much time/resources you are willing to spend.
I could live with the idea of the occasional bad backup (since I keep as
many as possible), but having no way to know whether it is good or not
is very frustrating.  I know data checksums are a step in that
direction, but they are a long way from providing the optimal solution.
I've implemented rigorous checksums in PgBackRest but something closer
to the source would be even better.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Jim Nasby (Jim.Nasby@BlueTreble.com) wrote:
> On 1/27/15 9:29 AM, Stephen Frost wrote:
> >>My point is that Bruce's patch suggests looking for "remote_dir" in
> >>>the rsync documentation, but no such term appears there.
> >Ah, well, perhaps we could simply add a bit of clarification to this:
> >
> >for details on specifying <option>remote_dir</>
>
> The whole remote_dir discussion made me think of something... would --link-dest be any help here?

No.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Bruce Momjian (bruce@momjian.us) wrote:
> Interesting problem, but doesn't rsync use sub-second accuracy?

No.  Simple test will show:

touch xx/aa ; rsync -avv xx yy ; sleep 0.5 ; touch xx/aa ; rsync -avv xx yy

Run that a few times and you'll see it report "xx/aa is uptodate"
sometimes, depending on when exactly where the sleep falls during the
second.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
Bruce,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Tue, Jan 27, 2015 at 09:36:58AM -0500, Stephen Frost wrote:
> > The example listed works, but only when it's a local rsync:
> >
> > rsync --archive --hard-links --size-only old_dir new_dir remote_dir
> >
> > Perhaps a better example (or additional one) would be with a remote
> > rsync, including clarification of old and new dir, like so:
> >
> > (run in /var/lib/postgresql)
> > rsync --archive --hard-links --size-only \
> >   9.3/main \
> >   9.4/main \
> >   server:/var/lib/postgresql/
> >
> > Note that 9.3/main and 9.4/main are two source directories for rsync to
> > copy over, while server:/var/lib/postgresql/ is a remote destination
> > directory.  The above directories match a default Debian/Ubuntu install.
>
> OK, sorry everyone was confused by 'remote_dir'.  Does this new patch
> help?

Looks better, but --links is not the same as --hard-links.  The example
is right, the but documentation below it mentions "<option>--link</>"
which is for symlinks, not hard links.

This also should really include a discussion about dealing with
tablespaces, since the example command won't deal with them.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Josh Berkus
Дата:
Bruce, Stephen, etc.:

So, I did a test trial of this and it seems like it didn't solve the
issue of huge rsyncs.

That is, the only reason to do this whole business via rsync, instead of
doing a new basebackup of each replica, is to cut down on data transfer
time by not resyncing the data from the old base directory.  But in
practice, the majority of the database files seem like they get
transmitted anyway.  Maybe I'm misreading the rsync ouput?

Here's the setup:

3 Ubuntu 14.04 servers on AWS (tiny instance)
Running PostgreSQL 9.3.5
Set up in cascading replication

108 --> 107 --> 109

The goal was to test this with cascading, but I didn't get that far.

I set up a pgbench workload, read-write on the master and read-only on
the two replicas, to simulate a load-balanced workload.  I was *not*
logging hint bits.

I then followed this sequence:

1) Install 9.4 packages on all servers.
2) Shut down the master.
3) pg_upgrade the master using --link
4) shut down replica 107
5) rsync the master's $PGDATA from the replica:

rsync -aHv --size-only -e ssh --itemize-changes
172.31.4.108:/var/lib/postgresql/ /var/lib/postgresql/

... and got:

.d..t...... 9.4/main/pg_xlog/
>f+++++++++ 9.4/main/pg_xlog/0000000700000001000000CB
.d..t...... 9.4/main/pg_xlog/archive_status/

sent 126892 bytes  received 408645000 bytes  7640596.11 bytes/sec
total size is 671135675  speedup is 1.64

So that's 390MB of data transfer.

If I look at the original directory:

postgres@paul: du --max-depth=1 -h
4.0K    ./.cache
20K     ./.ssh
424M    ./9.3
4.0K    ./.emacs.d
51M     ./9.4
56K     ./bench
474M    .

So 390MB were transferred out of a possible 474MB.  That certainly seems
like we're still transferring the majority of the data, even though I
verified that the hard links are being sent as hard links.  No?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pg_upgrade and rsync

От
Josh Berkus
Дата:
On 01/28/2015 02:10 PM, Josh Berkus wrote:
> So 390MB were transferred out of a possible 474MB.  That certainly seems
> like we're still transferring the majority of the data, even though I
> verified that the hard links are being sent as hard links.  No?

Looks like the majority of that was pg_xlog.  Going to tear this down
and start over, and --exclude pg_xlog.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pg_upgrade and rsync

От
Josh Berkus
Дата:
On 01/28/2015 02:28 PM, Josh Berkus wrote:
> On 01/28/2015 02:10 PM, Josh Berkus wrote:
>> So 390MB were transferred out of a possible 474MB.  That certainly seems
>> like we're still transferring the majority of the data, even though I
>> verified that the hard links are being sent as hard links.  No?
> 
> Looks like the majority of that was pg_xlog.  Going to tear this down
> and start over, and --exclude pg_xlog.
> 

So, having redone this without the pg_xlog lag, this appears to work in
terms of cutting down the rsync volume.

I'm concerned about putting this in the main docs, though.  This is a
complex, and fragile procedure, which is very easy to get wrong, and
hard to explain for a generic case.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pg_upgrade and rsync

От
Stephen Frost
Дата:
* Josh Berkus (josh@agliodbs.com) wrote:
> On 01/28/2015 02:28 PM, Josh Berkus wrote:
> > On 01/28/2015 02:10 PM, Josh Berkus wrote:
> >> So 390MB were transferred out of a possible 474MB.  That certainly seems
> >> like we're still transferring the majority of the data, even though I
> >> verified that the hard links are being sent as hard links.  No?
> >
> > Looks like the majority of that was pg_xlog.  Going to tear this down
> > and start over, and --exclude pg_xlog.
> >
>
> So, having redone this without the pg_xlog lag, this appears to work in
> terms of cutting down the rsync volume.
>
> I'm concerned about putting this in the main docs, though.  This is a
> complex, and fragile procedure, which is very easy to get wrong, and
> hard to explain for a generic case.

So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
it's a bit of a complex process to get right.  On the other hand, it's
far better if we put something out there along the lines of "if you
really want to, this is how to do it" than having folks try to fumble
through to find the correct steps themselves.
Thanks,
    Stephen

Re: pg_upgrade and rsync

От
Josh Berkus
Дата:
> So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
> it's a bit of a complex process to get right.  On the other hand, it's
> far better if we put something out there along the lines of "if you
> really want to, this is how to do it" than having folks try to fumble
> through to find the correct steps themselves.

So, here's the correct steps for Bruce, because his current doc does not
cover all of these.  I really think this should go in as a numbered set
of steps; the current doc has some steps as steps, and other stuff
buried in paragraphs.

1. Install the new version binaries on both servers, alongside the old
version.

2. If not done by the package install, initdb the new version's data
directory.

3. Check that the replica is not very lagged.  If it is, wait for
traffic to die down and for it to catch up.

4. Shut down the master using -m fast or -m smart for a clean shutdown.It is not necessary to shut down the replicas
yet.

5. pg_upgrade the master using the --link option.  Do not start the new
version yet.

6. create a data directory for the new version on the replica.  This
directory should be empty; if it was initdb'd by the installation
package, then delete its contents.

7. shut down postgres on the replica.

8. rsync both the old and new data directories from the master to the
replica, using the --size-only and -H hard links options.  For example,
if both 9.3 and 9.4 are in /var/lib/postgresql, do:

rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
replica-host:/var/lib/postgresql/

9. Create a recovery.conf file in the replica's data directory with the
appropriate parameters.

10. Start the master, then the replica


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Tue, Jan 27, 2015 at 10:16:48PM -0500, David Steele wrote:
> This is definitely an edge case.  Not only does the file have to be
> modified in the same second *after* rsync has done the copy, but the
> file also has to not be modified in *any other subsequent second* before
> the next incremental backup.  If the file is busy enough to have a
> collision with rsync in that second, then it is very likely to be
> modified before the next incremental backup which is generally a day or
> so later.  And, of course, the backup where the issue occurs is fine -
> it's the next backup that is invalid.
>
> However, the hot/cold backup scheme as documented does make the race
> condition more likely since the two backups are done in close proximity
> temporally.  Ultimately, the most reliable method is to use checksums.
>
> For me the biggest issue is that there is no way to discover if a db in
> consistent no matter how much time/resources you are willing to spend.
> I could live with the idea of the occasional bad backup (since I keep as
> many as possible), but having no way to know whether it is good or not
> is very frustrating.  I know data checksums are a step in that
> direction, but they are a long way from providing the optimal solution.
> I've implemented rigorous checksums in PgBackRest but something closer
> to the source would be even better.

Agreed.  I have update the two mentions of rsync in our docs to clarify
this.  Thank you.

The patch also has pg_upgrade doc improvements suggested by comments
from Josh Berkus.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Вложения

Re: pg_upgrade and rsync

От
Andrew Dunstan
Дата:
On 01/29/2015 12:26 AM, Josh Berkus wrote:
>> So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
>> it's a bit of a complex process to get right.  On the other hand, it's
>> far better if we put something out there along the lines of "if you
>> really want to, this is how to do it" than having folks try to fumble
>> through to find the correct steps themselves.
> So, here's the correct steps for Bruce, because his current doc does not
> cover all of these.  I really think this should go in as a numbered set
> of steps; the current doc has some steps as steps, and other stuff
> buried in paragraphs.
>
> 1. Install the new version binaries on both servers, alongside the old
> version.
>
> 2. If not done by the package install, initdb the new version's data
> directory.
>
> 3. Check that the replica is not very lagged.  If it is, wait for
> traffic to die down and for it to catch up.
>
> 4. Shut down the master using -m fast or -m smart for a clean shutdown.
>   It is not necessary to shut down the replicas yet.
>
> 5. pg_upgrade the master using the --link option.  Do not start the new
> version yet.
>
> 6. create a data directory for the new version on the replica.  This
> directory should be empty; if it was initdb'd by the installation
> package, then delete its contents.
>
> 7. shut down postgres on the replica.
>
> 8. rsync both the old and new data directories from the master to the
> replica, using the --size-only and -H hard links options.  For example,
> if both 9.3 and 9.4 are in /var/lib/postgresql, do:
>
> rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
> replica-host:/var/lib/postgresql/
>
> 9. Create a recovery.conf file in the replica's data directory with the
> appropriate parameters.
>
> 10. Start the master, then the replica
>
>

I find steps 2 and 6 confusing.

cheers

andrew



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Thu, Jan 29, 2015 at 10:21:30AM -0500, Andrew Dunstan wrote:
> 
> On 01/29/2015 12:26 AM, Josh Berkus wrote:
> >>So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
> >>it's a bit of a complex process to get right.  On the other hand, it's
> >>far better if we put something out there along the lines of "if you
> >>really want to, this is how to do it" than having folks try to fumble
> >>through to find the correct steps themselves.
> >So, here's the correct steps for Bruce, because his current doc does not
> >cover all of these.  I really think this should go in as a numbered set
> >of steps; the current doc has some steps as steps, and other stuff
> >buried in paragraphs.
> >
> >1. Install the new version binaries on both servers, alongside the old
> >version.
> >
> >2. If not done by the package install, initdb the new version's data
> >directory.
> >
> >3. Check that the replica is not very lagged.  If it is, wait for
> >traffic to die down and for it to catch up.
> >
> >4. Shut down the master using -m fast or -m smart for a clean shutdown.
> >  It is not necessary to shut down the replicas yet.
> >
> >5. pg_upgrade the master using the --link option.  Do not start the new
> >version yet.
> >
> >6. create a data directory for the new version on the replica.  This
> >directory should be empty; if it was initdb'd by the installation
> >package, then delete its contents.
> >
> >7. shut down postgres on the replica.
> >
> >8. rsync both the old and new data directories from the master to the
> >replica, using the --size-only and -H hard links options.  For example,
> >if both 9.3 and 9.4 are in /var/lib/postgresql, do:
> >
> >rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
> >replica-host:/var/lib/postgresql/
> >
> >9. Create a recovery.conf file in the replica's data directory with the
> >appropriate parameters.
> >
> >10. Start the master, then the replica
> >
> >
> 
> I find steps 2 and 6 confusing.

For number 2, he is creating a new cluster on the master server.  For
#6, he is just creating an empty data directory, though this is not
required as rsync will create the directory for you.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Wed, Jan 28, 2015 at 09:26:11PM -0800, Josh Berkus wrote:
> 3. Check that the replica is not very lagged.  If it is, wait for
> traffic to die down and for it to catch up.

Is this necessary.  It seems quite imprecise too.

> 4. Shut down the master using -m fast or -m smart for a clean shutdown.
>  It is not necessary to shut down the replicas yet.

We already give instructions on how to shut down the server in the
pg_ugprade docs.

> 5. pg_upgrade the master using the --link option.  Do not start the new
> version yet.

Stephen mentioned that --link is not clear in the old docs --- I fixed
that.

> 6. create a data directory for the new version on the replica.  This
> directory should be empty; if it was initdb'd by the installation
> package, then delete its contents.

rsync will create this for you.

> 10. Start the master, then the replica

I have incorporated all your suggestions in the attached patch.  I also
split items into separate sections as you suggested.  You can read the
end result here:

    http://momjian.us/tmp/pgsql/pgupgrade.html

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Вложения

Re: pg_upgrade and rsync

От
Andrew Dunstan
Дата:
On 01/29/2015 11:34 AM, Bruce Momjian wrote:
> On Thu, Jan 29, 2015 at 10:21:30AM -0500, Andrew Dunstan wrote:
>> On 01/29/2015 12:26 AM, Josh Berkus wrote:
>>>> So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
>>>> it's a bit of a complex process to get right.  On the other hand, it's
>>>> far better if we put something out there along the lines of "if you
>>>> really want to, this is how to do it" than having folks try to fumble
>>>> through to find the correct steps themselves.
>>> So, here's the correct steps for Bruce, because his current doc does not
>>> cover all of these.  I really think this should go in as a numbered set
>>> of steps; the current doc has some steps as steps, and other stuff
>>> buried in paragraphs.
>>>
>>> 1. Install the new version binaries on both servers, alongside the old
>>> version.
>>>
>>> 2. If not done by the package install, initdb the new version's data
>>> directory.
>>>
>>> 3. Check that the replica is not very lagged.  If it is, wait for
>>> traffic to die down and for it to catch up.
>>>
>>> 4. Shut down the master using -m fast or -m smart for a clean shutdown.
>>>   It is not necessary to shut down the replicas yet.
>>>
>>> 5. pg_upgrade the master using the --link option.  Do not start the new
>>> version yet.
>>>
>>> 6. create a data directory for the new version on the replica.  This
>>> directory should be empty; if it was initdb'd by the installation
>>> package, then delete its contents.
>>>
>>> 7. shut down postgres on the replica.
>>>
>>> 8. rsync both the old and new data directories from the master to the
>>> replica, using the --size-only and -H hard links options.  For example,
>>> if both 9.3 and 9.4 are in /var/lib/postgresql, do:
>>>
>>> rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
>>> replica-host:/var/lib/postgresql/
>>>
>>> 9. Create a recovery.conf file in the replica's data directory with the
>>> appropriate parameters.
>>>
>>> 10. Start the master, then the replica
>>>
>>>
>> I find steps 2 and 6 confusing.
> For number 2, he is creating a new cluster on the master server.  For
> #6, he is just creating an empty data directory, though this is not
> required as rsync will create the directory for you.
>


Then step 2 should specify that it's for the master.

cheers

andrew




Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Thu, Jan 29, 2015 at 12:09:58PM -0500, Andrew Dunstan wrote:
> >>>7. shut down postgres on the replica.
> >>>
> >>>8. rsync both the old and new data directories from the master to the
> >>>replica, using the --size-only and -H hard links options.  For example,
> >>>if both 9.3 and 9.4 are in /var/lib/postgresql, do:
> >>>
> >>>rsync -aHv --size-only -e ssh --itemize-changes /var/lib/postgresql/
> >>>replica-host:/var/lib/postgresql/
> >>>
> >>>9. Create a recovery.conf file in the replica's data directory with the
> >>>appropriate parameters.
> >>>
> >>>10. Start the master, then the replica
> >>>
> >>>
> >>I find steps 2 and 6 confusing.
> >For number 2, he is creating a new cluster on the master server.  For
> >#6, he is just creating an empty data directory, though this is not
> >required as rsync will create the directory for you.
> >
> 
> 
> Then step 2 should specify that it's for the master.

Right.  Josh is just listing all the steps --- the pg_upgrade docs
already have that spelled out in detail.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Josh Berkus
Дата:
On 01/29/2015 09:11 AM, Bruce Momjian wrote:
> On Thu, Jan 29, 2015 at 12:09:58PM -0500, Andrew Dunstan wrote:
>> Then step 2 should specify that it's for the master.
> 
> Right.  Josh is just listing all the steps --- the pg_upgrade docs
> already have that spelled out in detail.

What I'm also saying is that, if we expect anyone to be able to follow
all of these steps, it has to be very explicit; just saying "Follow the
pg_upgrade docs but don't start the master yet" isn't clear enough,
because the pg_upgrade docs have a few alternative paths.

On  the whole, I personally would never follow this procedure at a
production site.  It's way too fragile and easy to screw up.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: pg_upgrade and rsync

От
David Steele
Дата:
<div class="moz-text-plain" graphical-quote="true" lang="x-western" style="font-family: -moz-fixed; font-size: 12px;"
wrap="true"><prewrap="">On 1/29/15 11:34 AM, Bruce Momjian wrote: 
</pre><blockquote style="color: #000000;" type="cite"><pre wrap="">3. Check that the replica is not very lagged.  If it
is,wait for 
traffic to die down and for it to catch up.
</pre></blockquote><pre wrap="">I think I'd want a something a bit more specific here.  When the primary
shuts down it will kick out one last WAL.  The filename should be recorded.

</pre><blockquote style="color: #000000;" type="cite"><pre wrap="">7. shut down postgres on the replica.
</pre></blockquote><pre wrap="">Before the shutdown make sure that the replicas are waiting on the
subsequent log file to appear (note that versions prior to 9.3 skip
00).  That means all WAL has been consumed and the primary and
replica(s) are in the same state.

This is a bit more complex if streaming replication is being used
<b class="moz-txt-star"><span class="moz-txt-tag">*</span>without<span class="moz-txt-tag">*</span></b> good old
fashionedlog shipping to a backup server and I'm not 
sure exactly how to go about it.  I suppose you could start Postgres in
single user mode, commit a transaction, and make sure that transaction
gets to the replicas.

OTOH, streaming replication (unless it is synchronous) would be crazy
without doing WAL backup.  Maybe that's just me.

<div class="moz-txt-sig">--
- David Steele
<a class="moz-txt-link-abbreviated" href="mailto:david@pgmasters.net">david@pgmasters.net</a>


</div></pre></div>

Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/29/15 12:42 PM, Josh Berkus wrote:
> On 01/29/2015 09:11 AM, Bruce Momjian wrote:
>> On Thu, Jan 29, 2015 at 12:09:58PM -0500, Andrew Dunstan wrote:
>>> Then step 2 should specify that it's for the master.
>> Right.  Josh is just listing all the steps --- the pg_upgrade docs
>> already have that spelled out in detail.
> What I'm also saying is that, if we expect anyone to be able to follow
> all of these steps, it has to be very explicit; just saying "Follow the
> pg_upgrade docs but don't start the master yet" isn't clear enough,
> because the pg_upgrade docs have a few alternative paths.
>
> On  the whole, I personally would never follow this procedure at a
> production site.  It's way too fragile and easy to screw up.

I'm in agreement with Josh - I would not use this method.  I may be
wrong, but it makes me extremely nervous.

I prefer to upgrade the primary and get it back up as soon as possible,
then take a backup and restore it to the replicas.  If the replicas are
being used for read-only queries instead of just redundancy then I
redirect that traffic to the primary while the replicas are being
upgraded and restored.  This method has the least downtime for the primary.

If you want less downtime overall then it's best to use the hot rsync /
cold rsync with checksums method, though this depends a lot on the size
of your database.

Ultimately, there is no single best method.  It depends a lot on your
environment.  I would prefer the official documents to contain very safe
methods.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/29/15 10:13 AM, Bruce Momjian wrote:
> Agreed.  I have update the two mentions of rsync in our docs to clarify
> this.  Thank you.
>
> The patch also has pg_upgrade doc improvements suggested by comments
> from Josh Berkus.

It's very good to see this.  Mentions of this rsync vulnerability are
few and far between.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/29/15 5:53 PM, David Steele wrote:
> On 1/29/15 12:42 PM, Josh Berkus wrote:
>> On 01/29/2015 09:11 AM, Bruce Momjian wrote:
>>> On Thu, Jan 29, 2015 at 12:09:58PM -0500, Andrew Dunstan wrote:
>>>> Then step 2 should specify that it's for the master.
>>> Right.  Josh is just listing all the steps --- the pg_upgrade docs
>>> already have that spelled out in detail.
>> What I'm also saying is that, if we expect anyone to be able to follow
>> all of these steps, it has to be very explicit; just saying "Follow the
>> pg_upgrade docs but don't start the master yet" isn't clear enough,
>> because the pg_upgrade docs have a few alternative paths.
>>
>> On  the whole, I personally would never follow this procedure at a
>> production site.  It's way too fragile and easy to screw up.
>
> I'm in agreement with Josh - I would not use this method.  I may be
> wrong, but it makes me extremely nervous.
>
> I prefer to upgrade the primary and get it back up as soon as possible,
> then take a backup and restore it to the replicas.  If the replicas are
> being used for read-only queries instead of just redundancy then I
> redirect that traffic to the primary while the replicas are being
> upgraded and restored.  This method has the least downtime for the primary.
>
> If you want less downtime overall then it's best to use the hot rsync /
> cold rsync with checksums method, though this depends a lot on the size
> of your database.
>
> Ultimately, there is no single best method.  It depends a lot on your
> environment.  I would prefer the official documents to contain very safe
> methods.

How do we define safe though? Your method leaves you without a backup server until your base backup completes and the
replicacatches up. I think we do a dis-service to our users by not pointing that out and providing a potential
alternate*so long as we spell out the tradeoffs/risks*.
 

Ultimately, I think this thread really shows the very large need for a tool that understands things like LSNs to
providersync-ish behavior that's actually safe.
 

FWIW, I personally am very leery of relying on pg_upgrade. It's too easy to introduce bugs, doesn't handle all cases,
andprovides no option for going back to your previous version without losing data. I much prefer old_version --
londiste--> new_version, and then doing the upgrade by reversing the direction of replication.
 

I also don't entirely trust PITR backups. It's too easy to accidentally break them in subtle ways.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/29/15 7:07 PM, Jim Nasby wrote:
>> Ultimately, there is no single best method.  It depends a lot on your
>> environment.  I would prefer the official documents to contain very safe
>> methods.
>
> How do we define safe though? Your method leaves you without a backup
> server until your base backup completes and the replica catches up. I
> think we do a dis-service to our users by not pointing that out and
> providing a potential alternate *so long as we spell out the
> tradeoffs/risks*.

My method leaves you without a replica, but not without a *backup* as
long as you are shipping WAL somewhere safe.  You can set
archive_timeout to something small if you want to make this safer.  This
is more practical in 9.4 since unused WAL space is zeroed.

OK, I'm willing to admit it would be better to have the option with all
caveats, so long as they are strongly worded.

> Ultimately, I think this thread really shows the very large need for a
> tool that understands things like LSNs to provide rsync-ish behavior
> that's actually safe.

Safe backups can be done without LSNs provided you are willing to trust
your timestamps.

> FWIW, I personally am very leery of relying on pg_upgrade. It's too
> easy to introduce bugs, doesn't handle all cases, and provides no
> option for going back to your previous version without losing data. I
> much prefer old_version -- londiste --> new_version, and then doing
> the upgrade by reversing the direction of replication.

I think the official docs need to stick with options that are core?

I avoid pg_upgrade wherever it is practical.  However, sometimes it
really is the best option.

> I also don't entirely trust PITR backups. It's too easy to
> accidentally break them in subtle ways.

Agreed in general, but I've been doing a lot of work to make this not be
true anymore.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/29/15 6:25 PM, David Steele wrote:
> Safe backups can be done without LSNs provided you are willing to trust
> your timestamps.

Which AFAICT simply isn't safe to do at all... except maybe with the manifest stuff you've talked about?

>> >FWIW, I personally am very leery of relying on pg_upgrade. It's too
>> >easy to introduce bugs, doesn't handle all cases, and provides no
>> >option for going back to your previous version without losing data. I
>> >much prefer old_version -- londiste --> new_version, and then doing
>> >the upgrade by reversing the direction of replication.
> I think the official docs need to stick with options that are core?

I don't think we have any such requirement. IIRC the docs used to talk about using logical replication before we had
pg_upgrade(and may have actually called out Slony).
 

> I avoid pg_upgrade wherever it is practical.  However, sometimes it
> really is the best option.

Certainly. I think what we should be doing is spelling out the available options (with pros/cons) so that users can
decidewhat's best.
 

>> >I also don't entirely trust PITR backups. It's too easy to
>> >accidentally break them in subtle ways.
> Agreed in general, but I've been doing a lot of work to make this not be
> true anymore.

:)

I'd love to see all this stuff Just Work (tm), but I don't think we're there yet, and I'm not really sure how we can
getthere.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/29/15 7:55 PM, Jim Nasby wrote:
> On 1/29/15 6:25 PM, David Steele wrote:
>> Safe backups can be done without LSNs provided you are willing to trust
>> your timestamps.
>
> Which AFAICT simply isn't safe to do at all... except maybe with the
> manifest stuff you've talked about?

Yes - that's what I'm talking about.  I had hoped to speak about this at
PgConfNYC, but perhaps I can do it in a lightning talk instead.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
Jim Nasby
Дата:
On 1/29/15 7:02 PM, David Steele wrote:
> On 1/29/15 7:55 PM, Jim Nasby wrote:
>> On 1/29/15 6:25 PM, David Steele wrote:
>>> Safe backups can be done without LSNs provided you are willing to trust
>>> your timestamps.
>>
>> Which AFAICT simply isn't safe to do at all... except maybe with the
>> manifest stuff you've talked about?
>
> Yes - that's what I'm talking about.  I had hoped to speak about this at
> PgConfNYC, but perhaps I can do it in a lightning talk instead.

Sounds like maybe it should be part of our documentation too...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: pg_upgrade and rsync

От
David Steele
Дата:
On 1/29/15 8:09 PM, Jim Nasby wrote:
> On 1/29/15 7:02 PM, David Steele wrote:
>> On 1/29/15 7:55 PM, Jim Nasby wrote:
>>> On 1/29/15 6:25 PM, David Steele wrote:
>>>> Safe backups can be done without LSNs provided you are willing to
>>>> trust
>>>> your timestamps.
>>>
>>> Which AFAICT simply isn't safe to do at all... except maybe with the
>>> manifest stuff you've talked about?
>>
>> Yes - that's what I'm talking about.  I had hoped to speak about this at
>> PgConfNYC, but perhaps I can do it in a lightning talk instead.
>
> Sounds like maybe it should be part of our documentation too...

I think the warnings Bruce has added to the documentation about using
checksums are sufficient for now.  The manifest build and delay
methodology are part of PgBackRest, the backup solution I'm working on
as an alternative to barman, etc.  It's not something that can be
implemented trivially.

--
- David Steele
david@pgmasters.net



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Wed, Jan 28, 2015 at 09:26:11PM -0800, Josh Berkus wrote:
>
> > So, for my 2c, I'm on the fence about it.  On the one hand, I agree,
> > it's a bit of a complex process to get right.  On the other hand, it's
> > far better if we put something out there along the lines of "if you
> > really want to, this is how to do it" than having folks try to fumble
> > through to find the correct steps themselves.
>
> So, here's the correct steps for Bruce, because his current doc does not
> cover all of these.  I really think this should go in as a numbered set
> of steps; the current doc has some steps as steps, and other stuff
> buried in paragraphs.
>
> 1. Install the new version binaries on both servers, alongside the old
> version.
>
> 2. If not done by the package install, initdb the new version's data
> directory.
>
> 3. Check that the replica is not very lagged.  If it is, wait for
> traffic to die down and for it to catch up.

Now that 9.4.1 is released, I would like to get this doc patch applied
--- it will close the often-requested feature of how to pg_upgrade slave
clusters.

I wasn't happy with Josh's specification above that the "replica is not
very lagged", so I added a bullet point to check the pg_controldata
output to verify that the primary and standby servers are synchronized.

Yes, this adds even more complication to the pg_upgrade instructions,
but it is really more of the same complexity.  pg_upgrade really needs
an install-aware and OS-aware tool on top of it to automate much of
this.

Patch attached.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Вложения

Re: pg_upgrade and rsync

От
David Steele
Дата:
On 2/19/15 11:57 AM, Bruce Momjian wrote:
> On Wed, Jan 28, 2015 at 09:26:11PM -0800, Josh Berkus wrote:
>>
>> 3. Check that the replica is not very lagged.  If it is, wait for
>> traffic to die down and for it to catch up.
>
> Now that 9.4.1 is released, I would like to get this doc patch applied
> --- it will close the often-requested feature of how to pg_upgrade slave
> clusters.
>
> I wasn't happy with Josh's specification above that the "replica is not
> very lagged", so I added a bullet point to check the pg_controldata
> output to verify that the primary and standby servers are synchronized.
>
> Yes, this adds even more complication to the pg_upgrade instructions,
> but it is really more of the same complexity.  pg_upgrade really needs
> an install-aware and OS-aware tool on top of it to automate much of
> this.

#3 bothered me as well because it was not specific enough.  I like what
you've added to clarify the procedure.

--
- David Steele
david@pgmasters.net


Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Thu, Feb 19, 2015 at 09:35:02PM -0500, David Steele wrote:
> On 2/19/15 11:57 AM, Bruce Momjian wrote:
> > On Wed, Jan 28, 2015 at 09:26:11PM -0800, Josh Berkus wrote:
> >>
> >> 3. Check that the replica is not very lagged.  If it is, wait for
> >> traffic to die down and for it to catch up.
> > 
> > Now that 9.4.1 is released, I would like to get this doc patch applied
> > --- it will close the often-requested feature of how to pg_upgrade slave
> > clusters.
> > 
> > I wasn't happy with Josh's specification above that the "replica is not
> > very lagged", so I added a bullet point to check the pg_controldata
> > output to verify that the primary and standby servers are synchronized.
> > 
> > Yes, this adds even more complication to the pg_upgrade instructions,
> > but it is really more of the same complexity.  pg_upgrade really needs
> > an install-aware and OS-aware tool on top of it to automate much of
> > this.
> 
> #3 bothered me as well because it was not specific enough.  I like what
> you've added to clarify the procedure.

Good.  It took me a while to understand why they have to be in sync ---
because we are using rsync in size-only-comparison mode, if they are not
in sync we might update some files whose sizes changed, but not others,
and the old slave would be broken.  The new slave is going to get all
new files or hard links for user files, so it would be fine, but we
should be able to fall back to the old slaves, and having them in sync
allows that.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Fri, Feb 20, 2015 at 09:45:08AM -0500, Bruce Momjian wrote:
> > #3 bothered me as well because it was not specific enough.  I like what
> > you've added to clarify the procedure.
> 
> Good.  It took me a while to understand why they have to be in sync ---
> because we are using rsync in size-only-comparison mode, if they are not
> in sync we might update some files whose sizes changed, but not others,
> and the old slave would be broken.  The new slave is going to get all
> new files or hard links for user files, so it would be fine, but we
> should be able to fall back to the old slaves, and having them in sync
> allows that.

Also, since there was concern about the instructions, I am thinking of
applying the patch only to head for 9.5, and then blog about it if
people want to test it.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Vladimir Borodin
Дата:

20 февр. 2015 г., в 18:21, Bruce Momjian <bruce@momjian.us> написал(а):

On Fri, Feb 20, 2015 at 09:45:08AM -0500, Bruce Momjian wrote:
#3 bothered me as well because it was not specific enough.  I like what
you've added to clarify the procedure.

Good.  It took me a while to understand why they have to be in sync ---
because we are using rsync in size-only-comparison mode, if they are not
in sync we might update some files whose sizes changed, but not others,
and the old slave would be broken.  The new slave is going to get all
new files or hard links for user files, so it would be fine, but we
should be able to fall back to the old slaves, and having them in sync
allows that.

Also, since there was concern about the instructions, I am thinking of
applying the patch only to head for 9.5, and then blog about it if
people want to test it.

Am I right that if you are using hot standby with both streaming replication and WAL shipping you do still need to take full backup of master after using pg_upgrade?


--
 Bruce Momjian  <bruce@momjian.us>        http://momjian.us
 EnterpriseDB                             http://enterprisedb.com

 + Everyone has their own god. +


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


--
May the force be with you...




Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Tue, Feb 24, 2015 at 12:13:17PM +0300, Vladimir Borodin wrote:
> 
>     20 февр. 2015 г., в 18:21, Bruce Momjian <bruce@momjian.us> написал(а):
> 
>     On Fri, Feb 20, 2015 at 09:45:08AM -0500, Bruce Momjian wrote:
> 
>             #3 bothered me as well because it was not specific enough.  I like
>             what
>             you've added to clarify the procedure.
> 
> 
>         Good.  It took me a while to understand why they have to be in sync ---
>         because we are using rsync in size-only-comparison mode, if they are
>         not
>         in sync we might update some files whose sizes changed, but not others,
>         and the old slave would be broken.  The new slave is going to get all
>         new files or hard links for user files, so it would be fine, but we
>         should be able to fall back to the old slaves, and having them in sync
>         allows that.
> 
> 
>     Also, since there was concern about the instructions, I am thinking of
>     applying the patch only to head for 9.5, and then blog about it if
>     people want to test it.
> 
> 
> Am I right that if you are using hot standby with both streaming replication
> and WAL shipping you do still need to take full backup of master after using
> pg_upgrade?

No, you would not need to take a full backup if you use these instructions.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Vladimir Borodin
Дата:

2 марта 2015 г., в 21:28, Bruce Momjian <bruce@momjian.us> написал(а):

On Tue, Feb 24, 2015 at 12:13:17PM +0300, Vladimir Borodin wrote:

   20 февр. 2015 г., в 18:21, Bruce Momjian <bruce@momjian.us> написал(а):

   On Fri, Feb 20, 2015 at 09:45:08AM -0500, Bruce Momjian wrote:

           #3 bothered me as well because it was not specific enough.  I like
           what
           you've added to clarify the procedure.


       Good.  It took me a while to understand why they have to be in sync ---
       because we are using rsync in size-only-comparison mode, if they are
       not
       in sync we might update some files whose sizes changed, but not others,
       and the old slave would be broken.  The new slave is going to get all
       new files or hard links for user files, so it would be fine, but we
       should be able to fall back to the old slaves, and having them in sync
       allows that.


   Also, since there was concern about the instructions, I am thinking of
   applying the patch only to head for 9.5, and then blog about it if
   people want to test it.


Am I right that if you are using hot standby with both streaming replication
and WAL shipping you do still need to take full backup of master after using
pg_upgrade?

No, you would not need to take a full backup if you use these instructions.

Although it would be applied to documentation for 9.5 only, are these instructions applicable for upgrading from 9.3.6 to 9.4.1?

Following the instructions from patch I’ve got following errors in postgresql.log of replica after trying to start it with hot_standby = on:

< 2015-02-24 11:47:22.861 MSK >WARNING:  WAL was generated with wal_level=minimal, data may be missing
< 2015-02-24 11:47:22.861 MSK >HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.
< 2015-02-24 11:47:22.861 MSK >FATAL:  hot standby is not possible because wal_level was not set to "hot_standby" or higher on the master server
< 2015-02-24 11:47:22.861 MSK >HINT:  Either set wal_level to "hot_standby" on the master, or turn off hot_standby here.
< 2015-02-24 11:47:22.862 MSK >LOG:  startup process (PID 28093) exited with exit code 1
< 2015-02-24 11:47:22.862 MSK >LOG:  aborting startup due to startup process failure


--
 Bruce Momjian  <bruce@momjian.us>        http://momjian.us
 EnterpriseDB                             http://enterprisedb.com

 + Everyone has their own god. +


--
Да пребудет с вами сила…

Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Tue, Mar  3, 2015 at 11:38:58AM +0300, Vladimir Borodin wrote:
>     No, you would not need to take a full backup if you use these instructions.
> 
> 
> Although it would be applied to documentation for 9.5 only, are these
> instructions applicable for upgrading from 9.3.6 to 9.4.1?

Yes.  They work all the way back to 9.0.

> Following the instructions from patch I’ve got following errors in
> postgresql.log of replica after trying to start it with hot_standby = on:
> 
> < 2015-02-24 11:47:22.861 MSK >WARNING:  WAL was generated with wal_level=
> minimal, data may be missing
> < 2015-02-24 11:47:22.861 MSK >HINT:  This happens if you temporarily set
> wal_level=minimal without taking a new base backup.
> < 2015-02-24 11:47:22.861 MSK >FATAL:  hot standby is not possible because
> wal_level was not set to "hot_standby" or higher on the master server
> < 2015-02-24 11:47:22.861 MSK >HINT:  Either set wal_level to "hot_standby" on
> the master, or turn off hot_standby here.
> < 2015-02-24 11:47:22.862 MSK >LOG:  startup process (PID 28093) exited with
> exit code 1
> < 2015-02-24 11:47:22.862 MSK >LOG:  aborting startup due to startup process
> failure

OK, hmmm.  Thanks for testing.  It feels like you didn't have your new
master set up for streaming replication when you ran pg_upgrade.  Is
that correct?  Should I specify that specifically in the instructions?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Tue, Mar  3, 2015 at 08:38:50AM -0500, Bruce Momjian wrote:
> > < 2015-02-24 11:47:22.861 MSK >WARNING:  WAL was generated with wal_level=
> > minimal, data may be missing
> > < 2015-02-24 11:47:22.861 MSK >HINT:  This happens if you temporarily set
> > wal_level=minimal without taking a new base backup.
> > < 2015-02-24 11:47:22.861 MSK >FATAL:  hot standby is not possible because
> > wal_level was not set to "hot_standby" or higher on the master server
> > < 2015-02-24 11:47:22.861 MSK >HINT:  Either set wal_level to "hot_standby" on
> > the master, or turn off hot_standby here.
> > < 2015-02-24 11:47:22.862 MSK >LOG:  startup process (PID 28093) exited with
> > exit code 1
> > < 2015-02-24 11:47:22.862 MSK >LOG:  aborting startup due to startup process
> > failure
> 
> OK, hmmm.  Thanks for testing.  It feels like you didn't have your new
> master set up for streaming replication when you ran pg_upgrade.  Is
> that correct?  Should I specify that specifically in the instructions?

Actually, I think you are on to something that needs to be documented. 
Because the old and new clusters might be using the same port number,
you can't configure the new master to use streaming replication because
you can't be shipping those logs to the old standby.  Yikes.  OK, I
think we need to document that you need to set wal_level=hot_standby on
the new master, but not set up streaming.  Once you are done the
upgrade, you should configure streaming.

If this fixes the problem, I will generate an updated documentation
patch.  Please let me know.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Vladimir Borodin
Дата:

3 марта 2015 г., в 16:38, Bruce Momjian <bruce@momjian.us> написал(а):

On Tue, Mar  3, 2015 at 11:38:58AM +0300, Vladimir Borodin wrote:
   No, you would not need to take a full backup if you use these instructions.


Although it would be applied to documentation for 9.5 only, are these
instructions applicable for upgrading from 9.3.6 to 9.4.1?

Yes.  They work all the way back to 9.0.

Following the instructions from patch I’ve got following errors in
postgresql.log of replica after trying to start it with hot_standby = on:

< 2015-02-24 11:47:22.861 MSK >WARNING:  WAL was generated with wal_level=
minimal, data may be missing
< 2015-02-24 11:47:22.861 MSK >HINT:  This happens if you temporarily set
wal_level=minimal without taking a new base backup.
< 2015-02-24 11:47:22.861 MSK >FATAL:  hot standby is not possible because
wal_level was not set to "hot_standby" or higher on the master server
< 2015-02-24 11:47:22.861 MSK >HINT:  Either set wal_level to "hot_standby" on
the master, or turn off hot_standby here.
< 2015-02-24 11:47:22.862 MSK >LOG:  startup process (PID 28093) exited with
exit code 1
< 2015-02-24 11:47:22.862 MSK >LOG:  aborting startup due to startup process
failure

OK, hmmm.  Thanks for testing.  It feels like you didn't have your new
master set up for streaming replication when you ran pg_upgrade.  Is
that correct?  Should I specify that specifically in the instructions?

After running pg_upgrade I do put in new PGDATA on master old pg_hba.conf and postgresql.conf with wal_level = hot_standby. The full content of postgresql.conf could be seen here - http://pastie.org/9995902. Then I do rsync to replica, put recovery.conf and try to start both - first master, then replica. If I turn off hot_standby in replica configuration, it starts. What am I doing wrong?


--
 Bruce Momjian  <bruce@momjian.us>        http://momjian.us
 EnterpriseDB                             http://enterprisedb.com

 + Everyone has their own god. +


--
Да пребудет с вами сила…

Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Tue, Mar  3, 2015 at 04:55:56PM +0300, Vladimir Borodin wrote:
>     OK, hmmm.  Thanks for testing.  It feels like you didn't have your new
>     master set up for streaming replication when you ran pg_upgrade.  Is
>     that correct?  Should I specify that specifically in the instructions?
> 
> 
> After running pg_upgrade I do put in new PGDATA on master old pg_hba.conf and
> postgresql.conf with wal_level = hot_standby. The full content of
> postgresql.conf could be seen here - http://pastie.org/9995902. Then I do rsync
> to replica, put recovery.conf and try to start both - first master, then
> replica. If I turn off hot_standby in replica configuration, it starts. What am
> I doing wrong?

After running initdb to create the new master, but before running
pg_upgrade, modify the new master's postgresql.conf and change wal_level
= hot_standby.  (Don't modify pg_hba.conf at this stage.)

I didn't think that was necessary, but this might be some 9.3-specific
problem, but let's get it working first.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Vladimir Borodin
Дата:

3 марта 2015 г., в 18:01, Bruce Momjian <bruce@momjian.us> написал(а):

On Tue, Mar  3, 2015 at 04:55:56PM +0300, Vladimir Borodin wrote:
   OK, hmmm.  Thanks for testing.  It feels like you didn't have your new
   master set up for streaming replication when you ran pg_upgrade.  Is
   that correct?  Should I specify that specifically in the instructions?


After running pg_upgrade I do put in new PGDATA on master old pg_hba.conf and
postgresql.conf with wal_level = hot_standby. The full content of
postgresql.conf could be seen here - http://pastie.org/9995902. Then I do rsync
to replica, put recovery.conf and try to start both - first master, then
replica. If I turn off hot_standby in replica configuration, it starts. What am
I doing wrong?

After running initdb to create the new master, but before running
pg_upgrade, modify the new master's postgresql.conf and change wal_level
= hot_standby.  (Don't modify pg_hba.conf at this stage.)


That does not help. The reason is that pg_upgrade sets 'Current wal_level setting: minimal' in control-file, and it does not depend on what is set in postgresql.conf before running pg_upgrade. Details could be seen here - http://pastie.org/9998671.

The workaround for this is to start  and cleanly shut down postgres on master after running pg_upgrade but before running rsync. After that there would be a good control-file for streaming replication and rsync to replica can be done. But it could not be done with --size-only key, because control-file is of fixed size and rsync would skip it. Or may be everything should be copied with --size-only and control-file should be copied without this option.

I didn't think that was necessary, but this might be some 9.3-specific
problem, but let's get it working first.

--
 Bruce Momjian  <bruce@momjian.us>        http://momjian.us
 EnterpriseDB                             http://enterprisedb.com

 + Everyone has their own god. +


--
Да пребудет с вами сила…

Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Wed, Mar  4, 2015 at 01:53:47PM +0300, Vladimir Borodin wrote:
>     After running initdb to create the new master, but before running
>     pg_upgrade, modify the new master's postgresql.conf and change wal_level
>     = hot_standby.  (Don't modify pg_hba.conf at this stage.)
> 
> 
> 
> That does not help. The reason is that pg_upgrade sets 'Current wal_level
> setting: minimal' in control-file, and it does not depend on what is set in
> postgresql.conf before running pg_upgrade. Details could be seen here - http://
> pastie.org/9998671.

Well, what is happening is that the pg_resetxlog commands we run inside
pg_upgrade set the pg_controldata file's wal_level to minimal, but as
you saw, starting the server sets the pg_controldata properly. 
pg_resetxlog is not changing the WAL files at all, just the control
file.

> The workaround for this is to start  and cleanly shut down postgres on master
> after running pg_upgrade but before running rsync. After that there would be a
> good control-file for streaming replication and rsync to replica can be done.

You are correct that a pg_controldata file is copied over that has
wal_level=minimal, but that should not be a problem.

> But it could not be done with --size-only key, because control-file is of fixed
> size and rsync would skip it. Or may be everything should be copied with
> --size-only and control-file should be copied without this option.

Well, what happens is that there is no _new_ standby pg_controldata
file, so it is copied fully from the new master.  Are you running initdb
to create the new standby --- you shouldn't be doing that as the rsync
will do that for you.  Also, are you cleanly shutting down all the
servers, or using pg_ctl -m immediate?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: pg_upgrade and rsync

От
Vladimir Borodin
Дата:

4 марта 2015 г., в 19:28, Bruce Momjian <bruce@momjian.us> написал(а):

On Wed, Mar  4, 2015 at 01:53:47PM +0300, Vladimir Borodin wrote:
   After running initdb to create the new master, but before running
   pg_upgrade, modify the new master's postgresql.conf and change wal_level
   = hot_standby.  (Don't modify pg_hba.conf at this stage.)



That does not help. The reason is that pg_upgrade sets 'Current wal_level
setting: minimal' in control-file, and it does not depend on what is set in
postgresql.conf before running pg_upgrade. Details could be seen here - http://
pastie.org/9998671.

Well, what is happening is that the pg_resetxlog commands we run inside
pg_upgrade set the pg_controldata file's wal_level to minimal, but as
you saw, starting the server sets the pg_controldata properly.
pg_resetxlog is not changing the WAL files at all, just the control
file.

The workaround for this is to start  and cleanly shut down postgres on master
after running pg_upgrade but before running rsync. After that there would be a
good control-file for streaming replication and rsync to replica can be done.

You are correct that a pg_controldata file is copied over that has
wal_level=minimal, but that should not be a problem.

I suppose, this is the root cause of why replica does not start as hot standby. It it enough to start it as warm standby, but not hot standby. See CheckRequiredParameterValues function in xlog.c which is called inside of StartupXLOG function.


But it could not be done with --size-only key, because control-file is of fixed
size and rsync would skip it. Or may be everything should be copied with
--size-only and control-file should be copied without this option.

Well, what happens is that there is no _new_ standby pg_controldata
file, so it is copied fully from the new master.  Are you running initdb
to create the new standby --- you shouldn't be doing that as the rsync
will do that for you.  

No, I don’t. The scenario of the problem with copying control-file was in case when I:
1. ran pg_upgrade on master and got control-file with "wal_level = minimal",
2. did rsync --size-only to replica (and it got this control-file with "wal_level = minimal"),
3. started and stopped postgres on master to get «good» control-file with  "wal_level = hot_standby»,
4. did rsync --size-only to replica one more time. And this time control-file is not copied because of the same size of control-file.

Actually, if you don’t do step 2, everything works as expected. Sorry for bothering you.

Also, are you cleanly shutting down all the
servers, or using pg_ctl -m immediate?

I use init-script, it shuts down cleanly with "-m fast".


--
 Bruce Momjian  <bruce@momjian.us>        http://momjian.us
 EnterpriseDB                             http://enterprisedb.com

 + Everyone has their own god. +


--
May the force be with you…

Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Thu, Mar  5, 2015 at 10:55:28AM +0300, Vladimir Borodin wrote:
>     You are correct that a pg_controldata file is copied over that has
>     wal_level=minimal, but that should not be a problem.
>
>
> I suppose, this is the root cause of why replica does not start as hot standby.
> It it enough to start it as warm standby, but not hot standby.
> See CheckRequiredParameterValues function in xlog.c which is called inside of
> StartupXLOG function.

Yes, you are correct.  I spent all day building a test harness so I
could automate this setup and test various failures.  I was able to
reproduce your failure, and you are correct that the proper fix is to
set wal_level=hot_standby on the new master, and then start and stop the
new cluster just before rsync.

The root cause is that pg_upgrade calls pg_resetxlog -o on the new
cluster _after_ the new cluster stopped for the final time, so rsync is
copying the incorrect pg_controldata wal_level value.  Also, even if
pg_resetxlog preserved wal_level in the control file, there is no
guarantee that the user configured the new cluster's wal_level for
hot_standby anyway.

What I have done is to update the pg_upgrade instructions to add this
required step.  Updated doc patch attached.  (I also added the --delete
flag to rsync.)  Thanks so much for your detailed report.

>         But it could not be done with --size-only key, because control-file is
>         of fixed
>         size and rsync would skip it. Or may be everything should be copied
>         with
>         --size-only and control-file should be copied without this option.
>
>
>     Well, what happens is that there is no _new_ standby pg_controldata
>     file, so it is copied fully from the new master.  Are you running initdb
>     to create the new standby --- you shouldn't be doing that as the rsync
>     will do that for you.
>
>
> No, I don’t. The scenario of the problem with copying control-file was in case
> when I:
> 1. ran pg_upgrade on master and got control-file with "wal_level = minimal",
> 2. did rsync --size-only to replica (and it got this control-file with
> "wal_level = minimal"),
> 3. started and stopped postgres on master to get «good» control-file with
>  "wal_level = hot_standby»,
> 4. did rsync --size-only to replica one more time. And this time control-file
> is not copied because of the same size of control-file.
>
> Actually, if you don’t do step 2, everything works as expected. Sorry for
> bothering you.

Ah, yes, I think doing rsync twice is never a good suggestion.  It can
lead to too many failures.  Doing the start/stop before rsync seems like
the best solution.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Вложения

Re: pg_upgrade and rsync

От
Vladimir Borodin
Дата:

6 марта 2015 г., в 6:11, Bruce Momjian <bruce@momjian.us> написал(а):

On Thu, Mar  5, 2015 at 10:55:28AM +0300, Vladimir Borodin wrote:
   You are correct that a pg_controldata file is copied over that has
   wal_level=minimal, but that should not be a problem.


I suppose, this is the root cause of why replica does not start as hot standby.
It it enough to start it as warm standby, but not hot standby.
See CheckRequiredParameterValues function in xlog.c which is called inside of
StartupXLOG function.

Yes, you are correct.  I spent all day building a test harness so I
could automate this setup and test various failures.  I was able to
reproduce your failure, and you are correct that the proper fix is to
set wal_level=hot_standby on the new master, and then start and stop the
new cluster just before rsync.

The root cause is that pg_upgrade calls pg_resetxlog -o on the new
cluster _after_ the new cluster stopped for the final time, so rsync is
copying the incorrect pg_controldata wal_level value.  Also, even if
pg_resetxlog preserved wal_level in the control file, there is no
guarantee that the user configured the new cluster's wal_level for
hot_standby anyway.

What I have done is to update the pg_upgrade instructions to add this
required step.  Updated doc patch attached.  (I also added the --delete
flag to rsync.)  Thanks so much for your detailed report.

It seems to work fine now. The only thing that would be nice to change is to explicitly write that these instructions are correct for hot standby installations too.

+     <para>
+      If you have Log-Shipping Standby Servers (<xref
+      linkend="warm-standby">), follow these steps to upgrade them (before
+      starting any servers):
+     </para>

Actually, I’ve entered this thread because it is not obvious from the paragraph above or any other places.


       But it could not be done with --size-only key, because control-file is
       of fixed
       size and rsync would skip it. Or may be everything should be copied
       with
       --size-only and control-file should be copied without this option.


   Well, what happens is that there is no _new_ standby pg_controldata
   file, so it is copied fully from the new master.  Are you running initdb
   to create the new standby --- you shouldn't be doing that as the rsync
   will do that for you.  


No, I don’t. The scenario of the problem with copying control-file was in case
when I:
1. ran pg_upgrade on master and got control-file with "wal_level = minimal",
2. did rsync --size-only to replica (and it got this control-file with
"wal_level = minimal"),
3. started and stopped postgres on master to get «good» control-file with
"wal_level = hot_standby»,
4. did rsync --size-only to replica one more time. And this time control-file
is not copied because of the same size of control-file.

Actually, if you don’t do step 2, everything works as expected. Sorry for
bothering you.

Ah, yes, I think doing rsync twice is never a good suggestion.  It can
lead to too many failures.  Doing the start/stop before rsync seems like
the best solution.

--
 Bruce Momjian  <bruce@momjian.us>        http://momjian.us
 EnterpriseDB                             http://enterprisedb.com

 + Everyone has their own god. +
<rsync.diff>


--
May the force be with you…

Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Fri, Mar  6, 2015 at 10:50:27AM +0300, Vladimir Borodin wrote:
>     What I have done is to update the pg_upgrade instructions to add this
>     required step.  Updated doc patch attached.  (I also added the --delete
>     flag to rsync.)  Thanks so much for your detailed report.
>
>
> It seems to work fine now. The only thing that would be nice to change is
> to explicitly write that these instructions are correct for hot standby
> installations too.
>
> +     <para>
> +      If you have Log-Shipping Standby Servers (<xref
> +      linkend="warm-standby">), follow these steps to upgrade them (before
> +      starting any servers):
> +     </para>
>
> Actually, I’ve entered this thread because it is not obvious from the paragraph
> above or any other places.

Oh, very good point.  I was trying to match the wording we use in the
docs, but forgot that log shipping and streaming replication are
specified separately.

Updated patch attached.  You can view the output at:

    http://momjian.us/tmp/pgsql/pgupgrade.html

Thanks much!

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Вложения

Re: pg_upgrade and rsync

От
Bruce Momjian
Дата:
On Fri, Mar  6, 2015 at 12:19:36PM -0500, Bruce Momjian wrote:
> On Fri, Mar  6, 2015 at 10:50:27AM +0300, Vladimir Borodin wrote:
> >     What I have done is to update the pg_upgrade instructions to add this
> >     required step.  Updated doc patch attached.  (I also added the --delete
> >     flag to rsync.)  Thanks so much for your detailed report.
> > 
> > 
> > It seems to work fine now. The only thing that would be nice to change is
> > to explicitly write that these instructions are correct for hot standby
> > installations too.
> > 
> > +     <para>
> > +      If you have Log-Shipping Standby Servers (<xref
> > +      linkend="warm-standby">), follow these steps to upgrade them (before
> > +      starting any servers):
> > +     </para>
> > 
> > Actually, I’ve entered this thread because it is not obvious from the paragraph
> > above or any other places.
> 
> Oh, very good point.  I was trying to match the wording we use in the
> docs, but forgot that log shipping and streaming replication are
> specified separately.
> 
> Updated patch attached.  You can view the output at:
> 
>     http://momjian.us/tmp/pgsql/pgupgrade.html
> 
> Thanks much!

Patch applied to head.  While it will work all the way back to 9.0, the
instructions need a shaking out during beta.  I will blog that people
can try it.  Thanks to Stephen Frost for coming up with this solution.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +