Discussion: pgbackrest: backup from standby

pgbackrest: backup from standby

From
Craig James
Date:
The pgbackrest documentation has a section on backing up from a standby, but I'm having trouble figuring it out. I think it's just a terminology problem: I can't figure out which configuration directives go on which server.

To begin with, I already have streaming replication running:

Primary server (live database): "radon"

[emolecules]
  db-path=/var/lib/postgresql/9.6/main

[global]
  repo-path=/pg_archive/pgbackrest
  retention-full=10
  backup-user=postgres
  log-level-file=detail

Hot-standby server: "standby"

[emolecules]
  db-path=/var/lib/postgresql/9.6/main
  recovery-option=standby_mode=on

[global]
  backup-host=radon.emolecules.com
  backup-user=postgres
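
Part of the terminology gap is a version change. These files use pgBackRest 1.x option names, while the current user guide uses the indexed 2.x names. The sketch below assumes pgBackRest 2.x and restates the same two files. The renames are: db-path became pg1-path, backup-host became repo1-host, backup-user became repo1-host-user, repo-path became repo1-path, and retention-full became repo1-retention-full. backup-user is left out of radon's file because the repository is local there, so there is no repository host for it to apply to.

```ini
# Sketch only: the two files above restated with pgBackRest 2.x option names.

# --- pgbackrest.conf on the primary, "radon" (repository is local) ---
[global]
repo1-path=/pg_archive/pgbackrest
repo1-retention-full=10
log-level-file=detail

[emolecules]
pg1-path=/var/lib/postgresql/9.6/main

# --- pgbackrest.conf on the hot standby, "standby" ---
[global]
repo1-host=radon.emolecules.com
repo1-host-user=postgres

[emolecules]
pg1-path=/var/lib/postgresql/9.6/main
recovery-option=standby_mode=on
```
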

The standby server is on Amazon AWS, and I want to set up a backup Amazon S3 archive that gets its snapshot and WAL files from the standby in order to save bandwidth costs from the primary (don't want to copy the data twice).

Note that the primary is also archived to a local NFS disk. I want to keep that going, too, but create a second archive on S3. (The reason for this is that we're migrating from our own hardware to AWS, and during the transition need both systems fully functional.)

There are sections in the user guide on Backup from a Standby and on S3 Support, but I can't sort out how to combine them. Apparently I need to define pg1-host/pg1-path, and pg2-host/pg2-path, then a stanza for S3. But does all this go in the pgbackrest.conf of the primary ("radon" in my case), or on the standby, or somehow split across both?

Another question: if I modify these files, will the current standby replica keep streaming properly? That is, can I add the S3 repository to the current primary/standby pair without having to start over?

Thanks!
Craig

Re: pgbackrest: backup from standby

From
Craig James
Date:
On Mon, Aug 20, 2018 at 9:12 AM, Craig James <cjames@emolecules.com> wrote:
> ...
> The standby server is on Amazon AWS, and I want to set up a backup Amazon S3 archive that gets its snapshot and WAL files from the standby in order to save bandwidth costs from the primary (don't want to copy the data twice).

Partially answering my own question, but I'm still confused. It looks to me like I have to create *two* stanzas for the same database on the standby server, because there are two repositories (the original from which it was created, and the S3 one). I need the first stanza because that's how the standby is created from the primary ("pgbackrest --stanza=emolecules1 restore"), and I need the second stanza to make a backup to S3 ("pgbackrest --stanza=emolecules2 backup").

Pgbackrest doesn't seem to have an option to specify which repository to use, even though the repo-xxx options can be indexed, i.e. repo1-xxx, repo2-xxx, etc. Or maybe I overlooked something?

So here's my best guess. The master server configuration stays as shown above, but the slave server requires two stanzas:

[global]
  backup-user=postgres

[emolecules1]
  repo1-host=radon.emolecules.com
  pg1-path=/var/lib/postgresql/9.6/main
  recovery-option=standby_mode=on

[emolecules2]
  process-max=4
  repo1-type=s3
  repo1-cipher-type=none
  repo1-path=/production
  repo1-retention-diff=5
  repo1-retention-full=5
  repo1-s3-bucket=some-bucket-name-us-west-2
  repo1-s3-endpoint=s3.amazonaws.com
  repo1-s3-key=XXXXXXX
  repo1-s3-key-secret=xxxxxxxxxxxxxxxxxxxxx
  repo1-s3-region=us-west-2
  start-fast=y
  stop-auto=y

[global:archive-push]
  compress-level=3

And then I do the S3 backups using "--stanza=emolecules2".

Another point I'm not 100% sure of. On the standby server, do I just add

archive_command = 'pgbackrest --stanza=emolecules archive-push %p'

to the postgresql.conf file, just as I would on the primary? I.e. will this work, even though it's using streaming replication from the primary?

Thanks!
Craig

--
---------------------------------
Craig A. James
Chief Technology Officer
eMolecules, Inc.
---------------------------------

Re: pgbackrest: backup from standby

From
David Steele
Date:
On 8/20/18 12:12 PM, Craig James wrote:
> The pgbackrest documentation has a section on backing up from a standby,
> but I'm having trouble figuring it out. I think it's just a terminology
> problem: I can't figure out which configuration directives go on which
> server.
> 
> To begin with, I already have streaming replication running:
> 
> Primary server (live database): "radon"
> 
>     [emolecules]
>       db-path=/var/lib/postgresql/9.6/main
> 
>     [global]
>       repo-path=/pg_archive/pgbackrest
>       retention-full=10
>       backup-user=postgres
>       log-level-file=detail
> 
> 
> Hot-standby server: "standby"
> 
>     [emolecules]
>       db-path=/var/lib/postgresql/9.6/main
>       recovery-option=standby_mode=on
> 
>     [global]
>       backup-host=radon.emolecules.com
>       backup-user=postgres

It's much easier just to mount the NFS volume on the standby; then
you can use the same configuration on the primary and standby, though I
can see below that you have reasons not to do that.

> The standby server is on Amazon AWS, and I want to set up a backup
> Amazon S3 archive that gets its snapshot and WAL files from the standby
> in order to save bandwidth costs from the primary (don't want to copy
> the data twice).

This is not supported.  If you enable "archive_mode=always", pgBackRest
will complain because there's no way to ensure that multiple servers are
not archiving, especially on S3.
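
Some background on the setting mentioned here, sketched under the assumption of PostgreSQL 9.6 as used in this thread. With archive_mode = on, a standby does not run archive_command while it is in recovery, so copying the primary's archive_command into the standby's postgresql.conf has no effect until the standby is promoted. Only archive_mode = always makes the standby archive as well, and that is the setting pgBackRest complains about. Changing archive_mode requires a server restart.

```ini
# postgresql.conf on the primary (radon): WAL is archived here.
archive_mode = on
archive_command = 'pgbackrest --stanza=emolecules archive-push %p'

# postgresql.conf on the standby: the same lines are harmless. With
# archive_mode = on, archive_command is not run during recovery and only
# starts after promotion. archive_mode = always would make the standby
# archive too, which is the case pgBackRest objects to.
archive_mode = on
archive_command = 'pgbackrest --stanza=emolecules archive-push %p'
```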

> Note that the primary is also archived to a local NFS disk. I want to
> keep that going, too, but create a second archive on S3. (The reason for
> this is that we're migrating from our own hardware to AWS, and during
> the transition need both systems fully functional.)

It's possible to backup/archive to multiple repos but the configuration
is complex.  I recommend that you migrate your pgBackRest repo to S3
first.  That goes like:

1) Sync the pgbackrest repo to S3 (using s3fs)
2) Stop archiving by setting archive_command=false
3) Sync the pgbackrest repo again to catch any files that were missed
the first time.
4) Reconfigure for S3 and enable archiving.
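
Here is a sketch of steps 2 and 4 as postgresql.conf changes on the primary. archive_command = 'false' runs the shell's false command, which always fails. PostgreSQL therefore keeps each completed WAL segment in pg_xlog (the 9.6 directory name) and retries it later instead of discarding it. archive_command is picked up on a configuration reload, so neither step needs a restart. However, pg_xlog keeps growing for as long as archiving is paused.

```ini
# postgresql.conf on the primary (radon)

# Step 2: pause archiving. "false" always exits non-zero, so PostgreSQL
# retains the WAL segments and keeps retrying them.
archive_mode = on
archive_command = 'false'

# Step 4: after pgbackrest.conf points at the S3 repository, restore the
# normal command and reload, e.g. with SELECT pg_reload_conf();
#archive_command = 'pgbackrest --stanza=emolecules archive-push %p'
```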

> There are sections in the user guide on Backup from a Standby
> <https://pgbackrest.org/user-guide.html#standby-backup> and on S3
> Support <https://pgbackrest.org/user-guide.html#s3-support>, but I
> can't sort out how to combine them. Apparently I need to define
> pg1-host/pg1-path, and pg2-host/pg2-path, then a stanza for S3. But does
> all this go in the pgbackrest.conf of the primary ("radon" in my case),
> or on the standby, or somehow split across both?

You have an asymmetrical configuration here since there is not backup
server and shown in the pgBackRest docs.  That makes configuration quite
a bit harder and make failover harder.

> Another question: if I modify these files, will the current standby
> replica keep streaming properly? That is, can I add the S3 repository to
> the current primary/standby pair without having to start over?

Yes, but multiple repositories are not natively supported at this time, so
it requires multiple configurations and some care.

Regards,
-- 
-David
david@pgmasters.net


Re: pgbackrest: backup from standby

From
David Steele
Date:
On 8/20/18 3:36 PM, David Steele wrote:
> On 8/20/18 12:12 PM, Craig James wrote:
> 
>> There are sections in the user guide on Backup from a Standby
>> <https://pgbackrest.org/user-guide.html#standby-backup> and on S3
>> Support <https://pgbackrest.org/user-guide.html#s3-support>, but I
>> can't sort out how to combine them. Apparently I need to define
>> pg1-host/pg1-path, and pg2-host/pg2-path, then a stanza for S3. But does
>> all this go in the pgbackrest.conf of the primary ("radon" in my case),
>> or on the standby, or somehow split across both?
> 
> You have an asymmetrical configuration here since there is not backup
> server and shown in the pgBackRest docs.  That makes configuration quite
> a bit harder and make failover harder.

Sorry, this was a bit garbled.  I meant to say:

You have an asymmetrical configuration here since there is no repo
server as shown in the pgBackRest docs.  That makes configuration quite
a bit harder and makes failover harder.
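
For comparison, here is a sketch of the symmetric layout the docs describe. A dedicated repository host holds the repo and reaches both database hosts over SSH. backup-standby=y tells pgBackRest to copy data files from the standby, while it still contacts the primary to start and stop the backup. This assumes pgBackRest 2.x option names and a hypothetical standby hostname, standby.emolecules.com. radon.emolecules.com and the data path come from the thread, and /var/lib/pgbackrest is pgBackRest's default repository path.

```ini
# pgbackrest.conf on a hypothetical dedicated repository host
[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=10
# Copy data files from the standby; the primary is still contacted
# to start and stop the backup.
backup-standby=y

[emolecules]
pg1-host=radon.emolecules.com
pg1-path=/var/lib/postgresql/9.6/main
# hypothetical hostname for the hot standby
pg2-host=standby.emolecules.com
pg2-path=/var/lib/postgresql/9.6/main
```

In that layout, each database host's own pgbackrest.conf only names the repository host (repo1-host=...). The primary and standby files can therefore be identical, which is what keeps failover simple.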

In the end, I think your migration would be made simpler by migrating
the pgBackRest repository before the database.

Is saving bandwidth an overriding concern or just a nice to have?

Regards,
-- 
-David
david@pgmasters.net


Re: pgbackrest: backup from standby

From
Craig James
Date:
Hi David,

On Mon, Aug 20, 2018 at 12:36 PM, David Steele <david@pgmasters.net> wrote:
> On 8/20/18 12:12 PM, Craig James wrote:
> ...
> It's possible to backup/archive to multiple repos but the configuration
> is complex.  I recommend that you migrate your pgBackRest repo to S3
> first.  That goes like:
> 
> 1) Sync the pgbackrest repo to S3 (using s3fs)
> 2) Stop archiving by setting archive_command=false
> 3) Sync the pgbackrest repo again to catch any files that were missed
> the first time.
> 4) Reconfigure for S3 and enable archiving.

Just to make it clear what's going on. Black is what we have now. Red is what I was hoping to add. Blue is added when we switch to AWS and shut off the current servers.

primary ----+-----> NFS archive    |
   |        |                      |  local (non-AWS) servers
   |        +-----> Standby #1     |
   |
   V
 Standby #2 +-----> S3 Archive     |
                                   |  Amazon AWS & S3
            +-----> New Standby    |

The goal is getting from here to there while keeping the web site going and orders flowing, and never having fewer than one hot-standby server and one full archive at any time.

As you can see from my primitive diagram, the primary server is also backing up to a local hot-standby server and to an NFS archive.

One concern: If you recall from a previous pgbackrest bug we encountered, we have hundreds of thousands of objects in the database, many of which are very small. So that's hundreds of thousands of files that must be copied to S3. My understanding is that there is significant round-trip and file-creation latency for every S3 file. Presumably that latency will be minimized between AWS-to-S3 activity, but could be bad from some-other-site-to-S3 (i.e. from our current primary server).

That's one of the reasons I was hoping to backup from the standby server (AWS) to S3 rather than from the primary server to S3. 
 
> > There are sections in the user guide on Backup from a Standby
> > <https://pgbackrest.org/user-guide.html#standby-backup> and on S3
> > Support <https://pgbackrest.org/user-guide.html#s3-support>, but I
> > can't sort out how to combine them. Apparently I need to define
> > pg1-host/pg1-path, and pg2-host/pg2-path, then a stanza for S3. But does
> > all this go in the pgbackrest.conf of the primary ("radon" in my case),
> > or on the standby, or somehow split across both?
> 
> You have an asymmetrical configuration here since there is not backup
> server and shown in the pgBackRest docs.  That makes configuration quite
> a bit harder and make failover harder.

It's AWS, so it's easy to spin up a small server to just be a repo manager. But I worry that it couldn't handle the existing standby server since all the traffic would have to go local-->AWS-->local to get from the primary over to the current standby server.

I'm open to any and all suggestions.

One possibility: Does pgbackrest support chained standby servers?

primary ----> standby #1 ----> standby #2

If so, we could do this:
1. Set up standby #1 and #2 on AWS, but no S3 yet
2. Prepare config for S3, but don't deploy
3. Failover to standby #1, making it the primary
4. Start the S3 backup

This would give us a short time with no archive/PITR capability, but we could do it during a maintenance window when no orders are expected, and (hopefully) the S3 archive would finish in less than 24 hours.

Thanks very much for your advice!
Craig

Re: pgbackrest: backup from standby

From
David Steele
Date:
On 8/20/18 9:42 PM, Craig James wrote:
> 
> One possibility: Does pgbackrest support chained standby servers?
> 
>     primary ----> standby #1 ----> standby #2

This won't matter to pgBackRest and Postgres certainly supports it.
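
In the chained layout, standby #2 streams from standby #1 instead of from the primary (PostgreSQL's cascading replication). Below is a sketch of standby #2's pgbackrest.conf. It assumes PostgreSQL 9.6, where pgBackRest writes these settings into recovery.conf during a restore. The hostname standby1.example.com and the replication user replicator are hypothetical.

```ini
# pgbackrest.conf on standby #2 (repository settings, repo1-*, omitted;
# they would match standby #1)
[emolecules]
pg1-path=/var/lib/postgresql/9.6/main
# Written to recovery.conf by "pgbackrest --stanza=emolecules restore"
recovery-option=standby_mode=on
recovery-option=primary_conninfo=host=standby1.example.com user=replicator
```

Standby #1 must also accept replication connections itself: max_wal_senders must be above zero, and its pg_hba.conf needs a replication entry.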

> If so, we could do this:
> 
>     1. Set up standby #1 and #2 on AWS, but no S3 yet
>     2. Prepare config for S3, but don't deploy
>     3. Failover to standby #1, making it the primary
>     4. Start the S3 backup

Or combine my earlier plan with your plan to allow you to keep your repo
and have uninterrupted PITR:

Before the migration:

1) Stop backups
2) Sync the NFS repo to S3 using rsync/s3fs or aws cli

During the migration:

3) Failover to new AWS primary but leave archive_command disabled on new
primary (archive_command = false)
4) Resync the NFS repo to S3 to get new WAL segments
5) Enable archive command on AWS primary
6) Perform a new backup directly to S3 (just in case a mistake was
made syncing the repo)

Depending on your WAL volume you can add some syncs in there to reduce
the time when the new primary is not archiving, maybe after step #2 and
again before step #3.

This leaves you with your old backups and PITR capability while only
copying data local->S3 once.
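
Here is a sketch of steps 5 and 6 on the new AWS primary. Its pgbackrest.conf points repo1 at the S3 copy of the repository, archive_command is switched back on, and a fresh full backup is taken. The bucket, region, endpoint, path and key values are the placeholders from Craig's earlier S3 config. retention-full=10 is carried over from the NFS repository. The sketch assumes the sync copied the contents of /pg_archive/pgbackrest to that repo1-path in the bucket.

```ini
# pgbackrest.conf on the new AWS primary (placeholder values from the thread)
[global]
repo1-type=s3
repo1-path=/production
repo1-s3-bucket=some-bucket-name-us-west-2
repo1-s3-endpoint=s3.amazonaws.com
repo1-s3-region=us-west-2
repo1-s3-key=XXXXXXX
repo1-s3-key-secret=xxxxxxxxxxxxxxxxxxxxx
repo1-retention-full=10
process-max=4

[emolecules]
pg1-path=/var/lib/postgresql/9.6/main

# Step 5: in postgresql.conf, restore archiving and reload:
#   archive_command = 'pgbackrest --stanza=emolecules archive-push %p'
# then confirm WAL reaches the S3 repo:
#   pgbackrest --stanza=emolecules check
# Step 6: take a fresh full backup in case the sync missed something:
#   pgbackrest --stanza=emolecules --type=full backup
```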

Regards,
-- 
-David
david@pgmasters.net


Re: pgbackrest: backup from standby

From
Craig James
Date:

On Wed, Aug 22, 2018 at 8:37 AM, David Steele <david@pgmasters.net> wrote:
> ...
> Or combine my earlier plan with your plan to allow you to keep your repo
> and have uninterrupted PITR:
> ...
> This leaves you with your old backups and PITR capability while only
> copying data local->S3 once.

This is a great plan. Thanks very much for your help.

Craig

Re: pgbackrest: backup from standby

From
David Steele
Date:
On 8/22/18 11:42 AM, Craig James wrote:
> 
> This is a great plan. Thanks very much for your help.

You're welcome!  I have used this migration plan before so I'm confident
it will work and it has the benefit of being relatively simple.

Regards,
-- 
-David
david@pgmasters.net