Обсуждение: backup postgresql with snapshot AWS

Поиск
Список
Период
Сортировка

backup postgresql with snapshot AWS

От
flumbador@virgilio.it
Дата:

Hi

I have a postgresql in Cloud AWS and I want to build the backup strategy described below.

The postgresql has PGDATA, Tablespace and XLOG on a Physical Volume, mapping a single EBS disk.
The WAL archive filesystem is on a separate Physical Volume, mapping a separate EBS disk.
OS is SLES 11sp4 and PG is 9.4

The Backup procedure is:

1) Start Backup
2) "AWS EBS Snapshot" of the Physical Volume containing PGDATA, Tablespace and XLOG
3) Stop Backup
4) "AWS EBS Snapshot" of the Physical Volume containing WAL archived

To allow PITR the WAL archive are also archived with a filesystem backup

Is someone using a similar backup strategy?
Are there tips or warning that I should be aware of?


Thanks & Regards
Gabriele Fagnani

Re: backup postgresql with snapshot AWS

От
David Steele
Дата:
On 1/19/18 8:32 AM, flumbador@virgilio.it wrote:
> 
> I have a postgresql in Cloud AWS and I want to build the backup strategy
> described below.
> 
> The postgresql has PGDATA, Tablespace and XLOG on a Physical Volume,
> mapping a single EBS disk.
> The WAL archive filesystem is on a separate Physical Volume, mapping a
> separate EBS disk.
> OS is SLES 11sp4 and PG is 9.4
> 
> The Backup procedure is:
> 
> 1) Start Backup
> 2) "AWS EBS Snapshot" of the Physical Volume containing PGDATA,
> Tablespace and XLOG
> 3) Stop Backup
> 4) "AWS EBS Snapshot" of the Physical Volume containing WAL archived
> 
> To allow PITR the WAL archive are also archived with a filesystem backup
> 
> Is someone using a similar backup strategy?

A strategy like this is posted to the lists from time to time.

> Are there tips or warning that I should be aware of?

Rolling your own backup solution is generally not a good idea.  It's a
lot harder to get right than it seems and an incorrect solution is often
not discovered until it is too late.

Since you are on AWS you should consider using a backup solution like
pgBackRest or WAL-G that support backups to S3.

Regards,
-- 
-David
david@pgmasters.net


Re: backup postgresql with snapshot AWS

От
Karl Denninger
Дата:
On 1/19/2018 08:00, David Steele wrote:
On 1/19/18 8:32 AM, flumbador@virgilio.it wrote:
I have a postgresql in Cloud AWS and I want to build the backup strategy
described below.

The postgresql has PGDATA, Tablespace and XLOG on a Physical Volume,
mapping a single EBS disk.
The WAL archive filesystem is on a separate Physical Volume, mapping a
separate EBS disk.
OS is SLES 11sp4 and PG is 9.4

The Backup procedure is:

1) Start Backup
2) "AWS EBS Snapshot" of the Physical Volume containing PGDATA,
Tablespace and XLOG
3) Stop Backup
4) "AWS EBS Snapshot" of the Physical Volume containing WAL archived

To allow PITR the WAL archive are also archived with a filesystem backup

Is someone using a similar backup strategy?
A strategy like this is posted to the lists from time to time.

Are there tips or warning that I should be aware of?
Rolling your own backup solution is generally not a good idea.  It's a
lot harder to get right than it seems and an incorrect solution is often
not discovered until it is too late.

Since you are on AWS you should consider using a backup solution like
pgBackRest or WAL-G that support backups to S3.

Regards,
Seconded.

Clouded-based "snapshots" are not necessarily atomic.  On a *local* disk (on your own hardware) a zfs snapshot where *all* of the postgres data is on the same filesystem being snapshotted (NOT descendants of it) is probably safe, because the operation is allegedly atomic.  So in such an environment a "zfs snapshot" followed by "zfs send" should be ok.

But this is not necessarily true in the case of a cloud-based storage system no matter who's it is, and if it's not true then you are likely to find out the hard way that your "snapshot" was really not much different than a live block copy or "rsync" equivalent -- which is definitely *not* safe.

The problem with believing that these tools are atomic in a cloud environment is that when you find out they're not it's usually after some sort of event where you really *need* that backup to be good.

IMHO you need to use the postgres tools to either make/stream a backup so the database can do its thing to make sure you have a consistent copy or use pg_dump and friends (e.g. pg_dumpall.)

--
Karl Denninger
karl@denninger.net
The Market Ticker
[S/MIME encrypted email preferred]
Вложения

Re: backup postgresql with snapshot AWS

От
Rui DeSousa
Дата:
I find the opposite to be true… if you don’t know how to backup Postgres without using a third party util then you are
atthe mercy of that tool.  It is fundament knowledge that one should know if you’re running Postgres. 

If the snapshot is not atomic and/or using multiple filesystems; that is fine as long as it’s issued and completed
withinthe pg_start_backup() and pg_stop_backup() calls.  



Re: backup postgresql with snapshot AWS

От
Achilleas Mantzios
Дата:
On 19/01/2018 17:40, Rui DeSousa wrote:

> If the snapshot is not atomic and/or using multiple filesystems; that is fine as long as it’s issued and completed
withinthe pg_start_backup() and pg_stop_backup() calls.
 
And as long as the needed wals belonging to that interval between start and stop are to be found in the archive when
theproblem arises. So the whole system must be well documented and periodically 
 
tested.
>

-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: backup postgresql with snapshot AWS

От
Karl Denninger
Дата:
On 1/19/2018 10:23, Achilleas Mantzios wrote:
On 19/01/2018 17:40, Rui DeSousa wrote:

If the snapshot is not atomic and/or using multiple filesystems; that is fine as long as it’s issued and completed within the pg_start_backup() and pg_stop_backup() calls.
And as long as the needed wals belonging to that interval between start and stop are to be found in the archive when the problem arises. So the whole system must be well documented and periodically tested.

The latter is the problem (e.g. when it says "complete" it really is done and it really is safe to issue the pg_stop_backup() and grab the WAL) and further you are relying on an assumption of atomicity within the I/O subsystem when it is requested by the application within your VM (in which it's reasonable to believe that when fsync() returns it really HAS committed the data from the perspective of said VM) yet you're doing some other operation *behind* the VM on which your application is running.

IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's space with a process that relies on them the more you are risking finding out that your assumptions are wrong the hard way.  You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to describe the expected results when it's important.

--
Karl Denninger
karl@denninger.net
The Market Ticker
[S/MIME encrypted email preferred]
Вложения

Re: backup postgresql with snapshot AWS

От
Rui DeSousa
Дата:
>
> IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's
spacewith a process that relies on them the more you are risking finding out that your assumptions are wrong the hard
way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to
describethe expected results when it's important. 

One needs to understand their technology stack. You need to know where your data is after a sync() call… is it cached
inthe controller, some other application layer, or safely on disk? You also need to understand the relationship between
luns,volume managers, and filesystems and where your atomic operation are in your setup.    

i.e. If the filesystem spans multiple EBS volumes then you can’t use EBS snapshots otherwise you end up with a
corruptedfilesystem/database; however, if you use one filesystem per EBS volume then EBS snapshots work fine.  Amazon’s
ownRDS service uses EBS snapshots for backups. 

Re: backup postgresql with snapshot AWS

От
Pritam Barhate
Дата:
>>  Amazon’s own RDS service uses EBS snapshots for backups.

Is this documented somewhere? A link to official AWS docs pointing this out will be much appreciated. 

Re: backup postgresql with snapshot AWS

От
Rui DeSousa
Дата:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_CommonTasks.BackupRestore.html

On Jan 20, 2018, at 2:16 AM, Pritam Barhate <pritambarhate@gmail.com> wrote:

>>  Amazon’s own RDS service uses EBS snapshots for backups.

Is this documented somewhere? A link to official AWS docs pointing this out will be much appreciated. 

Re: backup postgresql with snapshot AWS

От
flumbador@virgilio.it
Дата:
Hi
I confirm that we have only one EBS volume, but Postgresql spans multiple filesystem on that EBS volume.
Filesystems are:

/pgsql/pdbabe/data      => PGDATA
/pgsql/pdbabe/tsdata    => Tablespace Data
/pgsql/pdbabe/tsindex   => Tablespace Index
/pgsql/pdbabe/tstemp    => Tablespace temporary
/pgsql/pdbabe/xlog      => pg_xlog

But all on the same EBS Volume.
Does snapshot work fine in this scenario?

Thanks & Regards
Gabriele

> Il 19 gennaio 2018 alle 20.39 Rui DeSousa <rui.desousa@icloud.com> ha scritto:
>
>
>
> >
> > IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's
spacewith a process that relies on them the more you are risking finding out that your assumptions are wrong the hard
way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to
describethe expected results when it's important. 
>
> One needs to understand their technology stack. You need to know where your data is after a sync() call… is it cached
inthe controller, some other application layer, or safely on disk? You also need to understand the relationship between
luns,volume managers, and filesystems and where your atomic operation are in your setup.    
>
> i.e. If the filesystem spans multiple EBS volumes then you can’t use EBS snapshots otherwise you end up with a
corruptedfilesystem/database; however, if you use one filesystem per EBS volume then EBS snapshots work fine.  Amazon’s
ownRDS service uses EBS snapshots for backups. 


Re: backup postgresql with snapshot AWS

От
Achilleas Mantzios
Дата:
On 23/01/2018 13:02, flumbador@virgilio.it wrote:
> Hi
> I confirm that we have only one EBS volume, but Postgresql spans multiple filesystem on that EBS volume.
> Filesystems are:
>
> /pgsql/pdbabe/data      => PGDATA
> /pgsql/pdbabe/tsdata    => Tablespace Data
> /pgsql/pdbabe/tsindex   => Tablespace Index
> /pgsql/pdbabe/tstemp    => Tablespace temporary
> /pgsql/pdbabe/xlog      => pg_xlog

filesystem != directory. Do a
% df /pgsql/pdbabe/* | sort -u
to show you all filesystems involved. Might be only one.

>
> But all on the same EBS Volume.
> Does snapshot work fine in this scenario?
>
> Thanks & Regards
> Gabriele
>
>> Il 19 gennaio 2018 alle 20.39 Rui DeSousa <rui.desousa@icloud.com> ha scritto:
>>
>>
>>
>>> IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's
spacewith a process that relies on them the more you are risking finding out that your assumptions are wrong the hard
way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to
describethe expected results when it's important.
 
>> One needs to understand their technology stack. You need to know where your data is after a sync() call… is it
cachedin the controller, some other application layer, or safely on disk? You also need to understand the relationship
betweenluns, volume managers, and filesystems and where your atomic operation are in your setup.
 
>>
>> i.e. If the filesystem spans multiple EBS volumes then you can’t use EBS snapshots otherwise you end up with a
corruptedfilesystem/database; however, if you use one filesystem per EBS volume then EBS snapshots work fine.  Amazon’s
ownRDS service uses EBS snapshots for backups.
 


-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt



Re: backup postgresql with snapshot AWS

От
flumbador@virgilio.it
Дата:
Hi

pgpdbabe@elaawddbad00:/pgsql/pdbabe/data/pdbabe/pg_tblspc> df /pgsql/pdbabe/* | sort -u
/dev/mapper/pdbabe_2-lv_archive 102180192 60291608  36698136  63% /pgsql/pdbabe/archive
/dev/mapper/pdbabe_2-lv_staging  19610300  4810764  13803392  26% /pgsql/pdbabe/staging
/dev/mapper/pdbabe-lv_admin      10321208   241396   9555524   3% /pgsql/pdbabe/admin
/dev/mapper/pdbabe-lv_data       10321208   177628   9619292   2% /pgsql/pdbabe/data
/dev/mapper/pdbabe-lv_tsdata    258030980 45415480 199508300  19% /pgsql/pdbabe/tsdata
/dev/mapper/pdbabe-lv_tsindex   103212320   192268  97777172   1% /pgsql/pdbabe/tsindex
/dev/mapper/pdbabe-lv_tstemp     51606140   184268  48800432   1% /pgsql/pdbabe/tstemp
/dev/mapper/pdbabe-lv_xlog       51606140 16259012  32725688  34% /pgsql/pdbabe/xlog
Filesystem                      1K-blocks     Used Available Use% Mounted on

elaawddbad00:~ # pvs
  PV         VG       Fmt  Attr PSize   PFree
  /dev/sda2  system   lvm2 a--   50.00g 10.55g
  /dev/xvdc  pdbabe   lvm2 a--  510.00g 40.00g
  /dev/xvdd  pdbabe_2 lvm2 a--  120.00g  2.00g

Thanks & Regards
Gabriele

> Il 23 gennaio 2018 alle 12.08 Achilleas Mantzios <achill@matrix.gatewaynet.com> ha scritto:
>
>
> On 23/01/2018 13:02, flumbador@virgilio.it wrote:
> > Hi
> > I confirm that we have only one EBS volume, but Postgresql spans multiple filesystem on that EBS volume.
> > Filesystems are:
> >
> > /pgsql/pdbabe/data      => PGDATA
> > /pgsql/pdbabe/tsdata    => Tablespace Data
> > /pgsql/pdbabe/tsindex   => Tablespace Index
> > /pgsql/pdbabe/tstemp    => Tablespace temporary
> > /pgsql/pdbabe/xlog      => pg_xlog
>
> filesystem != directory. Do a
> % df /pgsql/pdbabe/* | sort -u
> to show you all filesystems involved. Might be only one.
>
> >
> > But all on the same EBS Volume.
> > Does snapshot work fine in this scenario?
> >
> > Thanks & Regards
> > Gabriele
> >
> >> Il 19 gennaio 2018 alle 20.39 Rui DeSousa <rui.desousa@icloud.com> ha scritto:
> >>
> >>
> >>
> >>> IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's
spacewith a process that relies on them the more you are risking finding out that your assumptions are wrong the hard
way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to
describethe expected results when it's important. 
> >> One needs to understand their technology stack. You need to know where your data is after a sync() call… is it
cachedin the controller, some other application layer, or safely on disk? You also need to understand the relationship
betweenluns, volume managers, and filesystems and where your atomic operation are in your setup. 
> >>
> >> i.e. If the filesystem spans multiple EBS volumes then you can’t use EBS snapshots otherwise you end up with a
corruptedfilesystem/database; however, if you use one filesystem per EBS volume then EBS snapshots work fine.  Amazon’s
ownRDS service uses EBS snapshots for backups. 
>
>
> --
> Achilleas Mantzios
> IT DEV Lead
> IT DEPT
> Dynacom Tankers Mgmt
>
>


Re: backup postgresql with snapshot AWS

От
Rui DeSousa
Дата:
Yes the single EBS volume will work.  As long as a filesystem only spans a single EBS volume and in your case they all
sharethe same single EBS volume. 

If each had there own volume; that too would work using multiple snapshots as one coherent backup.

i.e.

pg_start_backup()

take snapshot: /pgsql/pdbabe/data
take snapshot: /pgsql/pdbabe/tsdata
take snapshot: /pgsql/pdbabe/tsindex
take snapshot: /pgsql/pdbabe/tstemp
take snapshot: /pgsql/pdbabe/xlog

Pg_stop_backup()

And of course the associated WALs and future WALs.