Обсуждение: backup postgresql with snapshot AWS
Hi
I have a postgresql in Cloud AWS and I want to build the backup strategy described below.
The postgresql has PGDATA, Tablespace and XLOG on a Physical Volume, mapping a single EBS disk.
The WAL archive filesystem is on a separate Physical Volume, mapping a separate EBS disk.
OS is SLES 11sp4 and PG is 9.4
The Backup procedure is:
1) Start Backup
2) "AWS EBS Snapshot" of the Physical Volume containing PGDATA, Tablespace and XLOG
3) Stop Backup
4) "AWS EBS Snapshot" of the Physical Volume containing WAL archived
To allow PITR the WAL archive are also archived with a filesystem backup
Is someone using a similar backup strategy?
Are there tips or warning that I should be aware of?
Thanks & Regards
Gabriele Fagnani
On 1/19/18 8:32 AM, flumbador@virgilio.it wrote: > > I have a postgresql in Cloud AWS and I want to build the backup strategy > described below. > > The postgresql has PGDATA, Tablespace and XLOG on a Physical Volume, > mapping a single EBS disk. > The WAL archive filesystem is on a separate Physical Volume, mapping a > separate EBS disk. > OS is SLES 11sp4 and PG is 9.4 > > The Backup procedure is: > > 1) Start Backup > 2) "AWS EBS Snapshot" of the Physical Volume containing PGDATA, > Tablespace and XLOG > 3) Stop Backup > 4) "AWS EBS Snapshot" of the Physical Volume containing WAL archived > > To allow PITR the WAL archive are also archived with a filesystem backup > > Is someone using a similar backup strategy? A strategy like this is posted to the lists from time to time. > Are there tips or warning that I should be aware of? Rolling your own backup solution is generally not a good idea. It's a lot harder to get right than it seems and an incorrect solution is often not discovered until it is too late. Since you are on AWS you should consider using a backup solution like pgBackRest or WAL-G that support backups to S3. Regards, -- -David david@pgmasters.net
On 1/19/2018 08:00, David Steele wrote:
Clouded-based "snapshots" are not necessarily atomic. On a *local* disk (on your own hardware) a zfs snapshot where *all* of the postgres data is on the same filesystem being snapshotted (NOT descendants of it) is probably safe, because the operation is allegedly atomic. So in such an environment a "zfs snapshot" followed by "zfs send" should be ok.
But this is not necessarily true in the case of a cloud-based storage system no matter who's it is, and if it's not true then you are likely to find out the hard way that your "snapshot" was really not much different than a live block copy or "rsync" equivalent -- which is definitely *not* safe.
The problem with believing that these tools are atomic in a cloud environment is that when you find out they're not it's usually after some sort of event where you really *need* that backup to be good.
IMHO you need to use the postgres tools to either make/stream a backup so the database can do its thing to make sure you have a consistent copy or use pg_dump and friends (e.g. pg_dumpall.)
Seconded.On 1/19/18 8:32 AM, flumbador@virgilio.it wrote:I have a postgresql in Cloud AWS and I want to build the backup strategy described below. The postgresql has PGDATA, Tablespace and XLOG on a Physical Volume, mapping a single EBS disk. The WAL archive filesystem is on a separate Physical Volume, mapping a separate EBS disk. OS is SLES 11sp4 and PG is 9.4 The Backup procedure is: 1) Start Backup 2) "AWS EBS Snapshot" of the Physical Volume containing PGDATA, Tablespace and XLOG 3) Stop Backup 4) "AWS EBS Snapshot" of the Physical Volume containing WAL archived To allow PITR the WAL archive are also archived with a filesystem backup Is someone using a similar backup strategy?A strategy like this is posted to the lists from time to time.Are there tips or warning that I should be aware of?Rolling your own backup solution is generally not a good idea. It's a lot harder to get right than it seems and an incorrect solution is often not discovered until it is too late. Since you are on AWS you should consider using a backup solution like pgBackRest or WAL-G that support backups to S3. Regards,
Clouded-based "snapshots" are not necessarily atomic. On a *local* disk (on your own hardware) a zfs snapshot where *all* of the postgres data is on the same filesystem being snapshotted (NOT descendants of it) is probably safe, because the operation is allegedly atomic. So in such an environment a "zfs snapshot" followed by "zfs send" should be ok.
But this is not necessarily true in the case of a cloud-based storage system no matter who's it is, and if it's not true then you are likely to find out the hard way that your "snapshot" was really not much different than a live block copy or "rsync" equivalent -- which is definitely *not* safe.
The problem with believing that these tools are atomic in a cloud environment is that when you find out they're not it's usually after some sort of event where you really *need* that backup to be good.
IMHO you need to use the postgres tools to either make/stream a backup so the database can do its thing to make sure you have a consistent copy or use pg_dump and friends (e.g. pg_dumpall.)
Вложения
I find the opposite to be true… if you don’t know how to backup Postgres without using a third party util then you are atthe mercy of that tool. It is fundament knowledge that one should know if you’re running Postgres. If the snapshot is not atomic and/or using multiple filesystems; that is fine as long as it’s issued and completed withinthe pg_start_backup() and pg_stop_backup() calls.
On 19/01/2018 17:40, Rui DeSousa wrote: > If the snapshot is not atomic and/or using multiple filesystems; that is fine as long as it’s issued and completed withinthe pg_start_backup() and pg_stop_backup() calls. And as long as the needed wals belonging to that interval between start and stop are to be found in the archive when theproblem arises. So the whole system must be well documented and periodically tested. > -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
On 1/19/2018 10:23, Achilleas Mantzios wrote:
IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's space with a process that relies on them the more you are risking finding out that your assumptions are wrong the hard way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to describe the expected results when it's important.
On 19/01/2018 17:40, Rui DeSousa wrote:The latter is the problem (e.g. when it says "complete" it really is done and it really is safe to issue the pg_stop_backup() and grab the WAL) and further you are relying on an assumption of atomicity within the I/O subsystem when it is requested by the application within your VM (in which it's reasonable to believe that when fsync() returns it really HAS committed the data from the perspective of said VM) yet you're doing some other operation *behind* the VM on which your application is running.If the snapshot is not atomic and/or using multiple filesystems; that is fine as long as it’s issued and completed within the pg_start_backup() and pg_stop_backup() calls.And as long as the needed wals belonging to that interval between start and stop are to be found in the archive when the problem arises. So the whole system must be well documented and periodically tested.
IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's space with a process that relies on them the more you are risking finding out that your assumptions are wrong the hard way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to describe the expected results when it's important.
Вложения
> > IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's spacewith a process that relies on them the more you are risking finding out that your assumptions are wrong the hard way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to describethe expected results when it's important. One needs to understand their technology stack. You need to know where your data is after a sync() call… is it cached inthe controller, some other application layer, or safely on disk? You also need to understand the relationship between luns,volume managers, and filesystems and where your atomic operation are in your setup. i.e. If the filesystem spans multiple EBS volumes then you can’t use EBS snapshots otherwise you end up with a corruptedfilesystem/database; however, if you use one filesystem per EBS volume then EBS snapshots work fine. Amazon’s ownRDS service uses EBS snapshots for backups.
>> Amazon’s own RDS service uses EBS snapshots for backups.
Is this documented somewhere? A link to official AWS docs pointing this out will be much appreciated.
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_CommonTasks.BackupRestore.html
On Jan 20, 2018, at 2:16 AM, Pritam Barhate <pritambarhate@gmail.com> wrote:>> Amazon’s own RDS service uses EBS snapshots for backups.Is this documented somewhere? A link to official AWS docs pointing this out will be much appreciated.
Hi I confirm that we have only one EBS volume, but Postgresql spans multiple filesystem on that EBS volume. Filesystems are: /pgsql/pdbabe/data => PGDATA /pgsql/pdbabe/tsdata => Tablespace Data /pgsql/pdbabe/tsindex => Tablespace Index /pgsql/pdbabe/tstemp => Tablespace temporary /pgsql/pdbabe/xlog => pg_xlog But all on the same EBS Volume. Does snapshot work fine in this scenario? Thanks & Regards Gabriele > Il 19 gennaio 2018 alle 20.39 Rui DeSousa <rui.desousa@icloud.com> ha scritto: > > > > > > > IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's spacewith a process that relies on them the more you are risking finding out that your assumptions are wrong the hard way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to describethe expected results when it's important. > > One needs to understand their technology stack. You need to know where your data is after a sync() call… is it cached inthe controller, some other application layer, or safely on disk? You also need to understand the relationship between luns,volume managers, and filesystems and where your atomic operation are in your setup. > > i.e. If the filesystem spans multiple EBS volumes then you can’t use EBS snapshots otherwise you end up with a corruptedfilesystem/database; however, if you use one filesystem per EBS volume then EBS snapshots work fine. Amazon’s ownRDS service uses EBS snapshots for backups.
On 23/01/2018 13:02, flumbador@virgilio.it wrote: > Hi > I confirm that we have only one EBS volume, but Postgresql spans multiple filesystem on that EBS volume. > Filesystems are: > > /pgsql/pdbabe/data => PGDATA > /pgsql/pdbabe/tsdata => Tablespace Data > /pgsql/pdbabe/tsindex => Tablespace Index > /pgsql/pdbabe/tstemp => Tablespace temporary > /pgsql/pdbabe/xlog => pg_xlog filesystem != directory. Do a % df /pgsql/pdbabe/* | sort -u to show you all filesystems involved. Might be only one. > > But all on the same EBS Volume. > Does snapshot work fine in this scenario? > > Thanks & Regards > Gabriele > >> Il 19 gennaio 2018 alle 20.39 Rui DeSousa <rui.desousa@icloud.com> ha scritto: >> >> >> >>> IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's spacewith a process that relies on them the more you are risking finding out that your assumptions are wrong the hard way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to describethe expected results when it's important. >> One needs to understand their technology stack. You need to know where your data is after a sync() call… is it cachedin the controller, some other application layer, or safely on disk? You also need to understand the relationship betweenluns, volume managers, and filesystems and where your atomic operation are in your setup. >> >> i.e. If the filesystem spans multiple EBS volumes then you can’t use EBS snapshots otherwise you end up with a corruptedfilesystem/database; however, if you use one filesystem per EBS volume then EBS snapshots work fine. Amazon’s ownRDS service uses EBS snapshots for backups. -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Hi pgpdbabe@elaawddbad00:/pgsql/pdbabe/data/pdbabe/pg_tblspc> df /pgsql/pdbabe/* | sort -u /dev/mapper/pdbabe_2-lv_archive 102180192 60291608 36698136 63% /pgsql/pdbabe/archive /dev/mapper/pdbabe_2-lv_staging 19610300 4810764 13803392 26% /pgsql/pdbabe/staging /dev/mapper/pdbabe-lv_admin 10321208 241396 9555524 3% /pgsql/pdbabe/admin /dev/mapper/pdbabe-lv_data 10321208 177628 9619292 2% /pgsql/pdbabe/data /dev/mapper/pdbabe-lv_tsdata 258030980 45415480 199508300 19% /pgsql/pdbabe/tsdata /dev/mapper/pdbabe-lv_tsindex 103212320 192268 97777172 1% /pgsql/pdbabe/tsindex /dev/mapper/pdbabe-lv_tstemp 51606140 184268 48800432 1% /pgsql/pdbabe/tstemp /dev/mapper/pdbabe-lv_xlog 51606140 16259012 32725688 34% /pgsql/pdbabe/xlog Filesystem 1K-blocks Used Available Use% Mounted on elaawddbad00:~ # pvs PV VG Fmt Attr PSize PFree /dev/sda2 system lvm2 a-- 50.00g 10.55g /dev/xvdc pdbabe lvm2 a-- 510.00g 40.00g /dev/xvdd pdbabe_2 lvm2 a-- 120.00g 2.00g Thanks & Regards Gabriele > Il 23 gennaio 2018 alle 12.08 Achilleas Mantzios <achill@matrix.gatewaynet.com> ha scritto: > > > On 23/01/2018 13:02, flumbador@virgilio.it wrote: > > Hi > > I confirm that we have only one EBS volume, but Postgresql spans multiple filesystem on that EBS volume. > > Filesystems are: > > > > /pgsql/pdbabe/data => PGDATA > > /pgsql/pdbabe/tsdata => Tablespace Data > > /pgsql/pdbabe/tsindex => Tablespace Index > > /pgsql/pdbabe/tstemp => Tablespace temporary > > /pgsql/pdbabe/xlog => pg_xlog > > filesystem != directory. Do a > % df /pgsql/pdbabe/* | sort -u > to show you all filesystems involved. Might be only one. > > > > > But all on the same EBS Volume. > > Does snapshot work fine in this scenario? > > > > Thanks & Regards > > Gabriele > > > >> Il 19 gennaio 2018 alle 20.39 Rui DeSousa <rui.desousa@icloud.com> ha scritto: > >> > >> > >> > >>> IMHO the more you intertwine things you cannot be certain of because they operate outside of your particular VM's spacewith a process that relies on them the more you are risking finding out that your assumptions are wrong the hard way. You *shouldn't* get bit by this in a well-designed cloud or cluster system but that's not an acceptable word to describethe expected results when it's important. > >> One needs to understand their technology stack. You need to know where your data is after a sync() call… is it cachedin the controller, some other application layer, or safely on disk? You also need to understand the relationship betweenluns, volume managers, and filesystems and where your atomic operation are in your setup. > >> > >> i.e. If the filesystem spans multiple EBS volumes then you can’t use EBS snapshots otherwise you end up with a corruptedfilesystem/database; however, if you use one filesystem per EBS volume then EBS snapshots work fine. Amazon’s ownRDS service uses EBS snapshots for backups. > > > -- > Achilleas Mantzios > IT DEV Lead > IT DEPT > Dynacom Tankers Mgmt > >
Yes the single EBS volume will work. As long as a filesystem only spans a single EBS volume and in your case they all sharethe same single EBS volume. If each had there own volume; that too would work using multiple snapshots as one coherent backup. i.e. pg_start_backup() take snapshot: /pgsql/pdbabe/data take snapshot: /pgsql/pdbabe/tsdata take snapshot: /pgsql/pdbabe/tsindex take snapshot: /pgsql/pdbabe/tstemp take snapshot: /pgsql/pdbabe/xlog Pg_stop_backup() And of course the associated WALs and future WALs.