Discussion: [GENERAL] Google Cloud Platform, snapshots and WAL

[GENERAL] Google Cloud Platform, snapshots and WAL

From: Moreno Andreo
Date:
Hi everyone,

     I have my PostgreSQL 9.5 server running on a VM instance on Google
Compute Engine (Google Cloud Platform) on Debian Jessie (8.3), and I
have another dedicated VM instance that takes a snapshot of the whole
disk every night at 3:00, without stopping the PG instance itself.
Snapshots are stored by Google incrementally, and we keep the last
2 weeks of history.
The question is: if I keep two weeks' worth of pg_xlog files, I don't
think I still need a periodic pg_basebackup to perform PITR, do I?

Thanks in advance,
Moreno.



Re: [GENERAL] Google Cloud Platform, snapshots and WAL

From: Ben Chobot
Date:

On Mar 20, 2017, at 6:31 AM, Moreno Andreo <moreno.andreo@evolu-s.it> wrote:

> Hi everyone,
>
>    I have my PostgreSQL 9.5 server running on a VM instance on Google Compute Engine (Google Cloud Platform) on Debian Jessie (8.3), and I have another dedicated VM instance that takes a snapshot of the whole disk every night at 3:00, without stopping the PG instance itself.
> Snapshots are stored by Google incrementally, and we keep the last 2 weeks of history.
> The question is: if I keep two weeks' worth of pg_xlog files, I don't think I still need a periodic pg_basebackup to perform PITR, do I?

You need a base backup to apply your WAL files to. As long as you have one taken after the start of your WAL stream, you should be good for PITR. That said, replaying 2 weeks of WAL files can take a long time. For that reason alone, it might well make sense to have more than a single base backup snapshot.
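To make the replay-time point concrete, here is a minimal Python sketch (the function names are hypothetical) that picks the newest base backup taken at or before a recovery target and reports how much WAL, measured in wall-clock time, would have to be replayed on top of it. It treats a backup as usable from its start time, which is a simplification: recovery must also replay WAL at least up to the point where the backup ended.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def pick_base_backup(backup_times: List[datetime], target: datetime) -> Optional[datetime]:
    """Return the most recent base backup taken at or before `target`.

    Simplification: a backup counts as usable from its start time; a real
    recovery must also replay WAL at least up to the end of the backup.
    """
    candidates = [t for t in backup_times if t <= target]
    return max(candidates) if candidates else None


def replay_window(
    backup_times: List[datetime], target: datetime
) -> Optional[Tuple[datetime, timedelta]]:
    """Return (chosen backup, span of WAL to replay), or None if no backup fits."""
    base = pick_base_backup(backup_times, target)
    if base is None:
        return None  # no base backup before the target: PITR is impossible
    return base, target - base


if __name__ == "__main__":
    first = datetime(2017, 3, 6, 3, 0)
    target = datetime(2017, 3, 19, 15, 30)

    # A single snapshot from two weeks ago.
    base, span = replay_window([first], target)
    print(base, span)  # 2017-03-06 03:00:00 13 days, 12:30:00

    # Nightly backups at 03:00 for 14 days.
    nightly = [first + timedelta(days=d) for d in range(14)]
    base, span = replay_window(nightly, target)
    print(base, span)  # 2017-03-19 03:00:00 12:30:00
```

With nightly backups the replay span never exceeds roughly one day, whereas a single two-week-old base backup forces replay of nearly the whole WAL history.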

Also, I cannot stress enough how important it is to actually test your recovery strategy. Few things are worse than assuming you can recover, only to find out when you need to that you cannot. Do not let https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/ happen to you.
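One small piece of such a test that is easy to automate is checking that the WAL archive has no holes, since replay stops at the first missing segment. The sketch below uses hypothetical helpers, not part of PostgreSQL. It assumes the default 16 MB segment size on 9.3 or later, where the segment part of a WAL file name runs from 00000000 to 000000FF within each log number, and it only compares segments belonging to the same timeline. It complements, and does not replace, an actual test restore.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumption: default 16 MB segments on PostgreSQL 9.3+, i.e. 0x100
# segments per log number (segment part 00000000 through 000000FF).
SEGMENTS_PER_LOG = 0x100
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")  # plain segment files only


def segment_number(name: str) -> Tuple[int, int]:
    """Split a WAL file name into (timeline, linear segment number)."""
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log * SEGMENTS_PER_LOG + seg


def find_gaps(names: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (before, after) pairs of archived segments with a hole between them.

    .history, .backup and .partial files are ignored, and segments are only
    compared within the same timeline.
    """
    by_timeline: Dict[int, List[Tuple[int, str]]] = {}
    for name in names:
        if WAL_NAME.match(name):
            tli, num = segment_number(name)
            by_timeline.setdefault(tli, []).append((num, name))

    gaps: List[Tuple[str, str]] = []
    for entries in by_timeline.values():
        entries.sort()
        for (prev_num, prev_name), (num, name) in zip(entries, entries[1:]):
            if num != prev_num + 1:
                gaps.append((prev_name, name))
    return gaps
```

Running `find_gaps(os.listdir(archive_dir))` against the archive (the directory name being whatever your setup uses) should return an empty list; any pair it reports brackets missing segments.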

Re: [GENERAL] Google Cloud Platform, snapshots and WAL

From: Moreno Andreo
Date:
On 20/03/2017 17:45, Ben Chobot wrote:

> On Mar 20, 2017, at 6:31 AM, Moreno Andreo <moreno.andreo@evolu-s.it> wrote:
>
>> Hi everyone,
>>
>>    I have my PostgreSQL 9.5 server running on a VM instance on Google Compute Engine (Google Cloud Platform) on Debian Jessie (8.3), and I have another dedicated VM instance that takes a snapshot of the whole disk every night at 3:00, without stopping the PG instance itself.
>> Snapshots are stored by Google incrementally, and we keep the last 2 weeks of history.
>> The question is: if I keep two weeks' worth of pg_xlog files, I don't think I still need a periodic pg_basebackup to perform PITR, do I?

> You need a base backup to apply your WAL files to. As long as you have one taken after the start of your WAL stream, you should be good for PITR.
Hmmm... I went back to the docs and noticed I had missed something. To achieve PITR, the cluster needs to checkpoint, and this can be obtained with the pg_start_backup() function... so if I take a snapshot and start a recovery (creating recovery.conf etc.), it will not even start recovering, right?
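For reference, a 9.5 recovery.conf for PITR from an archive needs at least a restore_command telling the server how to fetch each archived WAL file, and optionally a target such as recovery_target_time. The Python sketch below only builds the file's text; the archive directory and target are hypothetical values, and the cp-based restore_command mirrors the example in the PostgreSQL documentation, where %f is the requested file name and %p the destination path, both expanded by the server.

```python
from datetime import datetime


def recovery_conf(archive_dir: str, target: datetime) -> str:
    """Build a minimal PostgreSQL 9.5 recovery.conf for point-in-time recovery.

    `archive_dir` is a hypothetical local directory holding archived WAL.
    %f and %p are left for the server to expand.
    """
    lines = [
        # How the server fetches each archived WAL file it needs.
        f"restore_command = 'cp {archive_dir}/%f \"%p\"'",
        # Stop replaying once this point in time is reached.
        f"recovery_target_time = '{target:%Y-%m-%d %H:%M:%S}'",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(recovery_conf("/mnt/server/archivedir", datetime(2017, 3, 19, 15, 30)), end="")
```

The resulting file goes into the restored data directory before the server is started.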

Now I'm going to try 2 approaches:
1. (straightforward) barman with base backups and WAL archiving
2. (Google Cloud-oriented) a disk snapshot taken between pg_start_backup() and pg_stop_backup() (so the snapshot is taken just after the checkpoint), plus WAL archiving
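Approach 2 boils down to bracketing the snapshot with the two backup functions. The sketch below shows only that ordering; run_sql and take_snapshot are hypothetical hooks the caller would supply (for example, one wrapping a database connection and one invoking `gcloud compute disks snapshot`). On 9.5, pg_start_backup(label, fast) starts an exclusive backup: it forces a checkpoint (immediately when fast is true) and writes a backup_label file into the data directory, so the snapshot picks that file up.

```python
from typing import Callable


def snapshot_in_backup_mode(
    run_sql: Callable[[str], None],
    take_snapshot: Callable[[], None],
    label: str = "nightly-snapshot",
) -> None:
    """Take a disk snapshot between pg_start_backup() and pg_stop_backup().

    run_sql and take_snapshot are hypothetical hooks supplied by the caller.
    `label` is interpolated into SQL, so it must be a trusted literal.
    """
    # 9.5-style exclusive backup; `true` requests an immediate checkpoint
    # instead of waiting for a spread-out one.
    run_sql(f"SELECT pg_start_backup('{label}', true)")
    try:
        take_snapshot()
    finally:
        # Always leave backup mode, even if the snapshot failed.
        run_sql("SELECT pg_stop_backup()")


if __name__ == "__main__":
    # Dry run with print-based stand-ins for the real hooks.
    snapshot_in_backup_mode(print, lambda: print("(snapshot taken here)"))
```

Putting pg_stop_backup() in a finally clause means a failed snapshot does not leave the server stuck in backup mode; the WAL generated between the two calls still has to reach the archive for the snapshot to be restorable.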

I will report the results.

> That said, replaying 2 weeks of WAL files can take a long time. For that reason alone, it might well make sense to have more than a single base backup snapshot.
That's right. My (wrong) idea was to have one snapshot per day plus 14 days' worth of WAL, but after the meeting with the Google specialist, I'm leaning towards taking a base backup every day, storing it on Nearline and letting it expire after 14 days. The same goes for WAL files.
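If WAL files expire on the same 14-day schedule as the base backups, it is worth checking that nothing still needed gets removed: the oldest base backup you keep needs every WAL segment from the one in which it started onward. The sketch below (hypothetical helpers, not a real tool's API) computes that cutoff and lists the older segments. Like pg_archivecleanup, which ships with PostgreSQL for this kind of cleanup, it ignores the timeline part of the names when comparing.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Assumption taken from the thread: keep two weeks of base backups.
RETENTION = timedelta(days=14)


def wal_cutoff(backups: Dict[datetime, str], now: datetime) -> str:
    """Return the first WAL segment still needed by the retained backups.

    `backups` maps each base backup's start time to the WAL file it started
    in (the file named on the START WAL LOCATION line of its backup_label).
    Raises ValueError if no backup is inside the retention window.
    """
    kept = {t: seg for t, seg in backups.items() if now - t <= RETENTION}
    if not kept:
        raise ValueError("no base backup inside the retention window")
    return kept[min(kept)]  # starting segment of the oldest kept backup


def deletable_wal(segments: List[str], cutoff: str) -> List[str]:
    """Return plain WAL segments strictly older than `cutoff`.

    The first 8 hex digits (the timeline) are ignored when comparing;
    .history, .backup and other non-segment files are skipped.
    """
    return [s for s in segments if len(s) == 24 and s[8:] < cutoff[8:]]
```

Segments not returned by deletable_wal are still needed by at least one retained base backup, so an age-based expiry rule should never remove them.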

> Also, I cannot stress enough how important it is to actually test your recovery strategy.
I totally agree... that's why I'm here. I don't want to be preparing a backup strategy when I already need to recover...

Thanks
Moreno.