Обсуждение: Netapp SnapCenter

Поиск
Список
Период
Сортировка

Netapp SnapCenter

От
Paul Förster
Дата:
Hi,

first of all, it wasn't my glorious idea and I know that volume snapshots are no backups. :-)

Leaving that aside: Our company has decided to move away from Netapp SnapCreator to its SnapCenter. I was told there is
somecommunity plugin for PostgreSQL to use with SnapCenter but can't find any information that makes sense. 

The only thing I found is:

https://automationstore.netapp.com/snappack-listNC.shtml

But it's from Oct. 3rd 2018, for PostgreSQL 9.x and 10.x and also requires JDK 1.8 or later.

If this works at all, I consider these version limits (i.e. development status) as ugly, because more or less in limbo.

Does anyone know of a solution for SnapCenter that works, is actively supported and can be used with current versions
ofPostgreSQL (i.e. 11.x, 12.x and soon to come 13.x)? 

Any idea would be greatly appreciated. Thanks in advance.

Cheers,
Paul


RE: Netapp SnapCenter

От
"Wolff, Ken L"
Дата:

We got the following from one of our NetApp contacts regarding a similar question.  I’d be interested to hear if anyone feels the steps mentioned are not accurate:

 

There is presently no NetApp supported plugin for PostgreSQL for SnapCenter.  There is a community support plugin for PostgreSQL, which I have no experience with, and it hasn’t been updated since 2018, so not sure if it’s worth pursuing if it’s not actively being maintained.

 

Documentation for backup and recovery of PostgreSQL is generally missing.  It’s not a SnapCenter issue, it’s PostreSQL itself.  It also doesn’t need a lot of documentation because backups are easy.  PostgreSQL doesn’t need to be in a special mode for backups to work.

 

  1. Place data files on volume 1
  2. Place Logs on volume 2
  3. Schedule ONTAP snapshots on volume 1 at 00:05
  4. Schedule ONTAP snapshots on volume 2 at 00:10

 

There you go.  All you really need is a data file snapshot before the log snapshot.  During recovery, PostgreSQL will figure out what logs are required to make the database consistent.

 

 

Ken

 

-----Original Message-----
From: Paul Förster <paul.foerster@gmail.com>
Sent: Thursday, June 18, 2020 4:56 AM
To: pgsql-general <pgsql-general@postgresql.org>
Subject: EXTERNAL: Netapp SnapCenter

 

Hi,

 

first of all, it wasn't my glorious idea and I know that volume snapshots are no backups. :-)

 

Leaving that aside: Our company has decided to move away from Netapp SnapCreator to its SnapCenter. I was told there is some community plugin for PostgreSQL to use with SnapCenter but can't find any information that makes sense.

 

The only thing I found is:

 

https://automationstore.netapp.com/snappack-listNC.shtml

 

But it's from Oct. 3rd 2018, for PostgreSQL 9.x and 10.x and also requires JDK 1.8 or later.

 

If this works at all, I consider these version limits (i.e. development status) as ugly, because more or less in limbo.

 

Does anyone know of a solution for SnapCenter that works, is actively supported and can be used with current versions of PostgreSQL (i.e. 11.x, 12.x and soon to come 13.x)?

 

Any idea would be greatly appreciated. Thanks in advance.

 

Cheers,

Paul

 

Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Ken,

> On 18. Jun, 2020, at 15:56, Wolff, Ken L <ken.l.wolff@lmco.com> wrote:
> PostgreSQL doesn’t need to be in a special mode for backups to work.

this is curious. Doesn't the PostgreSQL cluster need to be set to backup mode to use SnapCenter?

The problem is, one can't test that and get a reliable answer, because you do 100 backups (snaps) for testing and
restorethose 100 backups, and all may be ok. 

But what about the 1000th snap? Just in that millisecond of the snap, some important information is written to the
volumeon which the PostgreSQL cluster resides and for some reason, it needs to be restored later and this information
islost or may lead to corruption. This is why backup mode exists, after all. Really, no backup mode by a pre/post
scriptwhen snapping? 

This is very surprising to me.

Cheers,
Paul




Re: Netapp SnapCenter

От
Magnus Hagander
Дата:


On Thu, Jun 18, 2020 at 4:07 PM Paul Förster <paul.foerster@gmail.com> wrote:
Hi Ken,

> On 18. Jun, 2020, at 15:56, Wolff, Ken L <ken.l.wolff@lmco.com> wrote:
> PostgreSQL doesn’t need to be in a special mode for backups to work.

this is curious. Doesn't the PostgreSQL cluster need to be set to backup mode to use SnapCenter?

I don't know specifically about SnapCenter, but for snapshots in general, it does require backup mode *unless* all your data is on the same disk and you have an atomic snapshot across that disk (in theory it can be on different disk as well, as long as the snapshots in that case are atomic across *all* those disks, not just individually, but that is unusual).

So the upthread suggestion of putting data and wal on different disk and snapshoting them at different times is *NOT* safe. Unless the reference to the directory for the logs means a directory where log files are copied out with archive_command, and it's actually the log archive (in which case it will work, but the recommendation is that the log archive should not be on the same machine).


The problem is, one can't test that and get a reliable answer, because you do 100 backups (snaps) for testing and restore those 100 backups, and all may be ok.

But what about the 1000th snap? Just in that millisecond of the snap, some important information is written to the volume on which the PostgreSQL cluster resides and for some reason, it needs to be restored later and this information is lost or may lead to corruption. This is why backup mode exists, after all. Really, no backup mode by a pre/post script when snapping?

The normal case is that snapshots are guaranteed to be atomic, and thus this millisecond window cannot appear. But that usually only applies across individual volumes. It's also worth noticing that taking the backups without using the backup mode and archive_command means you cannot use them for PITR, only for restore onto that specific snapshot.

While our documentation on backups in general definitely needs improvement, this particular requirement is documented at https://www.postgresql.org/docs/current/backup-file.html.
 
--

Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Magnus,

> On 18. Jun, 2020, at 16:19, Magnus Hagander <magnus@hagander.net> wrote:
> I don't know specifically about SnapCenter, but for snapshots in general, it does require backup mode *unless* all
yourdata is on the same disk and you have an atomic snapshot across that disk (in theory it can be on different disk as
well,as long as the snapshots in that case are atomic across *all* those disks, not just individually, but that is
unusual).

according to what I know from our storage guys, Netapp does atomic snapshots for each volume. We have the database and
itscorresponding WAL files (pg_wal directory) on the same volume and the archived WALs (archive_command) on another.
Andthe snapshots on those two volumes are not taken at the same time. Currently, the database is set to backup mode
(usingcron) and the storage guys have a window during which they can take the snapshots. 

> So the upthread suggestion of putting data and wal on different disk and snapshoting them at different times is *NOT*
safe.Unless the reference to the directory for the logs means a directory where log files are copied out with
archive_command,and it's actually the log archive (in which case it will work, but the recommendation is that the log
archiveshould not be on the same machine). 

as I said above, pg_wal is a directory in PGDATA at the default location and WALs are archived using the
archive_commandto a different volume. So I guess, we should be safe then. 

> The normal case is that snapshots are guaranteed to be atomic, and thus this millisecond window cannot appear. But
thatusually only applies across individual volumes. It's also worth noticing that taking the backups without using the
backupmode and archive_command means you cannot use them for PITR, only for restore onto that specific snapshot. 
>
> While our documentation on backups in general definitely needs improvement, this particular requirement is documented
athttps://www.postgresql.org/docs/current/backup-file.html. 

since we use backup mode, we should be good with PITR too. I didn't test that myself, but a workmate did and he said it
workednicely. 

So bottom line, SnapCenter means for PostgreSQL just a plain volume snapshot like we currently do with SnapCreator,
usingbackup mode scripts and "archive_command"-ing WALs to a diffent volume, i.e. no change in strategy. It's just that
SnapCentercan't control backup mode as it can with Oracle. 

Thanks for the invaluable input. Also, thanks Ken.

Cheers,
Paul




Re: Netapp SnapCenter

От
Stephen Frost
Дата:
Greetings,

* Paul Förster (paul.foerster@gmail.com) wrote:
> > On 18. Jun, 2020, at 16:19, Magnus Hagander <magnus@hagander.net> wrote:
> > I don't know specifically about SnapCenter, but for snapshots in general, it does require backup mode *unless* all
yourdata is on the same disk and you have an atomic snapshot across that disk (in theory it can be on different disk as
well,as long as the snapshots in that case are atomic across *all* those disks, not just individually, but that is
unusual).
>
> according to what I know from our storage guys, Netapp does atomic snapshots for each volume. We have the database
andits corresponding WAL files (pg_wal directory) on the same volume and the archived WALs (archive_command) on
another.And the snapshots on those two volumes are not taken at the same time. Currently, the database is set to backup
mode(using cron) and the storage guys have a window during which they can take the snapshots. 

If the entire database, all tablespaces, and pg_wal, are on the same
volume and the snapshot of the volume is atomic, then you don't actually
need to go through the start/stop backup- a snapshot being restored will
look just like a system crash and PG will just go back to the last
checkpoint and replay the WAL that's in pg_wal and it should reach
consistency and come up.

> > So the upthread suggestion of putting data and wal on different disk and snapshoting them at different times is
*NOT*safe. Unless the reference to the directory for the logs means a directory where log files are copied out with
archive_command,and it's actually the log archive (in which case it will work, but the recommendation is that the log
archiveshould not be on the same machine). 
>
> as I said above, pg_wal is a directory in PGDATA at the default location and WALs are archived using the
archive_commandto a different volume. So I guess, we should be safe then. 

Yes, that should be alright.

> > The normal case is that snapshots are guaranteed to be atomic, and thus this millisecond window cannot appear. But
thatusually only applies across individual volumes. It's also worth noticing that taking the backups without using the
backupmode and archive_command means you cannot use them for PITR, only for restore onto that specific snapshot. 
> >
> > While our documentation on backups in general definitely needs improvement, this particular requirement is
documentedat https://www.postgresql.org/docs/current/backup-file.html. 
>
> since we use backup mode, we should be good with PITR too. I didn't test that myself, but a workmate did and he said
itworked nicely. 
>
> So bottom line, SnapCenter means for PostgreSQL just a plain volume snapshot like we currently do with SnapCreator,
usingbackup mode scripts and "archive_command"-ing WALs to a diffent volume, i.e. no change in strategy. It's just that
SnapCentercan't control backup mode as it can with Oracle. 

The one issue here is that if you're using the deprecated exxclusive
backup API, then PG will create a backup_label file in the data
directory.  If the system reboots while that file exists, there's a good
chance that PG won't start up cleanly since, due to the file existing,
it thinks that it's restoring from a backup when it isn't.

Of course, if you're actually restoring from a backup, then that file is
absolutely critical to have in place, otherwise PG won't realize it's
being restored from a backup and you'll end up with a corrupted database
on restore.

Better is to use the newer non-exclusive API and arrange to collect the
necessary contents of the backup_label file from PG and store that with
the snapshot that you've taken.

Thanks,

Stephen

Вложения

Re: Netapp SnapCenter

От
Ron
Дата:
On 6/18/20 11:49 AM, Paul Förster wrote:
> Hi Magnus,
>
>> On 18. Jun, 2020, at 16:19, Magnus Hagander <magnus@hagander.net> wrote:
>> I don't know specifically about SnapCenter, but for snapshots in general, it does require backup mode *unless* all
yourdata is on the same disk and you have an atomic snapshot across that disk (in theory it can be on different disk as
well,as long as the snapshots in that case are atomic across *all* those disks, not just individually, but that is
unusual).
> according to what I know from our storage guys, Netapp does atomic snapshots for each volume. We have the database
andits corresponding WAL files (pg_wal directory) on the same volume and the archived WALs (archive_command) on
another.And the snapshots on those two volumes are not taken at the same time. Currently, the database is set to backup
mode(using cron) and the storage guys have a window during which they can take the snapshots.
 
>
>> So the upthread suggestion of putting data and wal on different disk and snapshoting them at different times is
*NOT*safe. Unless the reference to the directory for the logs means a directory where log files are copied out with
archive_command,and it's actually the log archive (in which case it will work, but the recommendation is that the log
archiveshould not be on the same machine).
 
> as I said above, pg_wal is a directory in PGDATA at the default location and WALs are archived using the
archive_commandto a different volume. So I guess, we should be safe then.
 

But it's trivial to make pg_xlog a separate mount point.

-- 
Angular momentum makes the world go 'round.



Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Stephen,

> On 18. Jun, 2020, at 21:26, Stephen Frost <sfrost@snowman.net> wrote:
> If the entire database, all tablespaces, and pg_wal, are on the same
> volume and the snapshot of the volume is atomic, then you don't actually
> need to go through the start/stop backup- a snapshot being restored will
> look just like a system crash and PG will just go back to the last
> checkpoint and replay the WAL that's in pg_wal and it should reach
> consistency and come up.

we DID use tablespaces as an old habit (coming from Oracle :-( when everything PostgreSQL was new). When we realized
whata bad idea that was, I reorganized them all away. So we don't use tablespaces anymore. Still, they used to be on
thesame volume, just a different directory. So even that would have been no problem in this particular case. :-) 

Only Flyway doesn't like a DBA to reorg a tablespace if developers add a tablespace clause to their create table/index
statements.But we made the developers remove those clauses. 

> The one issue here is that if you're using the deprecated exxclusive
> backup API, then PG will create a backup_label file in the data
> directory.  If the system reboots while that file exists, there's a good
> chance that PG won't start up cleanly since, due to the file existing,
> it thinks that it's restoring from a backup when it isn't.

we don't. We use the non-exclusive backup mode according to:

25.3.3.1. Making A Non-Exclusive Low-Level Backup
https://www.postgresql.org/docs/current/continuous-archiving.html

But I must say that I'm really not a friend of being forced to keep the session open until the pg_stop_backup() call
occurs.This really gave me a huge headache when writing the backup script. It's solved now but I consider this very
ugly.

When I wrote our backup mode script I read the deprecation note about the exclusive mode backup. This is why I decided
togo with non-exclusive to be ready when exclusive backup mode is finally removed some day. Yet, I don't see the
reason.Everything has to be consistent. So a non-exclusive backup mode makes absolutely no sense to me. Either the
wholedatabase cluster is in backup mode or it is not. There's nothing in between. 

This is unlike Oracle where you can set single tablespaces to backup mode and then just backup them separately. Still,
Inever saw a use for this with Oracle too. 

Cheers,
Paul


Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Ron,

> On 18. Jun, 2020, at 21:30, Ron <ronljohnsonjr@gmail.com> wrote:
> But it's trivial to make pg_xlog a separate mount point.

technically yes, but that would mean it would go to a different volume and hence destroy atomicity of the Netapp
snapshots.

Also, you'd have a huge administrative barrier against such things in our company.

Cheers,
Paul


Re: Netapp SnapCenter

От
Stephen Frost
Дата:
Greetings,

* Paul Förster (paul.foerster@gmail.com) wrote:
> When I wrote our backup mode script I read the deprecation note about the exclusive mode backup. This is why I
decidedto go with non-exclusive to be ready when exclusive backup mode is finally removed some day. Yet, I don't see
thereason. Everything has to be consistent. So a non-exclusive backup mode makes absolutely no sense to me. Either the
wholedatabase cluster is in backup mode or it is not. There's nothing in between. 

Glad that you went with the non-exclusive method.

When it comes to 'backup mode', it's actually the case that there can be
multiple backups running concurrently because there isn't actually a
single 'cluster wide backup mode', really.  Regarding the deprecated
methodology, there just isn't a way for the database on-disk image to
look exactly like a backup while also being able to survive a
crash/restart with only the WAL that's in the pg_wal directory.  Perhaps
there's other things we could do but at some point it has to be accepted
that there's gotta be something changed in the data directory to
indicate that it's a backup and not just a crash, and that change needs
to happen *after* the backup/snapshot/whatever has been taken, otherwise
there's a window of risk where a crash/restart would fail.

Thanks,

Stephen

Вложения

Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Stephen,

> On 19. Jun, 2020, at 18:02, Stephen Frost <sfrost@snowman.net> wrote:
> When it comes to 'backup mode', it's actually the case that there can be
> multiple backups running concurrently because there isn't actually a
> single 'cluster wide backup mode', really.

this is what I don't understand. Why would there be a reason to run multiple backups concurrently? I mean, using
pg_start_backup()means I then have to backup (using whatever method) of the while PGDATA and not just some random file
${PGDATA}/base/13297/2685.And since I have to backup the whole of PGDATA anyway, why would I want to do that multiple
times,even more so, concurrently? 

I read the backup doc but I just can't grasp the idea of why multiple concurrent backup capabilities should yield any
benefitat all. 

Cheers,
Paul


Re: Netapp SnapCenter

От
Magnus Hagander
Дата:
On Sat, Jun 20, 2020 at 9:04 AM Paul Förster <paul.foerster@gmail.com> wrote:
Hi Stephen,

> On 19. Jun, 2020, at 18:02, Stephen Frost <sfrost@snowman.net> wrote:
> When it comes to 'backup mode', it's actually the case that there can be
> multiple backups running concurrently because there isn't actually a
> single 'cluster wide backup mode', really.

this is what I don't understand. Why would there be a reason to run multiple backups concurrently? I mean, using pg_start_backup() means I then have to backup (using whatever method) of the while PGDATA and not just some random file ${PGDATA}/base/13297/2685. And since I have to backup the whole of PGDATA anyway, why would I want to do that multiple times, even more so, concurrently?

I read the backup doc but I just can't grasp the idea of why multiple concurrent backup capabilities should yield any benefit at all.

One not uncommon case is for example being able to provision a new replica while a backup is running. Since replicas are provisioned starting off a base backup, being able to run that concurrently is very useful. Especially if base backups take a long time to run.

--

Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Magnus,

> On 21. Jun, 2020, at 21:35, Magnus Hagander <magnus@hagander.net> wrote:
> One not uncommon case is for example being able to provision a new replica while a backup is running. Since replicas
areprovisioned starting off a base backup, being able to run that concurrently is very useful. Especially if base
backupstake a long time to run. 

hmm, a backup shouldn't take so long as to delay a base backup significantly. But that's just my opinion. I may be a
littlesnapshot biased, though, where it only takes a few seconds. :-) 

Cheers,
Paul


Re: Netapp SnapCenter

От
Stephen Frost
Дата:
Greetings,

* Paul Förster (paul.foerster@gmail.com) wrote:
> > On 21. Jun, 2020, at 21:35, Magnus Hagander <magnus@hagander.net> wrote:
> > One not uncommon case is for example being able to provision a new replica while a backup is running. Since
replicasare provisioned starting off a base backup, being able to run that concurrently is very useful. Especially if
basebackups take a long time to run. 
>
> hmm, a backup shouldn't take so long as to delay a base backup significantly. But that's just my opinion. I may be a
littlesnapshot biased, though, where it only takes a few seconds. :-) 

That's not the only case that I, at least, have heard of- folks aren't
really very happy with their backups fail when they could have just as
well completed, even if they're overlapping.  Sure, it's better if
backups are scheduled such that they don't overlap, but that can be hard
to guarantee.

The thing about this is though that the new API avoids *other* issues,
like what happens if the system crashes during a backup (which is an
entirely common thing that happens, considering how long many backups
take...) and it does so in a relatively reasonable way while also
allowing concurrent backups, which is perhaps a relatively modest
benefit but isn't the main point of the different API.

Thanks,

Stephen

Вложения

Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Stephen,

> On 22. Jun, 2020, at 07:36, Stephen Frost <sfrost@snowman.net> wrote:
> That's not the only case that I, at least, have heard of- folks aren't
> really very happy with their backups fail when they could have just as
> well completed, even if they're overlapping.  Sure, it's better if
> backups are scheduled such that they don't overlap, but that can be hard
> to guarantee.

I see.

> The thing about this is though that the new API avoids *other* issues,
> like what happens if the system crashes during a backup (which is an
> entirely common thing that happens, considering how long many backups
> take...) and it does so in a relatively reasonable way while also
> allowing concurrent backups, which is perhaps a relatively modest
> benefit but isn't the main point of the different API.

that makes me curious about another thing. The output of pg_stop_backup() is to be stored. Otherwise the backup is
useless.So far, so good. But what if the server crashes in the middle of the backup and pg_stop_back() hence is never
reached?In this case, it obviously does not create any output. 

Ok, you usually start the server, the database does a crash recovery and opens. Then, some time later, you do the usual
backupand all is well. This is like 99.999% of all cases. 

But what if you need to restore to the latest transaction while the database was running in backup mode during which
thecrash occurred. How does that work if no pg_stop_backup() output exists? Did I miss something here? 

Cheers,
Paul


Re: Netapp SnapCenter

От
"Peter J. Holzer"
Дата:
On 2020-06-22 08:02:06 +0200, Paul Förster wrote:
> > On 22. Jun, 2020, at 07:36, Stephen Frost <sfrost@snowman.net> wrote:
> > The thing about this is though that the new API avoids *other* issues,
> > like what happens if the system crashes during a backup (which is an
> > entirely common thing that happens, considering how long many backups
> > take...) and it does so in a relatively reasonable way while also
> > allowing concurrent backups, which is perhaps a relatively modest
> > benefit but isn't the main point of the different API.
>
> that makes me curious about another thing. The output of
> pg_stop_backup() is to be stored. Otherwise the backup is useless. So
> far, so good. But what if the server crashes in the middle of the
> backup and pg_stop_back() hence is never reached? In this case, it
> obviously does not create any output.

Correct. Your backup will also almost certainly be missing some files,
so it won't be usable anyway.

> Ok, you usually start the server, the database does a crash recovery
> and opens. Then, some time later, you do the usual backup and all is
> well. This is like 99.999% of all cases.
>
> But what if you need to restore to the latest transaction while the
> database was running in backup mode during which the crash occurred.
> How does that work if no pg_stop_backup() output exists? Did I miss
> something here?

Restore the previous backup and replay WALs from there (that assumes of
course that you are archiving WALs continuously, but if you don't, you
can't do PITR in general, so if you have that requirement you are doing
it).

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Вложения

Re: Netapp SnapCenter

От
Magnus Hagander
Дата:


On Mon, Jun 22, 2020 at 8:02 AM Paul Förster <paul.foerster@gmail.com> wrote:
Hi Stephen,

> On 22. Jun, 2020, at 07:36, Stephen Frost <sfrost@snowman.net> wrote:
> That's not the only case that I, at least, have heard of- folks aren't
> really very happy with their backups fail when they could have just as
> well completed, even if they're overlapping.  Sure, it's better if
> backups are scheduled such that they don't overlap, but that can be hard
> to guarantee.

I see.


Yeah, especially when your backups are a number of TB which makes them take Some Time (TM) to complete...



> The thing about this is though that the new API avoids *other* issues,
> like what happens if the system crashes during a backup (which is an
> entirely common thing that happens, considering how long many backups
> take...) and it does so in a relatively reasonable way while also
> allowing concurrent backups, which is perhaps a relatively modest
> benefit but isn't the main point of the different API.

that makes me curious about another thing. The output of pg_stop_backup() is to be stored. Otherwise the backup is useless. So far, so good. But what if the server crashes in the middle of the backup and pg_stop_back() hence is never reached? In this case, it obviously does not create any output.

Whenever the connection that ran pg_start_backup() disconnects without calling pg_stop_backup(), the "state" of being "in backup mode" is "rolled back" in the database. So similar to how a transaction you started with BEGIN gets rolled back if you just disconnect without issuing COMMIT.

Your backup will of course be invalid in this case, but the database itself will be fine. (And the inability to ensure this is exactly why the old "exclusive mode" for backups is deprecated -- but the non-exclusive mode is safe with this) So it is of course very important to check that the pg_stop_backup() step completed successfully, and fail the entire backup if it did not.


Ok, you usually start the server, the database does a crash recovery and opens. Then, some time later, you do the usual backup and all is well. This is like 99.999% of all cases.

But what if you need to restore to the latest transaction while the database was running in backup mode during which the crash occurred. How does that work if no pg_stop_backup() output exists? Did I miss something here?

It does not work off *that* base backup. But if you start from the *prior* be backup (one that did complete with a successful pg_stop_backup) then you can still use the archived wal to recover to any point in time. 

--

Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Peter,

> On 22. Jun, 2020, at 13:01, Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
> Restore the previous backup and replay WALs from there (that assumes of
> course that you are archiving WALs continuously, but if you don't, you
> can't do PITR in general, so if you have that requirement you are doing
> it).

ok, that makes it a lot clearer to me. Thank you very much.

Cheers,
Paul



Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Magnus,

> On 22. Jun, 2020, at 13:08, Magnus Hagander <magnus@hagander.net> wrote:
>
> It does not work off *that* base backup. But if you start from the *prior* be backup (one that did complete with a
successfulpg_stop_backup) then you can still use the archived wal to recover to any point in time.  

ok, that's clear now. Thank you very much.

Cheers,
Paul


Re: Netapp SnapCenter

От
Stephen Frost
Дата:
Greetings,

* Paul Förster (paul.foerster@gmail.com) wrote:
> > On 22. Jun, 2020, at 13:08, Magnus Hagander <magnus@hagander.net> wrote:
> > It does not work off *that* base backup. But if you start from the *prior* be backup (one that did complete with a
successfulpg_stop_backup) then you can still use the archived wal to recover to any point in time.  
>
> ok, that's clear now. Thank you very much.

Right, and tools like pgbackrest will figure this kind of thing out for
you too- just give it the time you want to restore to and it'll figure
out the right backup to use.

Thanks,

Stephen

Вложения

Re: Netapp SnapCenter

От
"Wolff, Ken L"
Дата:
So apologies if this is a stupid question but there's obviously been a lot of discussion on this issue.  Was a
consensusever reached on the following? 

If a Postgres database (both data and WAL) is located on one NetApp volume, meaning a snapshot should capture
everythingat exactly the same time with the required atomicity, do we still need to put the database into backup mode
beforehand(and take it out afterwards)?  If we don't put Postgres into backup mode first, will we still be able to use
theWALs to roll transactions forward or would we be limited to only the point-in-time at which that snapshot was taken? 

Thanks much.

Ken

-----Original Message-----
From: Stephen Frost <sfrost@snowman.net>
Sent: Monday, June 22, 2020 7:06 AM
To: Paul Förster <paul.foerster@gmail.com>
Cc: Magnus Hagander <magnus@hagander.net>; Wolff, Ken L (US) <ken.l.wolff@lmco.com>; pgsql-general
<pgsql-general@postgresql.org>
Subject: EXTERNAL: Re: Netapp SnapCenter

Greetings,

* Paul Förster (paul.foerster@gmail.com) wrote:
> > On 22. Jun, 2020, at 13:08, Magnus Hagander <magnus@hagander.net> wrote:
> > It does not work off *that* base backup. But if you start from the *prior* be backup (one that did complete with a
successfulpg_stop_backup) then you can still use the archived wal to recover to any point in time.  
>
> ok, that's clear now. Thank you very much.

Right, and tools like pgbackrest will figure this kind of thing out for you too- just give it the time you want to
restoreto and it'll figure out the right backup to use. 

Thanks,

Stephen



Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Ken,

please don't put your answer at the top.

> On 22. Jun, 2020, at 17:46, Wolff, Ken L <ken.l.wolff@lmco.com> wrote:
>
> So apologies if this is a stupid question but there's obviously been a lot of discussion on this issue.  Was a
consensusever reached on the following? 
>
> If a Postgres database (both data and WAL) is located on one NetApp volume, meaning a snapshot should capture
everythingat exactly the same time with the required atomicity, do we still need to put the database into backup mode
beforehand(and take it out afterwards)?  If we don't put Postgres into backup mode first, will we still be able to use
theWALs to roll transactions forward or would we be limited to only the point-in-time at which that snapshot was taken? 

you're absolutely fine with that as long as PGDATA and the pg_wal directory are located on the same volume. But you
can'tperform a PITR. 

If you don't do pg_start_backup() then you won't be able to perform a PITR but if that's an
"I-will-*DEFINITELY*-never-need-a-PITR"situation, then that's ok. Otherwise, from what I learned, you do a
pg_start_backup(),then do the volume snapshot, and finally the pg_stop_backup() saving the output of the latter for
PITRpurposes. 

Cheers,
Paul


Re: Netapp SnapCenter

От
Stephen Frost
Дата:
Greetings,

* Paul Förster (paul.foerster@gmail.com) wrote:
> > On 22. Jun, 2020, at 17:46, Wolff, Ken L <ken.l.wolff@lmco.com> wrote:
> > So apologies if this is a stupid question but there's obviously been a lot of discussion on this issue.  Was a
consensusever reached on the following? 
> >
> > If a Postgres database (both data and WAL) is located on one NetApp volume, meaning a snapshot should capture
everythingat exactly the same time with the required atomicity, do we still need to put the database into backup mode
beforehand(and take it out afterwards)?  If we don't put Postgres into backup mode first, will we still be able to use
theWALs to roll transactions forward or would we be limited to only the point-in-time at which that snapshot was taken? 
>
> you're absolutely fine with that as long as PGDATA and the pg_wal directory are located on the same volume. But you
can'tperform a PITR. 

... and all tablespaces are also on that volume.  Basically, anything
that PG might ever write to needs to all be included in that atomic
write.

> If you don't do pg_start_backup() then you won't be able to perform a PITR but if that's an
"I-will-*DEFINITELY*-never-need-a-PITR"situation, then that's ok. Otherwise, from what I learned, you do a
pg_start_backup(),then do the volume snapshot, and finally the pg_stop_backup() saving the output of the latter for
PITRpurposes. 

Note that the output from pg_stop_backup() is absolutely essential for
the backup itself, not just if you want to do PITR.

As for doing PITR with such a snapshot, it's not something I'd recommend
as a general PITR-supporting backup strategy as it ends up being
complicated and very much has a risk associated with it that you'll end
up with a corrupted database if you don't manage it properly (and PG is
pretty limited in its ability to help you with this process..).

Thanks,

Stephen

Вложения

Re: Netapp SnapCenter

От
"Wolff, Ken L"
Дата:
One last follow up on this thread.  I don't remember this being mentioned, so wanted to share in case if might be of
help.The following is NetApps's Best Practices for PostgreSQL on NetApp ONTAP storage: 

 https://www.netapp.com/us/media/tr-4770.pdf



Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Wolf,

> On 23. Jun, 2020, at 22:59, Wolff, Ken L <ken.l.wolff@lmco.com> wrote:
> 
> https://www.netapp.com/us/media/tr-4770.pdf 

I will check this out. Thank you very much.

Cheers,
Paul



RE: EXTERNAL: Re: Netapp SnapCenter

От
"Wolff, Ken L"
Дата:
>> On 23. Jun, 2020, at 22:59, Wolff, Ken L <ken.l.wolff@lmco.com> wrote:
>>
>> https://www.netapp.com/us/media/tr-4770.pdf

> I will check this out. Thank you very much.


There's actually a lot of good stuff in that document about Postgres in general.  I'd be curious to hear what everyone
thinks,though, and specifically about what NetApp recommends in Section 3.3 about putting data and WAL on separate
volumes,which I believe contradicts what's been discussed in this email thread. 




Re: EXTERNAL: Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Ken,

> On 25. Jun, 2020, at 17:15, Wolff, Ken L <ken.l.wolff@lmco.com> wrote:
> There's actually a lot of good stuff in that document about Postgres in general.  I'd be curious to hear what
everyonethinks, though, and specifically about what NetApp recommends in Section 3.3 about putting data and WAL on
separatevolumes, which I believe contradicts what's been discussed in this email thread. 

yes, I've read the part about different volumes and I must say, I don't agree because I think it violates atomicity.

Archived WAL is another thing, but PGDATA and pg_wal should IMHO always be located on the same volume, along with
tablespaces,if any. 

Cheers,
Paul


Re: EXTERNAL: Re: Netapp SnapCenter

От
"David G. Johnston"
Дата:
On Thu, Jun 25, 2020 at 8:24 AM Paul Förster <paul.foerster@gmail.com> wrote:
Archived WAL is another thing, but PGDATA and pg_wal should IMHO always be located on the same volume, along with tablespaces, if any.

My understanding that having such a setup (single volume) eases administration at the cost of performance.  It is perfectly acceptable to favor the performance benefits, you just need to ensure that your administration of the server is robust enough to handle it properly.  It sounds like this product, though, doesn't provide the correct depth of administration while recommending the performance weighted setup.

David J.

Re: EXTERNAL: Re: Netapp SnapCenter

От
Magnus Hagander
Дата:


On Thu, Jun 25, 2020 at 5:23 PM Paul Förster <paul.foerster@gmail.com> wrote:
Hi Ken,

> On 25. Jun, 2020, at 17:15, Wolff, Ken L <ken.l.wolff@lmco.com> wrote:
> There's actually a lot of good stuff in that document about Postgres in general.  I'd be curious to hear what everyone thinks, though, and specifically about what NetApp recommends in Section 3.3 about putting data and WAL on separate volumes, which I believe contradicts what's been discussed in this email thread.

yes, I've read the part about different volumes and I must say, I don't agree because I think it violates atomicity.

I believe NetApp does atomic snapshots across multiple volumes, if you have them in the same consistency group. (If you don't then you're definitely in for a world of pain if you ever have to restore)

Snapshotting multiple volumes in a consistency group will set up a write fence across them, then snapshot, and AIUI guarantees correct write ordering.

--

Re: EXTERNAL: Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Magnus,

> On 26. Jun, 2020, at 12:29, Magnus Hagander <magnus@hagander.net> wrote:
> I believe NetApp does atomic snapshots across multiple volumes, if you have them in the same consistency group. (If
youdon't then you're definitely in for a world of pain if you ever have to restore) 
>
> Snapshotting multiple volumes in a consistency group will set up a write fence across them, then snapshot, and AIUI
guaranteescorrect write ordering. 

That's how I understood it too. But it will be difficult in our complex storage world to get the storage guys to create
aconsistency group. 

Also, what I'm looking for is a plugin that allows to, how Netapp call it, "quiece" the PostgreSQL cluster, i.e. does
start/stopbackup. 

I don't really trust the community plugin as it is so old and I'm not sure it gets any more development attention. For
example,what about the API change (exclusive vs. non-exclusive)? Does it use the new API? 

Cheers,
Paul


Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Stephen,

> On 22. Jun, 2020, at 19:00, Stephen Frost <sfrost@snowman.net> wrote:
> ... and all tablespaces are also on that volume.  Basically, anything
> that PG might ever write to needs to all be included in that atomic
> write.

it turns out that tablespaces can be stored on different volumes if you use consistency groups in Netapp to do the
snapshots.But this is per documentation that Ken mentioned (see below)! I couldn't verify it as I am not the storage
person...

https://www.netapp.com/us/media/tr-4770.pdf

Cheers,
Paul


Re: EXTERNAL: Re: Netapp SnapCenter

От
Stephen Frost
Дата:
Greetings,

* Paul Förster (paul.foerster@gmail.com) wrote:
> > On 26. Jun, 2020, at 12:29, Magnus Hagander <magnus@hagander.net> wrote:
> > I believe NetApp does atomic snapshots across multiple volumes, if you have them in the same consistency group. (If
youdon't then you're definitely in for a world of pain if you ever have to restore) 
> >
> > Snapshotting multiple volumes in a consistency group will set up a write fence across them, then snapshot, and AIUI
guaranteescorrect write ordering. 
>
> That's how I understood it too. But it will be difficult in our complex storage world to get the storage guys to
createa consistency group. 
>
> Also, what I'm looking for is a plugin that allows to, how Netapp call it, "quiece" the PostgreSQL cluster, i.e. does
start/stopbackup. 

Presumably they mean 'quiesce', except that that *isn't* what PG's
start/stop backup calls do, and assuming that's what happens is quite
wrong and could lead to issues.

The PG start/stop backup calls do things like wait for a checkpoint to
happen and track when that checkpoint was and return that info along
with whatever the stopping point of the backup is- so that you can make
sure that you have all of the WAL between those two points, and so you
can create the backup_label file that's needed to indicate on restore
that you're restoring from a backup and not just doing crash recovery.

If it isn't an atomic snapshot across everything then start/stop calls
have to be done as well as all that other fun stuff.

> I don't really trust the community plugin as it is so old and I'm not sure it gets any more development attention.
Forexample, what about the API change (exclusive vs. non-exclusive)? Does it use the new API? 

No clue.

Thanks,

Stephen

Вложения

Re: EXTERNAL: Re: Netapp SnapCenter

От
Paul Förster
Дата:
Hi Stephen,

> On 29. Jun, 2020, at 15:32, Stephen Frost <sfrost@snowman.net> wrote:
>
> Presumably they mean 'quiesce', except that that *isn't* what PG's

yes, sorry, "quiece" was a typo on my part. I never fully understood what they mean with "quiesce" anyway. But then,
I'mnot the storage specialist in out company anyway. 

> start/stop backup calls do, and assuming that's what happens is quite
> wrong and could lead to issues.
>
> The PG start/stop backup calls do things like wait for a checkpoint to
> happen and track when that checkpoint was and return that info along
> with whatever the stopping point of the backup is- so that you can make
> sure that you have all of the WAL between those two points, and so you
> can create the backup_label file that's needed to indicate on restore
> that you're restoring from a backup and not just doing crash recovery.
>
> If it isn't an atomic snapshot across everything then start/stop calls
> have to be done as well as all that other fun stuff.

that's exactly why I want control over pg_start_backup() and pg_stop_backup(). It may be in the form of pre- and
post-scripts,but I want control over it. I just can't seem to build trust in a plugin that saw the last release two
yearsago and which I can't even find out if it would allow PITRs, works with the new API and such things. 

I may be wrong here, but my gut feeling about this is just not good for some reason.

Cheers,
Paul