Discussion: Consistent file-level backup of pg data directory


Consistent file-level backup of pg data directory

From:
gator_ml@yahoo.de
Date:
Hi,

On my site, we run rsync-based backups of all servers to a central
backup server every night. Unfortunately, reading the postgres
documentation I could not find any direct way to ensure that the
postgres data directory stored on the backup server is in a
consistent state.

It seems to me that by using (abusing ;-) "Continuous Archiving" it
should be possible to get what I need. Specifically, I came up with
the following strategy:
- set wal_level=archive
- call pg_start_backup()
- keep a copy of "backup_label" under a different name
- call pg_stop_backup()
- then run the regular backup procedure

As far as I can see, the result of this should be that I get a
consistent snapshot of the data directory at the time of the
pg_stop_backup call. After restoring the backup somewhere,
I only need to create a "recovery.conf" file (containing
restore_command='/bin/true' because restore_command is
mandatory) and everything should be fine.
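
To illustrate, the nightly hooks would look roughly like this (the
label, paths and psql invocation are just placeholders):

    psql -U postgres -c "SELECT pg_start_backup('nightly');"
    cp "$PGDATA/backup_label" /somewhere/safe/backup_label.saved
    psql -U postgres -c "SELECT pg_stop_backup();"
    # ... the regular rsync-based backup runs here ...

and after restoring the backup somewhere:

    cat > "$PGDATA/recovery.conf" <<'EOF'
    restore_command = '/bin/true'
    EOF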

In practice, this procedure also seems to work as expected.
Is there any hidden pitfall I overlooked? Or is there maybe
a better way not mentioned in the postgres documentation how to
get the data into a consistent state for a file level backup?

Any feedback would be appreciated!

Regards,
                         Peter

Re: Consistent file-level backup of pg data directory

From:
Jeff Janes
Date:
On Tue, Jan 7, 2014 at 10:58 AM, <gator_ml@yahoo.de> wrote:
Hi,

On my site, we run rsync-based backups of all servers to a central backup server every night. Unfortunately, reading the postgres documentation I could not find any direct way to ensure that the postgres data directory stored on the backup server is in a consistent state.

It seems to me that by using (abusing ;-) "Continuous Archiving" it
should be possible to get what I need. Specifically, I came up with
the following strategy:
- set wal_level=archive
- call pg_start_backup()
- keep a copy of "backup_label" under a different name

Why under a different name?  That sounds dangerous to me.
 
- call pg_stop_backup()
- then run the regular backup procedure

You have this order backwards.  You can only tell postgres that you are done doing the backup once you are done doing the backup.
 
 
After restoring the backup somewhere,
I only need to create a "recovery.conf" file (containing
restore_command='/bin/true' because restore_command is
mandatory) and everything should be fine.

Does this mean you weren't actually archiving the xlog when you turned archiving on?  

What do you do with the backup_label?  If you don't restore it to its original name, then postgres will just think it is recovering from a power failure or something like that, not a media failure, so it will not even try to access the archived logs.
 
In practice, this procedure also seems to work as expected.

If no checkpoints occurred during the backup (or no more than one of them did), then your method might accidentally work.  Or it could silently corrupt your database in hard to detect ways.

 
Is there any hidden pitfall I overlooked? Or is there maybe
a better way not mentioned in the postgres documentation how to
get the data into a consistent state for a file level backup?

Any Feedback would be appreciated!

It is pretty much a disaster.  Do you really need rsync?  Otherwise, use pg_basebackup, it is easier to get right.
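
For example, something like this should be enough in a simple case
(the target directory is an example):

    pg_basebackup -D /backups/pg -Ft -z -x -P

where -Ft -z produce a compressed tar and -x also copies the WAL
needed to make the backup usable on its own.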

Cheers,

Jeff

Re: Consistent file-level backup of pg data directory

From:
Rajeev rastogi
Date:
On 08 January 2014, Gator wrote:

> Hi,
>
> On my site, we run rsync-based backups of all servers to a central
> backup server every night. Unfortunately, reading the postgres
> documentation I could not find any direct way to ensure that the
> postgres data directory stored on the backup server is in
> a consistent state.
>
> It seems to me that by using (abusing ;-) "Continuous Archiving" it
> should be possible to get what I need. Specifically, I came up with the
> following strategy:
> - set wal_level=archive
> - call pg_start_backup()
> - keep a copy of "backup_label" under a different name
> - call pg_stop_backup()
> - then run the regular backup procedure
>
> As far as I can see, the result of this should be that I get a
> consistent snapshot of the data directory at the time of the
> pg_stop_backup call. After restoring the backup somewhere, I only need
> to create a "recovery.conf" file (containing
> restore_command='/bin/true' because restore_command is
> mandatory) and everything should be fine.
>
> In practice, this procedure also seems to work as expected.
> Is there any hidden pitfall I overlooked? Or is there maybe a better
> way not mentioned in the postgres documentation how to get the data
> into a consistent state for a file level backup?
>
> Any Feedback would be appreciated!
>
> Regards,
>                          Peter
>
>

You could instead try pg_basebackup (if you are not particular
about rsync), which is easy to use and will give you a consistent
backup. You will need to make configuration changes on all the
servers. Check the PostgreSQL documentation for more detail.

Thanks and Regards,
Kumar Rajeev Rastogi


Re: Consistent file-level backup of pg data directory

From:
gator_ml@yahoo.de
Date:
On 2014-01-07 21:54, Jeff Janes wrote:
> On Tue, Jan 7, 2014 at 10:58 AM, <gator_ml@yahoo.de> wrote:
>     - keep a copy "backup_label" under a different name
[...]
> Why under a different name?  That sounds dangerous to me.
... otherwise it would be useless, because "pg_stop_backup"
will delete the "backup_label" file

>     - call pg_stop_backup()
>     - the run the regular backup procedure
>
>
> You have this order backwards.  You can only tell postgres that you are
> done doing the backup once you are done doing the backup.
This is what I tried first - unfortunately, it won't work at all,
because postgres creates another checkpoint _after_ pg_stop_backup
and will refuse to run the restore procedure if this last wal file
is missing (which in my scenario will inevitably be the case).

> It is pretty much a disaster.  Do you really need rsync?  Otherwise, use
> pg_basebackup, it is easier to get right.

... it looks like I didn't explain clearly enough what I am trying
to accomplish. So let me try it again:

We have a general backup procedure that runs every night. For
some machines there may additionally be more
specialized/fine-grained measures, but in any case it should be
possible to restore any given server from the last nightly backup,
without any machine-specific knowledge, by following a generic
"restore recipe". Our current rsync-based solution is a pretty
good compromise between many different requirements.

For machines running database systems, this means that I need
some way to get a consistent state as of some point in time. It
does not particularly matter which time exactly (in particular, I
do not care whether transactions committed during the backup are
included or not). For this purpose, I can define machine-specific
hooks to run before/after the actual backup procedure.

Unfortunately, it does not look like there is any direct way to
accomplish this with postgres except shutting down the whole
database system while the backup is running. The systems will
be almost idle while the backup runs, and write transactions
will be very rare, but a total shutdown for every backup still
would be too radical.

Actually, with some tests copying the database directory while
writing data to the database and then restarting postgres with
such a copy, I did not manage to produce any visible
inconsistencies. If postgres was able to restore the last
consistent state in such a situation, then no additional measures
would be needed (like I said, it is not particularly important
which state this is - losing everything written during the last
15 minutes, including committed transactions, would be good enough).
But at least as far as I can tell from the documentation, this
cannot safely be assumed.

Although the procedures for "Continuous Archiving" are clearly
made for a different purpose, it seems like this is about as
close as it gets to what I need. As far as I understand:

- between pg_start_backup() and pg_stop_backup() it should be safe
to copy the data directory any time
- all committed transactions during this time will go to the wal files
(actually, I noticed that files in other directories than pg_xlog are
modified, too ...)
- normally, it is intended to copy every single newly written wal file
somewhere else by whatever is defined as "archive_command", but
if this works, then any other way to copy all wal files written
after pg_start_backup should work just as well.

My original idea was to call pg_start_backup() first, then run the
actual backup followed by pg_stop_backup(). Because the procedure
for "Point-in-Time Recovery" is the same, and no wal file may be
modified after it was written and "archive_command" called, it
follows that any wal file plus the preceding wal files written
after pg_start_backup must contain everything that is needed to
restore any state up to the time the wal file was
written (otherwise the whole "PITR" mechanism would not work).
Unfortunately, postgres refuses to run the restore procedure if
the last wal file (which is written after pg_stop_backup, so in
my case it will not be available) is missing, even if
"recovery_target_time" is explicitly set to some earlier time. I
didn't try it, but assuming that the last wal file is not really
needed in this case, would it be enough to just create an empty
file with the name postgres is looking for?

If I can't find a better way, I could probably arrange to
separately copy this last wal file into the
backup (unfortunately, there is no easy way to do this ...) :-(

Calling pg_stop_backup() _before_ the backup was of course sheer
desperation, but this way postgres seemed to correctly run its
restore procedure in my tests. I don't know much about the
postgres internals (actually, I personally am not even
responsible for any postgres server) - that's why I asked here.

So again my question: is there any reasonable way to get a
consistent backup of a postgres server in my
scenario? (pg_basebackup won't help much here - I could of course
use it to create a local copy before the backup, but this would
waste lots of space.)

Regards,
                  Peter


Re: Consistent file-level backup of pg data directory

From:
Jeff Janes
Date:
On Wed, Jan 8, 2014 at 7:09 AM, <gator_ml@yahoo.de> wrote:
On 2014-01-07 21:54, Jeff Janes wrote:
> On Tue, Jan 7, 2014 at 10:58 AM, <gator_ml@yahoo.de> wrote:
>     - keep a copy "backup_label" under a different name
[...]
> Why under a different name?  That sounds dangerous to me.
... otherwise it would be useless, because "pg_stop_backup"
will delete the "backup_label" file

That is one of the reasons you should run the backup before you tell PostgreSQL that you finished the backup :)


>     - call pg_stop_backup()
>     - then run the regular backup procedure
>
>
> You have this order backwards.  You can only tell postgres that you are
> done doing the backup once you are done doing the backup.
This is what I tried first - unfortunately, it won't work at all,
because postgres creates another checkpoint _after_ pg_stop_backup
and will refuse to run the restore procedure if this last wal file
is missing (which in my scenario will inevitably be the case).

Eventually another checkpoint will be run, but pg_stop_backup will not directly cause one to happen.  What it creates is a record in the WAL stream saying that the backup has finished.  That record is what it needs to see.  After writing that record, it triggers a log switch (not the same thing as a checkpoint) so that the xlog file containing the record will get archived (if you were truly using archiving).

That requirement is there to protect your data, you cannot safely circumvent the need for it.  If you have backed up the xlog file that contains that record before the record was present, then that will be a problem, as it won't find the record. The solution for that (other than using archiving) is to backup everything before calling pg_stop_backup, then backup just the pg_xlog after calling pg_stop_backup.
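
As a sketch of that order (host and paths are examples):

    psql -c "SELECT pg_start_backup('nightly');"
    rsync -a --exclude 'pg_xlog/*' "$PGDATA/" backuphost:/backups/pg/
    psql -c "SELECT pg_stop_backup();"
    # the end-of-backup record now exists; copy the WAL written
    # during the backup, including the file containing that record
    rsync -a "$PGDATA/pg_xlog/" backuphost:/backups/pg/pg_xlog/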



> It is pretty much a disaster.  Do you really need rsync?  Otherwise, use
> pg_basebackup, it is easier to get right.

... it looks like I didn't explain clearly enough what I am trying
to accomplish. So let me try it again:

We have a general backup procedure that runs every night. For
some machines there may additionally be more
specialized/fine-grained measures, but in any case it should be
possible to restore any given server from the last nightly backup,
without any machine-specific knowledge, by following a generic
"restore recipe". Our current rsync-based solution is a pretty
good compromise between many different requirements.

For machines running database systems, this means that I need
some way to get a consistent state as of some point in time. It
does not particularly matter which time exactly (in particular, I
do not care whether transactions committed during the backup are
included or not). For this purpose, I can define machine-specific
hooks to run before/after the actual backup procedure.

I think it would be easier to just exclude the database from the system-wide backup and use a different method for it, rather than engineer the necessary before/after hooks onto the system-wide backup.

 

Unfortunately, it does not look like there is any direct way to
accomplish this with postgres except shutting down the whole
database system while the backup is running. The systems will
be almost idle while the backup runs, and write transactions
will be very rare, but a total shutdown for every backup still
would be too radical.

Actually, with some tests copying the database directory while
writing data to the database and then restarting postgres with
such a copy, I did not manage to produce any visible
inconsistencies. If postgres was able to restore the last
consistent state in such a situation, then no additional measures
would be needed (like I said, it is not particularly important
which state this is - losing everything written during the last
15 minutes, including committed transactions, would be good enough).
But at least as far as I can tell from the documentation, this
cannot safely be assumed.

On small, nearly idle systems, you can often get away with doing a lot of dangerous things.  This is perhaps unfortunate, as it breeds bad habits.


Although the procedures for "Continuous Archiving" are clearly
made for a different purpose, it seems like this is about as
close as it gets to what I need. As far as I understand:

- between pg_start_backup() and pg_stop_backup() it should be safe
to copy the data directory any time
- all committed transactions during this time will go to the wal files
(actually, I noticed that files in other directories than pg_xlog are
modified, too ...)

True, but the data in pg_xlog tells it how to replay those other changes to those other directories.  So it doesn't matter whether the other file was backed up pre-modification or post-modification, as the WAL allows it to be repaired either way. Unless the WAL stream ends too early--then it can't repair them.
 
- normally, it is intended to copy every single newly written wal file
somewhere else by whatever is defined as "archive_command", but
if this works, then any other way to copy all wal files written
after pg_start_backup should work just as well.

Kind of.  The archive_command conspires with the database server to prevent it from recycling a wal file until it knows the file has been successfully copied by archive_command.  It is difficult for "any other way" to make that guarantee--although if the server is nearly idle, then the risk of such recycling causing problems is rather low.
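
For reference, a conventional archiving setup along the lines of the
documentation's example (the archive directory is a placeholder):

    # postgresql.conf
    wal_level = archive
    archive_mode = on
    archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'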
 
My original idea was to call pg_start_backup() first, then run the
actual backup followed by pg_stop_backup(). Because the procedure
for "Point-in-Time Recovery" is the same, and no wal file may be
modified after it was written and "archive_command" called, it
follows that any wal file plus the preceding wal files written
after pg_start_backup must contain everything that is needed to
restore any state up to the time the wal file was
written (otherwise the whole "PITR" mechanism would not work).

PITR allows you to recover to any point after the backup successfully ended.  It does not allow you to recover to some point in the middle of the backup (other than by starting at the previous successful backup and rolling forward).

I think that this is a logical necessity.  If you could use a backup set to restore to a point in the middle of that same backup, then you wouldn't be in the middle of the backup, you would be done with it.

 
Unfortunately, postgres refuses to run the restore procedure if
the last wal file (which is written after pg_stop_backup, so in
my case it will not be available) is missing, even if
"recovery_target_time" is explicitly set to some earlier time. I
didn't try it, but assuming that the last wal file is not really
needed in this case, would it be enough to just create an empty
file with the name postgres is looking for?

It is really needed if you have high standards of data integrity and safety.  That is why PostgreSQL requires it.
 

If I can't find a better way, I could probably arrange to
separately copy this last wal file into the
backup (unfortunately, there is no easy way to do this ...) :-(

 
You could just copy the entire pg_xlog directory again.  It is easier, probably safer (there is no guarantee that the "last" wal file is really only one file, rather than two or three), and on a low-traffic server the size is probably not all that large.

Cheers,

Jeff

Re: Consistent file-level backup of pg data directory

From:
Ivan Voras
Date:
On 08/01/2014 16:09, gator_ml@yahoo.de wrote:

> For machines running database systems, this means that I need some
> way to get a consistent state as of some point in time. It does not
> particularly matter which time exactly (in

> Unfortunately, it does not look like there is any direct way to
> accomplish this with postgres except shutting down the whole
> database system while the backup is running. The systems will

You could use a file system which supports snapshots (I don't know
which Linux FSs do; I've read that LVM can be used to simulate
them) to get an exact point-in-time backup, which relies on the
database's usual crash resilience to be stable enough for
restoring.




Re: Consistent file-level backup of pg data directory

From:
M Tarkeshwar Rao
Date:
Hi All,

I am facing a problem: I want to read the log files of postgres.

Our customer is facing a problem in the database; one error exception is raised continuously by the system.

Error is:
2014-01-09 22:08:12.003, SEVERE, manager.Data Manager - Could not execute JDBC batch update; SQL [insert into
MM_CONFIGURATION(NAME, CONFIGTYPE, CHECKEDOUT, DELETED, COLLECTIONID, CONFIGID) values (?, ?, ?, ?, ?, ?)]; constraint
[null];nested exception is org.hibernate.exception.ConstraintViolationException: Could not execute JDBC batch update
 

I want to see, in the log files, the data that the customer provided to the system.

Can you suggest which log file I should go through, and how?


Regards
Tarkeshwar

Re: Consistent file-level backup of pg data directory

From:
gator_ml@yahoo.de
Date:
On 01/08/14 19:55, Jeff Janes wrote:
> I think it would be easier to just exclude the database from the
> system-wide backup and use a different method for it, rather than
> engineer the necessary before/after hooks onto the system-wide backup.

Thanks for your comments!

I really thought it would be easier (I still have the impression
that almost everything needed to make a filesystem backup safe is
already available; the parts just don't fit together too well for
this purpose). I have now come up with a way to safely get a
postgres database into the general backup procedure:

- move the postgres data directory to a separate LVM volume
- call pg_start_backup()
- create an LVM snapshot
- call pg_stop_backup() and update the snapshot's pg_xlog from
   the "real" volume
- create a recovery.conf on the snapshot
- run our normal backup procedure with the "real" postgres data
   volume replaced by the prepared snapshot
- remove the snapshot again
- If the backup is restored to a machine, postgres will find the
   recovery.conf file and initiate the recovery
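
Roughly, as a shell sketch (volume group, snapshot size and mount
points are invented for illustration):

    psql -c "SELECT pg_start_backup('nightly');"
    lvcreate -s -L 2G -n pgsnap /dev/vg0/pgdata
    psql -c "SELECT pg_stop_backup();"
    mount /dev/vg0/pgsnap /mnt/pgsnap
    rsync -a "$PGDATA/pg_xlog/" /mnt/pgsnap/pg_xlog/  # refresh snapshot's pg_xlog
    echo "restore_command = '/bin/true'" > /mnt/pgsnap/recovery.conf
    # ... run the normal backup against /mnt/pgsnap ...
    umount /mnt/pgsnap
    lvremove -f /dev/vg0/pgsnap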

Actually, I wonder whether in this scenario the pg_start/stop_backup
is really necessary - would it be safe to just tell postgres to
create a checkpoint right before the snapshot and directly use
the "frozen" data directory in the backup? (The postgres
documentation suggests this.)

Another point in favor of the slightly cumbersome solution above
would be the ability to combine it with Point-in-Time Recovery.
As far as I understand, this only works with a base backup created
using pg_start/stop_backup. Is this right?

Regards,
                      Peter


Re: Consistent file-level backup of pg data directory

From:
Jeff Janes
Date:

On Fri, Jan 10, 2014 at 9:03 AM, <gator_ml@yahoo.de> wrote:
On 01/08/14 19:55, Jeff Janes wrote:
I think it would be easier to just exclude the database from the
system-wide backup and use a different method for it, rather than
engineer the necessary before/after hooks onto the system-wide backup.

Thanks for your comments!

I really thought it would be easier (I still have the impression
that almost everything needed to make a filesystem backup safe is
already available; the parts just don't fit together too well for
this purpose). I have now come up with a way to safely get a
postgres database into the general backup procedure:

- move the postgres data directory to a separate LVM volume
- call pg_start_backup()
- create an LVM snapshot
- call pg_stop_backup() and update the snapshot's pg_xlog from
  the "real" volume
- create a recovery.conf on the snapshot
- run our normal backup procedure with the "real" postgres data
  volume replaced by the prepared snapshot
- remove the snapshot again
- If the backup is restored to a machine, postgres will find the
  recovery.conf file and initiate the recovery

Provided the entire database (including all tablespaces and all of pg_xlog) participates in the snapshot atomically, you can skip almost all of those steps (and you should, because extraneous steps are confusing and therefore dangerous):

0) do a one-time permanent rearrangement so the database lives on a separate volume.
1)  snapshot.  
2)  backup the frozen snapshot.
3)  release the snapshot.

When you start postgres based on the copied snapshot, the database will think that it crashed, and will go through soft crash recovery (recovery by using the files it finds in pg_xlog).  You don't need recovery.conf, because you are not using a WAL archive, only the WAL that is naturally left in the pg_xlog.  Since you are not doing a hard recovery, you don't need pg_start_backup(), etc.
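
As a sketch of those steps (volume names are again invented):

    psql -c "CHECKPOINT;"   # optional, just shortens crash recovery
    lvcreate -s -L 2G -n pgsnap /dev/vg0/pgdata
    mount -o ro /dev/vg0/pgsnap /mnt/pgsnap
    rsync -a /mnt/pgsnap/ backuphost:/backups/pg/
    umount /mnt/pgsnap
    lvremove -f /dev/vg0/pgsnap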


Actually, I wonder whether in this scenario the pg_start/stop_backup
is really necessary - would it be safe to just tell postgres to
create a checkpoint right before the snapshot and directly use
the "frozen" data directory in the backup? (The postgres
documentation suggests this.)

Yep.  Assuming you trust the LVM snapshot code to be free of bugs.
 

Another point in favor of the slightly cumbersome solution above
would be the ability to combine it with Point-in-Time Recovery.

Now I'm a bit confused.  I thought you were not doing actual log archiving, and did not want to do so.  If you are not keeping the log archive, then you can't use PITR in a meaningful way.  

But if you are using real log archiving, then you don't need to use a LVM snapshot.  Half of the point of the dance with pg_start_backup(), the backup_label file, and the pg_stop_backup() is to protect you from problems that are caused by the backup not being instantaneous.

 
As far as I understand, this only works with a base backup created
using pg_start/stop_backup. Is this right?


Or with the pg_basebackup program.  

It is probably *possible* to do a PITR starting from an LVM snapshot rather than a base backup, but that is just juggling running chainsaws.

So first decide if you want to use log archiving or not, then backup using the method corresponding to that decision.  Of course if you do use log archiving, you need to make sure the archive is backed up, too.

Cheers,

Jeff