Re: Consistent file-level backup of pg data directory

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Consistent file-level backup of pg data directory
Дата
Msg-id CAMkU=1yTn1NK+AZ_mBzRZcdmEMLQNjBZWgzfE0qYaDTu3-VfKA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Consistent file-level backup of pg data directory  (gator_ml@yahoo.de)
Ответы Re: Consistent file-level backup of pg data directory
Список pgsql-general
On Wed, Jan 8, 2014 at 7:09 AM, <gator_ml@yahoo.de> wrote:
On 2014-01-07 21:54, Jeff Janes wrote:
> On Tue, Jan 7, 2014 at 10:58 AM, <gator_ml@yahoo.de
> <mailto:gator_ml@yahoo.de>> wrote:
>     - keep a copy "backup_label" under a different name
[...[
> Why under a different name?  That sounds dangerous to me.
... otherwise it would be useless, because "pg_stop_backup"
will delete the "backup_label" file

That is one of the reasons you should run the backup before you tell PostgreSQL that you finished the backup :)


>     - call pg_stop_backup()
>     - the run the regular backup procedure
>
>
> You have this order backwards.  You can only tell postgres that you are
> done doing the backup once you are done doing the backup.
this is what I tried 1st - unfortunately, this won't work at all,
because postgres creates another checkpoint _after_ pg_stop and will
refuse to run the restore procedure if this last wal file is missing
(which in my scenario inevitably will be the case)

Eventually another checkpoint will be run, but pg_stop_backup will not directly cause one to happen.  What it creates is a record in the WAL stream saying that the backup has finished.  That record is what it needs to see.  After writing that record, it triggers a log switch (not the same thing as a checkpoint) so that the xlog file containing the record will get archived (if you are were truly using archiving).

That requirement is there to protect your data, you cannot safely circumvent the need for it.  If you have backed up the xlog file that contains that record before the record was present, then that will be a problem, as it won't find the record. The solution for that (other than using archiving) is to backup everything before calling pg_stop_backup, then backup just the pg_xlog after calling pg_stop_backup.



> It is pretty much a disaster.  Do you really need rsync?  Otherwise, use
> pg_basebackup, it is easier to get right.

... it looks like I didn't explain clear enough, what I am trying to
accomplish. So let me try it again:

We have a general backup procedure, that runs every night. For
some machines, there may additionally be more
specialized/fine-grained measures, but in any case, it should be
possible to restore any given server without any machine-specific
knowledge from the last nightly backup following a
generic "restore-recipe". Our current rsync-based solution is a
pretty good compromise between many different requirements.

For machines running database systems, this means, this means,
that I need some way to get a consistent state of some point in
time. It does not particularly matter, which time exactly (in
particular, I do not care, if transactions committed during the
backup are included or not) For this purpose, I can define
machine-specific hook to run before/ after the actual backup
procedure.

I think it would be easier to just exclude the database from the system-wide backup and use a different method for it, rather than engineer the necessary before/after hooks onto the system-wide backup.

 

Unfortunately, it does not look like there is any direct way to
accomplish this with postgres except shutting down the whole
database system while the backup is running. The systems will
be almost idle while the backup runs, and write transactions
will be very rare, but a total shutdown for every backup still
would be too radical.

Actually, with some tests copying the database directory while
writing data to the database and then restarting postgres with
such a copy, I did not manage to produce any visible
inconsistencies. If postgres was able to restore the last
consistent state in such a situation, then no additional measures
would be needed (like I said, it is not particularly important
which state this is - losing everything written during the last
15 minutes including committed transactions would be good enough)
.But at least as far as I can tell from the documentation, this
can not safely be assumed.

On small, nearly idle systems, you can often get away with doing a lot of dangerous things.  This is perhaps unfortunate, as it breeds bad habits.


Otherwise, the procedures for "Continuous Archiving" are clearly
made for some a diffent purpose, it seems like this is about as
close as it gets to what I need. As far as I understand:

- between pg_start_backup() and pg_stop_backup() it should be safe
to copy the data directory any time
- all committed transactions during this time will go to the wal files
(actually, I noticed that files in other directories than pg_xlog are
modified, too ...)

True, but the data in pg_xlog tells it how to replay those other changes to those other directories.  So it doesn't matter whether the other file was backed up pre-modification or post-modification, as the WAL allows it to be repaired either way. Unless the WAL stream ends too early--then it can't repair them.
 
- normally, it is intended to copy every single newly written wal file
somewhere else by whatever is defined as "archive_command", but
if this works, than any other way to copy all wal files written
after pg_start_backup should work just as well.

Kind of.  The archive_command conspires with database server to prevent it from recycling a wal file until we know it has been successfully copied by archive_command.  It is difficult for "any other way" to make that guarantee--although if the server is nearly idle, then the risk of such recycling causing problems is rather low.
 
My original idea was, to call pg_start_backup() 1st, then run the
actual backup followed by pg_stop_backup. Because the procedure
for " Point-in-Time Recovery" is the same and no wal file may be
modified after it was written and "archive_command" called, it
follows, that any wal file plus the preceding wal files written
after pg_start_backup must contain everything that is needed to
restore any state up to the time the wal file was
written (otherwise the whole "PITR)" mechanism would not work).

PITR allows you to recover to any point after the backup successfully ended.  It does not allow you to recover to some point in the middle of the backup (other than by starting at the previous successful backup and rolling forward).

I think that this is a logical necessity.  If you could use a backup set to restore to a point in the middle of that same backup, then you wouldn't be in the middle of the backup, you would be done with it.

 
Unfortunately, postgres refuses to run the restore procedure if
the last wal file (which is written after pg_stop_backup, so in
my case it will not be available) even if "recovery_target_time"
is explicitly set to some earlier time. I didn't try it, but
assuming that the last wal file is not really needed in this
case, it would be enough to just create an empty file with the
name postgres is looking for?

It is really needed if you have high standards of data integrity and safety.  That is why PostgreSQL requires it.
 

If I can't find, a better way, I probably could arrange to
separately copy this last wal file into the
backup (unfortunately, there is no easy way to do this ...)-:

 
You could just copy the entire pg_xlog directory again.  It is easier, probably safer (there is no guarantee that the "last" wal file is really only one file, rather than two or three), and on a low-traffic server the size is probably not all that large.

Cheers,

Jeff

В списке pgsql-general по дате отправления:

Предыдущее
От: Nelson Green
Дата:
Сообщение: Re: Last inserted row id with complex PK
Следующее
От: David Johnston
Дата:
Сообщение: Re: Last inserted row id with complex PK