Обсуждение: base backup requirements for PITR
Postgres Admin Wizards, I am trying to manage an 8.4 database whose ingest rate is sufficiently high that doing base backups on it causes it to fall unacceptably behind. To deal with this, I want to do WAL shipping to another server that is in warm standby mode, and then on that server perform the base backups in order to effect log change accumulation without hosing the master server. Once upon a time, months ago, before the load on the system made creating a base backup completely untenable, I configured an archive_command, did a pg_start_backup, kicked off a tar operation on the data directory, and after the tar operation's completion ran pg_stop_backup. Last week, I was finally allocated a new server on which to create a warm standby, so I unpacked the tarball I created, installed a recovery.conf with a restore_command, and started the server up. It has been methodically chewing through the WAL segments, and a little over a week later it has gotten through nearly all three months' worth of WAL segments (I know, miserable, and hopefully never an experience I will repeat). I take this to mean that I did things correctly and that shortly I will end up with a warm standby server, on the assumption that if I had botched the creation of the base backup things would have bombed out very quickly. That said, I'm wondering whether I created the base backup with reliably correct procedures or merely was lucky. It's not exactly clear to me when I turned on the warm standby server how it knew where in the WAL to start processing data. As I understand things, when Postgres experiences a hard crash, it can rely on the contents of global/pg_control to provide the value of the last checkpoint's REDO location, from which it can start replaying WAL records. But what happens if you're in the middle of creating a base backup and Postgres forces a checkpoint, possibly causing global/pg_control to be changed at some indeterminate point during tar's execution? And are the concerns different for the scenario in which you're running pg_start_backup/tar/pg_stop_backup on a master server and the scenario in which you're just running tar on a warm standby server (because you can't actually log into a warm standby server)? In other words, is pg_start_backup/pg_stop_backup doing something magical do give the eventual recovery process a heads up about what to do? When I first started puzzling through this, I figured that all I had to do was run pg_controldata on the warm standby, parse out the latest checkpoint's REDO address, calculate the corresponding file, and then assume that I could purge my WAL archive of any files preceding that. But then I started to worry about whether upon startup of a server in recovery mode using a tarball that I had generated from a warm standby Postgres would know where to start in the WAL when invoking my restore_command. Looking through emails from the archives spanning the past decade, I've seen at least one saying that I need to make sure that global/pg_control is archived first and one that says it needs to be archived last when I'm creating a tarball on a warm standby server, and I really can't tell what the requirement is, if any, on 8.4, and I also can't tell, if there is such a requirement for a warm standby server, whether such a requirement is obviated on a master server by virtue of things that pg_start_backup/pg_stop_backup might be doing. So, can someone clarify the following? 1) When Postgres starts up in recovery mode (i.e. with a recovery.conf), how does it know where in the WAL to start replaying records? 2) Beyond just running a tar command and making sure that I keep around the required WAL files, what constraints are there on creating a tarball for the base backup? Do I need to do anything in particular to ensure that the resultant warm standby server, upon startup, knows where in the WAL to start replay? Is the point at which global/pg_control is picked up by tar critical to the success of a server in recovery mode coming online in a consistent state, or is Postgres doing something else involving log sequence numbers stamped onto data files to figure out what to do? Many thanks for your guidance... -- AWG
Hi Andrew, Il 29/11/12 02:52, Andrew W. Gibbs ha scritto: > That said, I'm wondering whether I created the base backup with > reliably correct procedures or merely was lucky. PostgreSQL documentation on how continuous archiving works is clear, but you need to follow exactly what it says and have a clear understanding and how Postgres works. In order to simplify the management of disaster recovery solutions for PostgreSQL, we have recently developed and distributed an open source project called Barman. I am not sure whether you have looked at it or not. In this case, I suggest you give it a go. More information at http://www.pgbarman.org/ It is also available as RPM and Debian/Ubuntu package and works from PostgreSQL 8.4 onwards. Barman takes care of backups, works remotely and assist DBAs during recovery (both local to the backup server or remote). Cheers, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it
Gabriele Bartolini wrote: > Il 29/11/12 02:52, Andrew W. Gibbs ha scritto: >> That said, I'm wondering whether I created the base backup with >> reliably correct procedures or merely was lucky. > In order to simplify the management of disaster recovery solutions for > PostgreSQL, we have recently developed and distributed an open source > project called Barman. [...] Gabriele, I understand that you want to spread the word, but the OP's problem was a different one: He wants to create a base backup by just copying the data directory of an 8.4 warm standby server. There is no possibility to run pg_(start|stop)_backup, but the documentation seems to suggest that it could work anyway: http://www.postgresql.org/docs/8.4/static/warm-standby.html#BACKUP-INCREMENTAL-UPDATED I wonder if that's true, and I'd assume that you'd have to backup pg_control first (to assure an old enough checkpoint), but I don't know enough to answer the question. I'd like to know myself. Yours, Laurenz Albe
Hi Laurenz, Il 01/12/12 12:18, Albe Laurenz ha scritto: > Gabriele, I understand that you want to spread the word, > but the OP's problem was a different one: You are right. I totally missed that. I skimmed the email (very quickly) and thought that in that particular context Andrew was referring to PITR in general. > He wants to create a base backup by just copying the > data directory of an 8.4 warm standby server. > There is no possibility to run pg_(start|stop)_backup, > but the documentation seems to suggest that it could work > anyway: > http://www.postgresql.org/docs/8.4/static/warm-standby.html#BACKUP-INCREMENTAL-UPDATED Yes, it is possible. I'd like to try and implement that in Barman, in order to automate this. I'd try and follow this path though: pg_start_backup() on the master, wait for the operation to be replayed on the standby (with pg_controldata if < 9.0 or through hot standby), rsync/tar from the standby, then pg_stop_backup() on the master. If you think of following this path yourself, please make sure you test it accurately (especially for production environment). Sorry Andrew about misunderstanding the original question. Cheers, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it
Gabriele Bartolini wrote: >> Gabriele, I understand that you want to spread the word, >> but the OP's problem was a different one: > You are right. I totally missed that. I skimmed the email (very quickly) > and thought that in that particular context Andrew was referring to PITR > in general. I admit that it was a rather long e-mail. >> He wants to create a base backup by just copying the >> data directory of an 8.4 warm standby server. >> There is no possibility to run pg_(start|stop)_backup, >> but the documentation seems to suggest that it could work >> anyway: >> http://www.postgresql.org/docs/8.4/static/warm-standby.html#BACKUP-INCREMENTAL-UPDATED > Yes, it is possible. I'd like to try and implement that in Barman, in > order to automate this. I'd try and follow this path though: > pg_start_backup() on the master, wait for the operation to be replayed > on the standby (with pg_controldata if < 9.0 or through hot standby), > rsync/tar from the standby, then pg_stop_backup() on the master. If you > think of following this path yourself, please make sure you test it > accurately (especially for production environment). A viable alternative could be to shutdown the standby and take an offline backup. It can catch up with WAL after restart. Yours, Laurenz Albe