Обсуждение: [ADMIN] Bad recovery: no pg_xlog/RECOVERYXLOG
Hi!

I am trying to set up continuous archiving with PG 9.6 according to this documentation:
https://www.postgresql.org/docs/9.6/static/continuous-archiving.html

I have wal_level set to replica, archiving is on, and archive_command is properly copying WAL segments to backup storage.

With this running, I make a successful tar base backup using pg_basebackup.

I then stop the DB, remove the data directory, unpack the base backup into it, create recovery.conf with a proper restore_command, start the server, and get:

LOG: database system was interrupted; last known up at 2017-10-25 15:47:37 UTC
LOG: starting archive recovery
Object 'pg_small3/pg_xlog/RECOVERYXLOG.lzo' not found
Cannot download pg_xlog/RECOVERYXLOG.lzo
LOG: invalid checkpoint record
FATAL: could not locate required checkpoint record
HINT: If you are not restoring from a backup, try removing the file "/var/lib/postgresql/data/backup_label".
LOG: startup process (PID 20) exited with exit code 1
LOG: aborting startup due to startup process failure
LOG: database system is shut down

The message about "pg_xlog/RECOVERYXLOG.lzo" is written out by restore_command. Indeed, the file is not in the backup storage, and pg_xlog/RECOVERYXLOG was NEVER sent there by archive_command (which compresses and adds an .lzo extension)!

What could I be doing wrong?

--
Marcin Koziej
GPG key: http://go.cahoots.pl/gpg/ Ϟ Twitter: @movonw

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
I got it. The semantics of archive_command suggest that for restore_command, %f would be the basename of %p. That is not the case: %p is the local file path in the data directory, while %f is the name of the remote (backed-up) file.

Marcin Koziej
GPG key: http://go.cahoots.pl/gpg/ Ϟ Twitter: @movonw

On 25.10.2017 18:50, Marcin Koziej wrote:
> Hi!
>
> I try to setup continuous archiving with PG 9.6 according to this
> documentation:
> https://www.postgresql.org/docs/9.6/static/continuous-archiving.html
>
> I have Postgres wal_archive set to replica, I have archive on and
> archive command is properly copying WAL segments to backup storage.
>
> Having this running, I make a successful tar base backup using
> pg_basebackup.
>
> I then stop the DB, remove the data directory, unpack base backup to it,
> create recovery.conf with a proper restore_command, run the server, and get:
>
> LOG: database system was interrupted; last known up at 2017-10-25
> 15:47:37 UTC
> LOG: starting archive recovery
> Object 'pg_small3/pg_xlog/RECOVERYXLOG.lzo' not found
> Cannot download pg_xlog/RECOVERYXLOG.lzo
> LOG: invalid checkpoint record
> FATAL: could not locate required checkpoint record
> HINT: If you are not restoring from a backup, try removing the file
> "/var/lib/postgresql/data/backup_label".
> LOG: startup process (PID 20) exited with exit code 1
> LOG: aborting startup due to startup process failure
> LOG: database system is shut down
>
> The message about "pg_xlog/RECOVERYXLOG.lzo" is written out by
> restore_command. Indeed, the file is not in the backup storage, and
> pg_xlog/RECOVERYXLOG was NEVER sent there by archive_command (which
> compresses and adds .lzo extension)!
>
> What could I be doing wrong?
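A minimal, self-contained sketch of the substitution semantics just described. This is illustrative only, not the poster's actual scripts: it uses gzip in place of lzop, and throwaway directories stand in for the data directory and the backup storage. The key point is that %f names the segment as stored in the archive, while %p is a scratch path (such as pg_xlog/RECOVERYXLOG) that the server asks the command to write into:

```shell
# postgresql.conf would contain something like:
#   archive_command = 'gzip -c %p > /mnt/archive/%f.gz'
# recovery.conf would contain:
#   restore_command = 'gunzip -c /mnt/archive/%f.gz > %p'

ARCHIVE=$(mktemp -d)           # stands in for the backup storage
PGDATA=$(mktemp -d)            # stands in for the data directory
mkdir -p "$PGDATA/pg_xlog"

# Archiving segment 000000010000000000000003:
#   %p = pg_xlog/000000010000000000000003, %f = 000000010000000000000003
echo 'wal-bytes' > "$PGDATA/pg_xlog/000000010000000000000003"
( cd "$PGDATA" && gzip -c pg_xlog/000000010000000000000003 \
      > "$ARCHIVE/000000010000000000000003.gz" )

# During recovery the server asks for segment %f but wants it written to
# the scratch path %p (here pg_xlog/RECOVERYXLOG), so the command must
# look up %f in the archive, never %p.
( cd "$PGDATA" && gunzip -c "$ARCHIVE/000000010000000000000003.gz" \
      > pg_xlog/RECOVERYXLOG )

cat "$PGDATA/pg_xlog/RECOVERYXLOG"   # prints: wal-bytes
```

A restore_command that instead tried to fetch an object named after %p would ask the archive for pg_xlog/RECOVERYXLOG, which was never archived under that name; this is exactly the failure in the log above.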
Now it's fixed, but in case anyone needs them, I'm attaching all the scripts to 1) back up and restore WALs and 2) back up and restore a base backup from OpenStack Swift.

Marcin Koziej
GPG key: http://go.cahoots.pl/gpg/ Ϟ Twitter: @movonw
Hi Marcin,

Could you please share the archive_command and restore_command you used? If you are using a script inside the restore or archive command, please share that as well. It looks like the problem is related to them.

Best regards.
Samed YILDIRIM

25.10.2017, 19:52, "Marcin Koziej" <marcin@cahoots.pl>:
> Hi!
>
> I try to setup continuous archiving with PG 9.6 according to this
> documentation:
> https://www.postgresql.org/docs/9.6/static/continuous-archiving.html
>
> I have Postgres wal_archive set to replica, I have archive on and
> archive command is properly copying WAL segments to backup storage.
>
> Having this running, I make a successful tar base backup using
> pg_basebackup.
>
> I then stop the DB, remove the data directory, unpack base backup to it,
> create recovery.conf with a proper restore_command, run the server, and get:
>
> LOG: database system was interrupted; last known up at 2017-10-25
> 15:47:37 UTC
> LOG: starting archive recovery
> Object 'pg_small3/pg_xlog/RECOVERYXLOG.lzo' not found
> Cannot download pg_xlog/RECOVERYXLOG.lzo
> LOG: invalid checkpoint record
> FATAL: could not locate required checkpoint record
> HINT: If you are not restoring from a backup, try removing the file
> "/var/lib/postgresql/data/backup_label".
> LOG: startup process (PID 20) exited with exit code 1
> LOG: aborting startup due to startup process failure
> LOG: database system is shut down
>
> The message about "pg_xlog/RECOVERYXLOG.lzo" is written out by
> restore_command. Indeed, the file is not in the backup storage, and
> pg_xlog/RECOVERYXLOG was NEVER sent there by archive_command (which
> compresses and adds .lzo extension)!
>
> What could I be doing wrong?
>
> --
> Marcin Koziej
> GPG key: http://go.cahoots.pl/gpg/ Ϟ Twitter: @movonw
Greetings,

* Marcin Koziej (marcin@cahoots.pl) wrote:
> Now it's fixed, but if anyone needs I'm attaching all scripts to 1)
> backup and restore wal's and 2) backup and restore base backup from
> OpenStack SWIFT

Interesting, but these scripts seem to be seriously lacking in error checking (what happens if the copy to swift fails? or if pg_basebackup fails?), and it's unclear how you can be sure that the WAL file has been sync'd to disk, which is important: you might end up having holes in your WAL stream if the swift system fails. There's also no check to make sure that the WAL needed for a given pg_basebackup ever actually made it to the swift system, which is required to ensure you have a consistent backup.

Generally speaking, these kinds of scripts really aren't a good choice for doing backups of PG. I'd strongly suggest you look at one of the existing tools which are developed specifically for doing backups of PG and are well tested, supported, and maintained. If you'd like support for a new storage system, I know that at least pgBackRest's storage layer is pluggable, and adding a new storage option is pretty straightforward.

Thanks!

Stephen
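For concreteness, here is one hedged sketch (not code from the thread's attachments) of the kind of error checking and syncing being described, written as an archive_command helper. The archive location, the function name, and the per-file sync fallback are all assumptions for illustration:

```shell
# Hypothetical archive_command helper; PostgreSQL would invoke it roughly as:
#   archive_command = 'archive_wal.sh %p %f'
# WAL_ARCHIVE_DIR is an assumed configuration knob.
archive_wal() {
    wal_path=$1                               # %p: path relative to the data dir
    wal_name=$2                               # %f: bare segment file name
    dest=${WAL_ARCHIVE_DIR:-/mnt/wal-archive}

    # Never silently overwrite: a duplicate name with different contents
    # would corrupt the archive. Non-zero exit tells PostgreSQL to retry.
    if [ -e "$dest/$wal_name" ]; then
        echo "ERROR: $wal_name already archived" >&2
        return 1
    fi

    # Copy under a temporary name, force it to disk, then rename, so a
    # crash mid-copy never leaves a truncated segment under the final name.
    cp "$wal_path" "$dest/$wal_name.tmp" || return 1
    sync "$dest/$wal_name.tmp" 2>/dev/null || sync   # per-file sync where supported
    mv "$dest/$wal_name.tmp" "$dest/$wal_name" || return 1
    sync "$dest" 2>/dev/null || sync                 # persist the rename as well
}
```

Even this only covers the archiving half; it does not verify that every segment a given base backup needs has reached the archive, which is the other requirement raised here.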
On 31/10/17 04:47, Stephen Frost wrote:
> * Marcin Koziej (marcin@cahoots.pl) wrote:
>> Now it's fixed, but if anyone needs I'm attaching all scripts to 1)
>> backup and restore wal's and 2) backup and restore base backup from
>> OpenStack SWIFT
> Interesting, but these scripts seem to be seriously lacking in error
> checking (what happens if the copy to swift fails..? or pg_basebackup
> fails?) and it's unclear how you can be sure that the WAL file has been
> sync'd to disk [...] There's also no checking to
> make sure that the WAL needed for a given pg_basebackup ever actually
> made it to the swift system, which is required to ensure you have a
> consistent backup.

I'm not convinced that his approach is bad. The script checks the result of the 'swift upload' for the base backup; it is the WAL backup one that does not explicitly check the 'swift upload' result (this should really be added). To be fair, anything wrong with the swift system will likely be discovered immediately beforehand, where he does a 'swift stat'!

I'd guess his original problem was an improperly set up recovery.conf, rather than the overall design.

regards
Mark
Mark,

* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
> I'm not convinced that his approach is bad.

I was the same way for a long time, thinking that shell scripts could reasonably be used with certain caveats, but the devil really is in the details, and it's far too easy to miss things in shell scripts (such as not checking return codes, or not doing so properly, or various other issues). Also, you didn't address things like verifying that you actually have all the WAL needed for a valid backup, and how to handle retention?

> The script checks the result of the 'swift upload' for the base
> backup, it is the wal backup one that does not explicitly check the
> 'swift upload' result (this should really be added). To be fair,
> anything wrong with the swift system will likely be discovered
> immediately beforehand where he does a 'swift stat'!

Things could certainly break between those two calls to swift, in a variety of ways.

> I'd guess his original problem was an improperly setup
> recovery.conf, rather than the overall design.

I agree that the original issue is unlikely to be related to these scripts. That doesn't mean that using them is a good idea.

Thanks!

Stephen
On 01/11/17 00:47, Stephen Frost wrote:
> * Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
>> I'm not convinced that his approach is bad.
> I was the same way for a long time, thinking that shell scripts could
> reasonably be used with certain caveats, but the devil really is in the
> details [...] Also, you didn't address things like verifying that you
> actually have all the WAL needed for a valid backup, and how to handle
> retention?
> [...]
>> I'd guess his original problem was an improperly setup
>> recovery.conf, rather than the overall design.
> I agree that the original issue is unlikely to be related to these
> scripts. That doesn't mean that using them is a good idea.

Exactly: the original issue is unlikely to be related to these scripts!

While I agree that the scripts concerned would benefit from some development/QA etc., I'm really disagreeing with your point that folk 'should not do this'. I would like to say that folk 'should do this', but try to do it well, and we should help them (isn't that idea kinda tied up with the whole point of these lists)?

I'm not keen on us being seen to stifle development just because we know of another existing product that *might* be better. To me that seems like a slippery slope that gets us endorsing only certain vendors' solutions (whether they be open source or not). I'm not a fan of that at all.

regards
Mark
Mark,

* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
> While I agree that the scripts concerned would benefit from some
> development/qa etc, I'm really disagreeing with your point that folk
> 'should not do this'. I would like to say that folk 'should do this'
> - but try to do it well - and we should help them (isn't that idea
> kinda tied up with the whole point of these lists)?

I'm all for someone else starting up a new project to improve the situation around backups for PG. That's not going to be three shell scripts amounting to maybe 100 lines of code, and what I really am concerned about is people seeing these simple shell scripts and thinking "oh, I'll just use these simple things" without realizing that they're going to end up in a bad spot, because those simple shell scripts aren't sufficient to do backups with PG properly and reliably.

We could store data in a CSV file and access it through shell scripts too and call it a database. If someone posted that as an alternative to PG, I don't doubt that it would get shot down pretty hard too.

These aren't just perfectionist complaints about shell scripts being used to do backups of PG, either; I've seen people using them, and doing so in ways that result in not having reliable backups, which has then led to literally days of work being lost.

Put these shell scripts out on a github website with a big "in development, not for production use, do not use" readme and continue to hack on them as much as you'd like. Don't post them to these lists with a "this is how you do backups in PG".

Thanks!

Stephen
On 02/11/17 01:28, Stephen Frost wrote:
> I'm all for someone else starting up a new project to improve the
> situation around backups for PG. That not going to be three shell
> scripts amounting to maybe 100 lines of code [...]
> Put these shell scripts out on a github website with a big "in
> development, not for production use, do not use" readme and continue to
> hack on them as much as you'd like. Don't post them to these lists with
> a "this is how you do backups in PG".

I don't think either the original script author, or myself, is attempting to suggest that a few shell scripts are the next complete-coverage backup solution (ahem, it is only yourself that is pushing this extreme interpretation)!

However, there is a use case for people who just want a minimal backup solution that works for their specific environment, and don't want to bring along a lot of extra machinery that a full-coverage, all-singing-and-dancing product includes. This *can* be accomplished by a few shell scripts. Yes, it does mean that you spend extra time testing and debugging [1]. Err, I think that is all the original author (who is probably scared off now) was wanting a bit of help with.

regards
Mark

[1] which is where a pre-existing, more complex solution is likely to be better: it has had more testing in the field, and of course it is fine to point that out.
Mark,

* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
> However there is the use case for people that just want a minimal
> backup solution that works for their specific environment, and don't
> want to bring along a lot of extra machinery that a full coverage
> all-singing-and-dancing product includes - this *can* be
> accomplished by a few shell scripts. Yes, it does mean that you
> spend extra time testing and debugging [1]. Err - I think that is
> all the original author (who is probably scared off now), was
> wanting a bit of help with.

This is exactly the issue that concerns me. I'm not suggesting that these scripts are, or need to be, the end-all, be-all of PG backup solutions.

What I'm pointing out is that shell-script based solutions are *broken*, not that they are lacking in features. Many, many years ago I also used to think it was possible to perform a PG backup using just shell scripts and have it be successful and reliable, but since then I've seen too many cases where exactly that has led to incomplete and invalid backups to be able to agree that they're reasonable to use. Not having a way to reliably sync the WAL files copied by archive_command to disk, in particular, really is an issue; it's not some feature, it's a requirement of a functional PG backup system. The other requirement for a functional PG backup system is a check to verify that all of the WAL for a given backup has been archived safely to disk; otherwise the backup is incomplete and can't be used.

Both of those basic requirements are, at best, extremely difficult to meet in a shell script. Maybe it's possible, but I've certainly yet to see it, and I'm not going to agree that such "simple" shell scripts should be posted to our mailing lists without someone pointing out that they're broken, because otherwise people will take and use them and end up with backups that are broken (often right when they actually need them).

If you'd like to develop a shell script that addresses these basic requirements of file-based PG backups and ask for critique on it while making it clear that it's in development, I'd be happy to provide comments on it. I won't agree that any shell-based solution that doesn't have these basic requirements met is an acceptable option.

Thanks!

Stephen
On 02/11/17 11:18, Stephen Frost wrote:
> What I'm pointing out is that shell-script based solutions are *broken*,
> not that they are lacking in features. [...] Not having
> a way to reliably sync the WAL files copied by archive command to disk,
> in particular, really is an issue [...] The other requirement for
> a functional PG backup system is a check to verify that all of the WAL
> for a given backup has been archived safely to disk, otherwise the
> backup is incomplete and can't be used.
>
> Both of those basic requirements are, at best, extremely difficult to
> do in a shell script. [...]
> If you'd like to develop a shell script that addresses these basic
> requirements of file-based PG backups and ask for critique on it while
> making it clear that it's in development, I'd be happy to provide
> comments on it. I won't agree that any shell-based solution that
> doesn't have these basic requirements met is an acceptable option.

Ok, that is interesting. In my experience, provided that a) the construction you are using in archive_command correctly reports success/failure, b) you have some monitoring that checks for archive failure, and c) your pg_basebackup concoction properly checks return codes, then the requirement of having the needed logs will be fine. All of these are reasonably straightforward to implement via shell.

Also, if what you are suggesting were actually the case, almost everyone's streaming replication (and/or log shipping) would be broken all the time.

With respect to 'If you'd like to develop etc. etc.': err, all I was doing in this thread was helping the original poster make his stuff a bit better. I'll continue to do that.

Best wishes
Mark
On 02/11/17 11:18, Stephen Frost wrote:
> Not having
> a way to reliably sync the WAL files copied by archive command to disk,
> in particular, really is an issue, it's not some feature, it's a
> requirement of a functional PG backup system. The other requirement for
> a functional PG backup system is a check to verify that all of the WAL
> for a given backup has been archived safely to disk, otherwise the
> backup is incomplete and can't be used.

Funnily enough, the original poster's scripts were attempting to address (at least some of) this: he was sending stuff to swift, so if he got an OK return code then it is *there*, that being the whole point of a distributed, fault-tolerant object store (I do swift support, BTW).

I wonder if you are seeing this discussion in the light of folk doing backups to unreliable storage locations (e.g. the same server, NFS, etc.); if so, then sure, I completely agree with what you are saying (those issues impact backup designs no matter what tool is used to write them).

Best wishes
Mark
Mark,

* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
> Ok, that is interesting. In my experience, provided the a)
> construction you are using in archive_command correctly reports
> success/failure, and b) you have some monitoring that checks for
> archive failure then that requirement of you having the required
> logs will be fine. Finally that c) your pg_basebackup concoction
> properly checks return codes then you are fine.
>
> All these are reasonably straightforward to implement via shell.

Sure, that'll work much of the time, but that's about like saying that PG could run without fsync enabled much of the time and everything will be ok. Both are accurate, but hopefully you'll agree that PG really should always be run with fsync enabled.

> Also, if what you are suggesting were actually the case, almost
> everyone's streaming replication (and/or log shipping) would be
> broken all the time.

No, again, this isn't an argument about whether it'll work most of the time; it's about whether it's correct. PG without fsync will work most of the time too, but that doesn't mean it's actually correct.

> With respect to 'If you'd like to develop etc etc..' - err, all I
> was doing in this thread was helping the original poster make his
> stuff a bit better - I'll continue to do that.

Ignoring the basic requirements which I outlined isn't helping him get to a reliable backup system.

Thanks!

Stephen
Mark, * Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote: > On 02/11/17 11:18, Stephen Frost wrote: > > >Not having > >a way to reliably sync the WAL files copied by archive command to disk, > >in particular, really is an issue, it's not some feature, it's a > >requirement of a functional PG backup system. The other requirement for > >a functional PG backup system is a check to verify that all of the WAL > >for a given backup has been archived safely to disk, otherwise the > >backup is incomplete and can't be used. > > Funnily enough, the original poster's scripts were attempting to > address (at least some) of this: he was sending stuff to swift, so > if he got a ok return code then it is *there* - that being the whole > point of a distributed, fault tolerant object store (I do swift > support BTW). There's different levels of storage reliability even in swift and that doesn't do anything to address the issue that you don't know if all of the WAL for a given backup has actually made it to swift. Perhaps it might be useful to also point out here that pg_basebackup is going to exit just as soon as it's done copying the files- it's not going to wait for the WAL to finish getting to swift before returning 'success' because you didn't ask pg_basebackup to pull the WAL in these scripts. What that means is that you could have everything be successful, per your definitions, and still not have a valid backup, and then you decide to rotate off your older backup and then there's a crash. Guess what? You don't have a valid backup anymore because you haven't got all of the necessary WAL for the pg_basebackup that you did do, so you can't use that, and you nuked your prior backup, so that's gone too. Hopefully you have more backups than that, but if not, because you trusted in these scripts and the guarantees of swift, then you've just lost everything. 
> I wonder if you are seeing this discussion in the light of folk > doing backups to unreliable storage locations (e.g: the same server, > NFS etc etc), then sure I completely agree with what you are saying > (these issue impact backup designs no matter what tool is used to > write them). That you're arguing so hard about this one specific shell script which happens to be based on swift really doesn't convince me that recommending shell-script based backup solutions on PG is a good idea. Doing backups locally may not be ideal for various reasons, but at least if you're making sure to properly fsync the data out to the RAID'd disks, and verifying that your backups are fully fsync'd and that you've checked to make sure you have all of the WAL for a given backup (and that it's all fsync'd) then I'd argue that it's at least conceptually correct. The same goes for NFS, or sending the data to another server, assuming they're set up properly to respect fsync. Simply skipping the requirements to verify that you've got all of the WAL for the backup and that you've made sure that it's all stored on reliable storage isn't correct. Doing proper backups of PG is *hard*. There's a lot of things you have to do correctly to get them to actually be consistently reliable in the face of even single-point failures. Having swift provide reliability guarantees for the archived WAL, provided the shell script is perfectly written to catch all errors and report them back to PG correctly, is great, but it still doesn't address the other requirement of ensuring that all WAL has actually been archived before considering a given backup as complete, and you have to decide what level of guarantees you want from swift and configure it appropriately. If you want simple script-based backups, then use pg_basebackup and make it do the WAL handling as well and then make sure that you've got your script set up to check error codes from pg_basebackup and that you're actually monitoring your backups. 
Even then there are risks of issues which boil down to cases where even
we didn't fsync things out properly, leading to cases where WAL or files
could be lost due to a crash after pg_basebackup finished. Hopefully
those have all been addressed by now, but it's a testament to the
difficulty of doing these things correctly.

Thanks!

Stephen
Stephen,

On 03/11/17 00:11, Stephen Frost wrote:
>
> Sure, that'll work much of the time, but that's about like saying that
> PG could run without fsync being enabled much of the time and
> everything will be ok. Both are accurate, but hopefully you'll agree
> that PG really should always be run with fsync enabled.

It is completely different - this is a 'straw man' argument, and just
serves to confuse this discussion.

>> Also, if what you are suggesting were actually the case, almost
>> everyone's streaming replication (and/or log shipping) would be
>> broken all the time.
>
> No, again, this isn't an argument about if it'll work most of the time
> or not, it's about if it's correct. PG without fsync will work most of
> the time too, but that doesn't mean it's actually correct.

No, it is pointing out that if your argument were correct, then there
should be the above side effects - there are not, which is significant.

The crux of your argument seems to concern the synchronization between
pg_basebackup finishing and being sure you have the required archive
logs. Now, just so we are all clear: when pg_basebackup ends it
essentially calls do_pg_stop_backup (from xlog.c), which ensures that all
required WAL files are archived - or, to be precise, makes sure
archive_command has been run successfully for each required WAL file.

Your entire argument seems to be about whether said WAL is fsync'ed to
disk, and how this is impossible to ensure in a shell script. Actually it
is quite simply possible. E.g. suppose your archive_command is:

    rsync ... targetserver:/disk

There are several ways to get that to sync:

    rsync ... targetserver:/disk && ssh targetserver sync

Alternatively, amend vm.dirty_bytes on targetserver to be < 16MB, or
mount /disk with the sync option! So it is clearly *possible*.

However, I think you are obsessing over the minutiae of fsync to a single
server/disk when there are much more important (read: more likely to
happen) problems to consider.
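The same idea can be pushed into the archive step itself: have the
archive_command helper force the copy to stable storage before reporting
success to PostgreSQL. A minimal local-disk sketch (the paths and the
postgresql.conf wiring are assumptions; GNU dd's conv=fsync does the
file-level sync):

```shell
# Sketch of an archive_command helper, invoked as: archive_wal %p DEST_DIR %f
# Copies the segment, fsyncs the copy, then renames it into place so a
# half-written file is never mistaken for an archived segment.
archive_wal() {
    src=$1          # %p: path of the WAL segment, relative to the data dir
    dest_dir=$2
    name=$3         # %f: the segment's file name
    dest="$dest_dir/$name"

    # Refuse to overwrite an already-archived segment, per the PG docs.
    if [ -e "$dest" ]; then
        return 1
    fi

    dd if="$src" of="$dest.tmp" bs=1M conv=fsync 2>/dev/null &&
        mv "$dest.tmp" "$dest" &&
        sync    # flush the rename/directory entry too (coarse but portable)
}

# postgresql.conf wiring might then be (path is hypothetical):
#   archive_command = '/usr/local/bin/archive_wal.sh %p /archive %f'
```

Whether any single-host variant of this is sufficient is exactly what is
being debated here; the sketch only shows that the fsync itself is not
the hard part.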
For me, the critical consideration is not 'are the WAL files there *right
now*?' but 'will they be there tomorrow when I need them for a restore?'
Next is 'will they be the same/undamaged when I read them tomorrow?'

This is why I'm *not* obsessing about fsyncing: make where you store
these WAL files *reliable* - either via proxying/IP splitting so you send
stuff to more than one server (if we are still thinking server + disk =
backup solution), or alternatively use a distributed object store (Swift,
S3 etc.) that handles that for you - and in addition they checksum and
heal any individual node's data corruption for you as well.

>> With respect to 'If I would like to develop etc etc..' - err, all I
>> was doing in this thread was helping the original poster make his
>> stuff a bit better - I'll continue to do that.
>
> Ignoring the basic requirements which I outlined isn't helping him get
> to a reliable backup system.

Actually, I was helping him get a *reliable* backup system; I think you
misunderstood how swift changes the picture compared to a single
server/single disk design.

regards

Mark

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Mark,

* Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
> On 03/11/17 00:11, Stephen Frost wrote:
> > Sure, that'll work much of the time, but that's about like saying that
> > PG could run without fsync being enabled much of the time and
> > everything will be ok. Both are accurate, but hopefully you'll agree
> > that PG really should always be run with fsync enabled.
>
> It is completely different - this is a 'straw man' argument, and just
> serves to confuse this discussion.

I don't see it as any different at all. The point I was trying to make
there is that there's a minimum requirement for backups, just as there is
for ACID compliance, and any solution needs to meet that minimum to be
considered.

> The crux of your argument seems to concern the synchronization between
> pg_basebackup finishing and being sure you have the required archive
> logs. Now, just so we are all clear: when pg_basebackup ends it
> essentially calls do_pg_stop_backup (from xlog.c), which ensures that
> all required WAL files are archived - or, to be precise, makes sure
> archive_command has been run successfully for each required WAL file.

pg_basebackup talks the replication protocol, to be clear, and sends a
BASE_BACKUP message, of which one of the options is 'NOWAIT' to indicate
whether the server should wait until all of the WAL has been archived.
Typically, pg_basebackup does send 'NOWAIT' to tell the server not to
hold up the final message until all of the WAL has been archived, because
it's handling the verification of the WAL having been archived itself. In
the unusual case that WAL isn't included with the pg_basebackup, it looks
like it would wait for the archive_command to complete, which is better
than I had thought (and hadn't noticed on my first glance through the
code), though that does depend on a functional and perfect
archive_command, and there's no shortage of reasons why that might not be
the case at the time the backup is happening.
That's an awful lot of action-at-a-distance hope for me to be comfortable
with, however. A backup solution really does need to verify that the WAL
has been completely and reliably stored, as discussed in the
documentation, before claiming a backup is valid, and there's basically
no reason not to unless the tool you've chosen to use makes that
particularly difficult (even if not *technically* impossible, given
enough effort). If your solution is built on the assumption that WAL
archiving is always working, and there's no check happening during backup
to verify that you've got all the WAL, then I have serious doubts about
it being reliable. If you're independently monitoring that all WAL has
been archived, that's certainly helpful, but I don't consider that a
complete substitute for making sure that you've got all of the WAL for a
given backup.

> Your entire argument seems to be about whether said WAL is fsync'ed to
> disk, and how this is impossible to ensure in a shell script.
[...]
> So it is clearly *possible*.

Yes, it's possible, but it's not something I'd recommend doing, and none
of your arguments have made me any more likely to recommend trying to
ensure a proper backup has completed using shell scripts. What I fail to
understand is your insistence that it's a good idea. I've seen lots and
lots of attempts at it, even made some myself, and have come to the
generally agreed-upon conclusion that it's a bad idea to hack together
your own backup solution for PG, and that, even if you do want to try,
using shell scripts to attempt it is a bad idea. There are much better
solutions out there which are really what folks should be using. I'm not
against using pg_basebackup either, but if you're using it, let it handle
the archiving, because it does verify that all of the WAL has been
archived properly.
> Actually, I was helping him get a *reliable* backup system; I think you
> misunderstood how swift changes the picture compared to a single
> server/single disk design.

I do understand the goals of things like swift and S3 and the intent
behind them to provide a better store than local disks, and I'm not
against using them, to be clear, but they only address one of the
requirements that I outlined for a reliable backup solution. I mention
both requirements consistently to, hopefully, ensure that those coming
along later to read these threads remember that it's about more than just
verifying that all the WAL has been archived during a backup - the WAL
must also actually be fsync'd or written out to reliable storage.

Thanks!

Stephen
On 07/11/17 02:37, Stephen Frost wrote:
> Mark,
>
> * Mark Kirkwood (mark.kirkwood@catalyst.net.nz) wrote:
>> On 03/11/17 00:11, Stephen Frost wrote:
>>> Sure, that'll work much of the time, but that's about like saying
>>> that PG could run without fsync being enabled much of the time and
>>> everything will be ok. Both are accurate, but hopefully you'll agree
>>> that PG really should always be run with fsync enabled.
>>
>> It is completely different - this is a 'straw man' argument, and just
>> serves to confuse this discussion.
>
> I don't see it as any different at all. The point I was trying to make
> there is that there's a minimum requirement for backups, just as there
> is for ACID compliance, and any solution needs to meet that minimum to
> be considered.

Ok, and apologies - I thought you were going all 'schoolboy debating' on
me :-) . I'll discuss how I'm seeing this:

In the case of a db server running with fsync off, one crash and it may
never be able to be restarted - a pretty severe loss of service.

In the case of a backup server crashing immediately after a backup
(assuming archive logs and backup go to the same host, for simplicity),
then *if undetected* it could mean that later you cannot restore this
backup - very bad - so in that case I agree with you. However, detection
(i.e. monitoring) is essential; otherwise a meticulously fsync'd set of
WAL can be lost or corrupted by the various usual suspects too (bad
RAM/HBA/disk...) - with the same result. So, assuming we have monitoring
doing its thing, after the backup server crashes any missing or damaged
WAL can be retrieved from our still-running db server - or, if it has
been recycled, then we need to do another backup. No loss of service.

>> The crux of your argument seems to concern the synchronization
>> between pg_basebackup finishing and being sure you have the required
>> archive logs.
>> Now, just so we are all clear: when pg_basebackup ends it essentially
>> calls do_pg_stop_backup (from xlog.c), which ensures that all required
>> WAL files are archived - or, to be precise, makes sure archive_command
>> has been run successfully for each required WAL file.
>
> pg_basebackup talks the replication protocol, to be clear, and sends a
> BASE_BACKUP message, of which one of the options is 'NOWAIT' to
> indicate whether the server should wait until all of the WAL has been
> archived. Typically, pg_basebackup does send 'NOWAIT' to tell the
> server not to hold up the final message until all of the WAL has been
> archived, because it's handling the verification of the WAL having
> been archived itself. In the unusual case that WAL isn't included with
> the pg_basebackup, it looks like it would wait for the archive_command
> to complete, which is better than I had thought (and hadn't noticed on
> my first glance through the code), though that does depend on a
> functional and perfect archive_command, and there's no shortage of
> reasons why that might not be the case at the time the backup is
> happening.
>
> That's an awful lot of action-at-a-distance hope for me to be
> comfortable with, however. A backup solution really does need to
> verify that the WAL has been completely and reliably stored, as
> discussed in the documentation, before claiming a backup is valid, and
> there's basically no reason not to unless the tool you've chosen to
> use makes that particularly difficult (even if not *technically*
> impossible, given enough effort). If your solution is built on the
> assumption that WAL archiving is always working, and there's no check
> happening during backup to verify that you've got all the WAL, then I
> have serious doubts about it being reliable.
> If you're independently monitoring that all WAL has been archived,
> that's certainly helpful, but I don't consider that a complete
> substitute for making sure that you've got all of the WAL for a given
> backup.
>
>> Your entire argument seems to be about whether said WAL is fsync'ed
>> to disk, and how this is impossible to ensure in a shell script.
> [...]
>> So it is clearly *possible*.
>
> Yes, it's possible, but it's not something I'd recommend doing, and
> none of your arguments have made me any more likely to recommend
> trying to ensure a proper backup has completed using shell scripts.
> What I fail to understand is your insistence that it's a good idea.
> I've seen lots and lots of attempts at it, even made some myself, and
> have come to the generally agreed-upon conclusion that it's a bad idea
> to hack together your own backup solution for PG, and that, even if
> you do want to try, using shell scripts to attempt it is a bad idea.
> There are much better solutions out there which are really what folks
> should be using. I'm not against using pg_basebackup either, but if
> you're using it, let it handle the archiving, because it does verify
> that all of the WAL has been archived properly.
>
>> Actually, I was helping him get a *reliable* backup system; I think
>> you misunderstood how swift changes the picture compared to a single
>> server/single disk design.

Ok, so I think we have moved closer to seeing each other's point of view
- it's been an interesting discussion so far!

> I do understand the goals of things like swift and S3 and the intent
> behind them to provide a better store than local disks, and I'm not
> against using them, to be clear, but they only address one of the
> requirements that I outlined for a reliable backup solution.
> I mention both requirements consistently to, hopefully, ensure that
> those coming along later to read these threads remember that it's
> about more than just verifying that all the WAL has been archived
> during a backup - the WAL must also actually be fsync'd or written out
> to reliable storage.

Here I think you have still not grasped that (e.g.) swift achieves *both*
of these - without you attempting to call fsync after your uploads. (For
instance, in our swift cluster you would have to have all three data
centers down to lose access to your uploaded WAL, and we run with the
various storage mounted with barrier=on, so the files will be there when
the centers return.) Note that the swift PUT operation (which is what an
upload is doing) does an fsync at the end.

regards

Mark

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
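The monitoring both sides agree is essential can hang off PostgreSQL's
pg_stat_archiver view. In the sketch below the health rule is isolated in
a function that just compares two timestamps; the psql probe shown in the
comment is an illustrative assumption about how it would be fed:

```shell
# check_archiver "LAST_OK|LAST_FAIL" : healthy (0) unless the most recent
# archiver event was a failure. Inputs are epoch seconds; an empty field
# means that event has never happened.
check_archiver() {
    last_ok=${1%%|*}
    last_fail=${1##*|}
    if [ -z "$last_fail" ]; then
        return 0                      # never failed: healthy
    fi
    if [ -n "$last_ok" ] && [ "$last_ok" -gt "$last_fail" ]; then
        return 0                      # failed once, but succeeded since
    fi
    return 1                          # latest attempt failed: alert
}

# A real probe might feed it pg_stat_archiver's two timestamps, e.g.:
#   check_archiver "$(psql -Atc "SELECT coalesce(extract(epoch FROM last_archived_time)::bigint::text, '')
#       || '|' || coalesce(extract(epoch FROM last_failed_time)::bigint::text, '')
#       FROM pg_stat_archiver")"
```

Such a probe catches a broken archive_command quickly, while the WAL it
failed to archive is still available on the running db server.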