Обсуждение: HELP!!! The WAL Archive is taking up all space
Hi all, Please help... I have 1 master PostgreSQL and 1 standby PostgreSQL. Both servers has the same OS Linux Debian Wheezy, the same hardware. Both server hardware: CPU: 24 cores RAM: 128GB Disk-1: 800GB SAS (for OS, logs, WAL archive directory) Disk-2: 330GB SSD (for PostgreSQL data directory, except WAL archive and except pg_log) The part of the configuration are as below: checkpoint_segments = 64 checkpoint_completion_target = 0.9 default_statistics_target = 10 maintenance_work_mem = 1GB effective_cache_size = 64GB shared_buffers = 24GB work_mem = 5MB wal_buffers = 8MB wal_keep_segments = 4096 wal_level = hot_standby max_wal_senders = 10 archive_mode = on archive_command = 'cp -i %p /home/postgres/archive/master/%f' The WAL archive folder is at /home/postgres/archive/master/, right? This directory consumes around 750GB of Disk-1. Each segment in the /home/postgres/archive/master/ is 16MB each There are currently 47443 files in this folder. If I want to limit the total size use by WAL archive to around 200-400 GB, what value should I set for the wal_keep_segments, checkpoint_segments? Regards, Fattah
On Wednesday, December 09, 2015 07:55:09 AM FattahRozzaq wrote: > archive_mode = on > archive_command = 'cp -i %p /home/postgres/archive/master/%f' > > > The WAL archive folder is at /home/postgres/archive/master/, right? > This directory consumes around 750GB of Disk-1. > Each segment in the /home/postgres/archive/master/ is 16MB each > There are currently 47443 files in this folder. > > If I want to limit the total size use by WAL archive to around 200-400 > GB, what value should I set for the wal_keep_segments, > checkpoint_segments? PostgreSQL doesn't clean up files copied by your archive_command. You need to have a separate task clean those out. PostgreSQL's active wal_keep_segments etc. are in the data/pg_xlog directory.
Вложения
On 12/09/2015 11:15 AM, Alan Hodgson wrote: > On Wednesday, December 09, 2015 07:55:09 AM FattahRozzaq wrote: >> archive_mode = on >> archive_command = 'cp -i %p /home/postgres/archive/master/%f' >> >> >> The WAL archive folder is at /home/postgres/archive/master/, right? >> This directory consumes around 750GB of Disk-1. >> Each segment in the /home/postgres/archive/master/ is 16MB each >> There are currently 47443 files in this folder. >> >> If I want to limit the total size use by WAL archive to around 200-400 >> GB, what value should I set for the wal_keep_segments, >> checkpoint_segments? > > PostgreSQL doesn't clean up files copied by your archive_command. You need to > have a separate task clean those out. PostgreSQL's active wal_keep_segments > etc. are in the data/pg_xlog directory. > The OP might want to take a look at: http://www.postgresql.org/docs/9.4/interactive/pgarchivecleanup.html To be safe I would use: -n Print the names of the files that would have been removed on stdout (performs a dry run). at first. -- Adrian Klaver adrian.klaver@aklaver.com
On 12/8/2015 4:55 PM, FattahRozzaq wrote: > ...I want to limit the total size use by WAL archive to around 200-400 GB...? for what purpose are you keeping a wal archive ? if its for PITR (point in time recovery), you need ALL WAL records since the start of a base backup up to the point in time at which you wish to recover. -- john r pierce, recycling bits in santa cruz
Hi John, I really don't know why I should keep the wal archives. I implement streaming replication into 1 server (standby server). I'm really newbie to PostgreSQL but the boss pushed me to handle it and implement it in production f*&%*$%%$#%$# (forgive me) They don't hire a database expert, I don't know why. Hi Alan, I have no problem with pg_xlog size which is only 67GB. My issue is with the archive folder size. I've read in this blog: http://blog.endpoint.com/2014/09/pgxlog-disk-space-problem-on-postgres.html And I change the archive_command into: archive_command = '/bin/true' Is it a bad decision? Hi Adrian, Thank you very much for the suggestion. is pgarchivecleanup harmful? Do you have a pgarchivecleanup example for my case? How to run a dry-run? Thank you, Fattah
Quick information, After I realize, the line "archive_command=/bin/true" is a bad decision, I have revert it back. Now I'm really confused and panic. I don't know what to do, and I don't really understand the postgresql.conf I'm a network engineer, I should handle the network and also postgresql database. Oh man, the office is so good but this part is sucks :(( -- On 10/12/2015, FattahRozzaq <ssoorruu@gmail.com> wrote: > Hi John, > > I really don't know why I should keep the wal archives. > I implement streaming replication into 1 server (standby server). > I'm really newbie to PostgreSQL but the boss pushed me to handle it > and implement it in production f*&%*$%%$#%$# (forgive me) > They don't hire a database expert, I don't know why. > > Hi Alan, > I have no problem with pg_xlog size which is only 67GB. > My issue is with the archive folder size. > I've read in this blog: > http://blog.endpoint.com/2014/09/pgxlog-disk-space-problem-on-postgres.html > And I change the archive_command into: archive_command = '/bin/true' > Is it a bad decision? > > Hi Adrian, > Thank you very much for the suggestion. is pgarchivecleanup harmful? > Do you have a pgarchivecleanup example for my case? > How to run a dry-run? > > > Thank you, > Fattah >
On 12/09/2015 04:38 PM, FattahRozzaq wrote: > Quick information, > > After I realize, the line "archive_command=/bin/true" is a bad > decision, I have revert it back. > Now I'm really confused and panic. > I don't know what to do, and I don't really understand the postgresql.conf > I'm a network engineer, I should handle the network and also > postgresql database. > Oh man, the office is so good but this part is sucks :(( If the pg_xlog directory is growing it is likely that either: * wal_keep_segments is set high and your slave is not correctly receiving updates. * You are using a replication slot and the slave is not correctly receiving updates. If your archive_command does not return a success, your pg_xlog will also grow but you don't need the archive_command *IF* your streaming replication is working *UNLESS* you are also doing archiving or PITR. Sincerely, JD -- Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 PostgreSQL Centered full stack support, consulting and development. Announcing "I'm offended" is basically telling the world you can't control your own emotions, so everyone else should do it for you.
On 12/9/2015 4:27 PM, FattahRozzaq wrote: > I really don't know why I should keep the wal archives. > I implement streaming replication into 1 server (standby server). > I'm really newbie to PostgreSQL but the boss pushed me to handle it > and implement it in production f*&%*$%%$#%$# (forgive me) > They don't hire a database expert, I don't know why. primary use for wal archives is to implement a point-in-time-recovery (PITR) backup system. It is effectively a continuous backup of your databases, that can be restored to an arbitrary point-in-time. typically, every so often (maybe once a week?) you would create a base backup. I personally would keep the last 2 basebackups, plus all wal archives since the start of the older of those two. both the WAL archives and base backups should be stored on a separate storage system,, NOT either the master or standby database servers, typically via NFS. one usecase for this basebackup+wal archive is fast recovery in case of master failure... if your master fails, you need to promote the standby to master, then you'll want to bring up a new standby server, this can be done by using the most recent base backup and then playing back all wal archives to it til it catches up, and then it can stream from the recently promoted master which was the standby and you have your high availability redundancy back. another use case, and perhaps more critical one.. say someone does something nasty to your databases, like drops the wrong table, or clobbers a bunch of financial data. this will quickly replicate to the standby, rendering it equally useless for recovery. With PITR, you can restore that most recent basebackup, then play back the WAL archive up just to before the transaction that clobbered your critical data. you would need to do this on both the master and slave, resuming replication in the correct sequence. about the only use case for wal archives without a base backup is speeding up the resuming of a standby server that got behind, perhaps due to being shutdown for hardware or OS maintenance, or whatever. if you don't have a wal archive, the standby server has to request all the xlog's since it last streamed from the master, while with a wal archive, it can fetch as many as it can from the wal archive THEN catch up with the master. This lowers the workload on the master. -- john r pierce, recycling bits in santa cruz
Hi John, Really thanking you for spend time typing and responding my email. I think the archive_command returns success, I can see the archive directory piling up 16MB every 2 minutes. Maybe the pgarchivecleanup is the solution to cleanup the contents of archive folder? How to properly do it? What is the pgarchivecleanup example that I can use for this case? How to run a dry-run for pgarchivecleanup? Best Regards, FR On 10/12/2015, Joshua D. Drake <jd@commandprompt.com> wrote: > On 12/09/2015 04:38 PM, FattahRozzaq wrote: >> Quick information, >> >> After I realize, the line "archive_command=/bin/true" is a bad >> decision, I have revert it back. >> Now I'm really confused and panic. >> I don't know what to do, and I don't really understand the >> postgresql.conf >> I'm a network engineer, I should handle the network and also >> postgresql database. >> Oh man, the office is so good but this part is sucks :(( > > If the pg_xlog directory is growing it is likely that either: > > * wal_keep_segments is set high and your slave is not correctly > receiving updates. > > * You are using a replication slot and the slave is not correctly > receiving updates. > > If your archive_command does not return a success, your pg_xlog will > also grow but you don't need the archive_command *IF* your streaming > replication is working *UNLESS* you are also doing archiving or PITR. > > Sincerely, > > JD > > -- > Command Prompt, Inc. - http://www.commandprompt.com/ 503-667-4564 > PostgreSQL Centered full stack support, consulting and development. > Announcing "I'm offended" is basically telling the world you can't > control your own emotions, so everyone else should do it for you. >
On 12/09/2015 04:27 PM, FattahRozzaq wrote: > Hi John, > > I really don't know why I should keep the wal archives. So who set up the archiving and why? Is archive recovery set up on the standby?: http://www.postgresql.org/docs/9.4/interactive/archive-recovery-settings.html > I implement streaming replication into 1 server (standby server). Is that the only standby or is there another set up previously? Per another recent thread having a WAL archive to fall back on is handy if the streaming replication falls behind and wal_keep_segments is not high enough: http://www.postgresql.org/docs/9.4/interactive/warm-standby.html#STREAMING-REPLICATION "If you use streaming replication without file-based continuous archiving, the server might recycle old WAL segments before the standby has received them. If this occurs, the standby will need to be reinitialized from a new base backup. You can avoid this by setting wal_keep_segments to a value large enough to ensure that WAL segments are not recycled too early, or by configuring a replication slot for the standby. If you set up a WAL archive that's accessible from the standby, these solutions are not required, since the standby can always use the archive to catch up provided it retains enough segments." > I'm really newbie to PostgreSQL but the boss pushed me to handle it > and implement it in production f*&%*$%%$#%$# (forgive me) > They don't hire a database expert, I don't know why. > > Hi Alan, > I have no problem with pg_xlog size which is only 67GB. > My issue is with the archive folder size. > I've read in this blog: > http://blog.endpoint.com/2014/09/pgxlog-disk-space-problem-on-postgres.html > And I change the archive_command into: archive_command = '/bin/true' > Is it a bad decision? > > Hi Adrian, > Thank you very much for the suggestion. is pgarchivecleanup harmful? > Do you have a pgarchivecleanup example for my case? > How to run a dry-run? Before you do any of the below make sure you know what your replication/archiving set up is, per the questions above. On the standby do: select pg_last_xlog_receive_location(); This will give you the last WAL synced to the standby disk. You can use that to see what WAL files are before this in the WAL archive. Then you can do this: pg_archivecleanup -d -n <archive_dir> <some_wal_file_before_the one you found_above> > > > Thank you, > Fattah > > -- Adrian Klaver adrian.klaver@aklaver.com
> FattahRozzaq <ssoorruu@gmail.com> hat am 10. Dezember 2015 um 01:27 > geschrieben: > > > Hi John, > > I really don't know why I should keep the wal archives. That's the problem! But that's your part, not our. If you need a Backup with PITR-capability you have to create a so called basebackup and continously WAL's. If you create later, say the next day, a new Basebackup and your Backup-Policy is hold one Backup, than you can delete all WAL's untill to the new Basebackup and the old Backup. If i where you i would use somethink like barman (see: http://www.pgbarman.org/ ) for that. And yes: you should a extra Backup-Server. If you have both (Database and Backup) on the same machine and the machine burns you will lost both, data and backup. Questions? > I implement streaming replication into 1 server (standby server). Streamin Replication can't replace a Backup! > I'm really newbie to PostgreSQL but the boss pushed me to handle it > and implement it in production f*&%*$%%$#%$# (forgive me) > They don't hire a database expert, I don't know why. You can learn that. PostgreSQL is really, really great. Btw.: i know companies providing remote DBA service for PostgreSQL. > http://blog.endpoint.com/2014/09/pgxlog-disk-space-problem-on-postgres.html > And I change the archive_command into: archive_command = '/bin/true' > Is it a bad decision? Maybe. You don't have a backup now. In case of desaster your boss will fire YOU! (and not me)
On 12/9/15 7:05 PM, Andreas Kretschmer wrote: >> I'm really newbie to PostgreSQL but the boss pushed me to handle it >> >and implement it in production f*&%*$%%$#%$# (forgive me) >> >They don't hire a database expert, I don't know why. > You can learn that. PostgreSQL is really, really great. > Btw.: i know companies providing remote DBA service for PostgreSQL. Even if you want to learn all about Postgres, I'd strongly recommend your company get some kind of a contract in place with an experienced Postgres expert. Databases are some of the most complicated pieces of software out there, and there's any number of ways you can really screw yourself if you're not careful. Almost all other software is stateless and pretty easy to fix mistakes in (screwed up a firewall config? Hit the console, fix it, and you're good. Screwed up the database? All your data may now be gone forever!) As a Postgres consultant my opinion is obviously biased, but I've seen plenty of horror stories where recovery of data becomes virtually impossible, frequently without any kind of a backup in place. It is possible that the data you're storing just isn't that important (as hard as database people find that to believe!). If that's your case then you can probably just turn off archiving and not worry about it. Finally, as someone else said, *a replica is NOT a backup!* -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com