Discussion: Startup process thrashing
Hello everyone, I've got a bit of a problem. It started last night when postgres (8.1.9) went down citing the need for a vacuum full to be done due to the transaction log needing to wraparound. So I stopped the server, logged in using a standalone backend, and started a VACUUM FULL ANALYZE on all dbs (one being 156GB). A few hours into the vacuum of the larger database it failed: it had filled up the 18GB pg_xlog partition with over 1000 WAL files, and the vacuum died when it ran out of space.

When I came in this morning I attempted to start postgres using the normal init script, and now it's stuck. The startup process is thrashing the disks and working hard; pg_controldata says it's in recovery, but it's been going for over two hours.

My question is where I should go from here. Should I kill the startup script, clear out the excess WAL files, start the standalone server, and try the vacuum again? Or should I just wait and see if the startup process sorts itself out? The startup process is responding to login attempts with "FATAL: the database system is starting up" and logging those attempts, so I assume it's still alive and working...

I appreciate any help and advice; I really hope it's not going to turn into lost data (gulp).
Output from pg_controldata:

pg_control version number:            812
Catalog version number:               200510211
Database system identifier:           5142157718116482999
Database cluster state:               in recovery
pg_control last modified:             Wed 10 Dec 2008 05:55:52 PM CST
Current log file ID:                  811
Next log file segment:                221
Latest checkpoint location:           327/8AE2BED0
Prior checkpoint location:            327/8AE2BE80
Latest checkpoint's REDO location:    327/8AE2BED0
Latest checkpoint's UNDO location:    0/0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          2146484231
Latest checkpoint's NextOID:          123620
Latest checkpoint's NextMultiXactId:  806872
Latest checkpoint's NextMultiOffset:  1766404
Time of latest checkpoint:            Wed 10 Dec 2008 06:01:01 AM CST
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Date/time type storage:               floating-point numbers
Maximum length of locale name:        128
LC_COLLATE:                           en_US.UTF-8
LC_CTYPE:                             en_US.UTF-8
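[Editor's note: the NextXID in the pg_controldata output above is suspiciously close to 2^31, which explains the shutdown that follows in this thread. A rough sketch of the headroom arithmetic; treating the wrap point as a flat 2^31 and the shutdown margin as one million transactions are simplifications (real XID comparison is circular and relative to the oldest frozen XID), but they illustrate the mechanism:]

```python
# Sketch: how close is this cluster to the anti-wraparound shutdown?
# Figures taken from the pg_controldata output above; the one-million
# safety margin is an assumption based on 8.1's documented behavior.

next_xid = 2146484231        # "Latest checkpoint's NextXID"
wrap_point = 2**31           # simplification: XIDs as a flat 32-bit counter
stop_margin = 1_000_000      # assumed margin at which commands are refused

headroom = wrap_point - next_xid
print(f"transactions left before wrap point: {headroom}")   # 999417
print(f"inside shutdown margin: {headroom < stop_margin}")  # True
```

Under these assumptions fewer than a million transactions remained, which matches the "not accepting commands to avoid wraparound data loss" error reported later in the thread.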
On Thu, 11 Dec 2008, Phillip Berry wrote:

> I've got a bit of a problem. It started last night when postgres
> (8.1.9) went down citing the need for a vacuum full to be done due to
> the transaction log needing to wraparound.

Not exactly. What it said was "To avoid a database shutdown, execute a full-database VACUUM". In that context, "full" means you vacuum everything in the database, but only regular VACUUM is needed. VACUUM FULL, as you learned the hard way, is a more intensive operation, and it's not needed to resolve the problem you started with. It's a somewhat unfortunately worded HINT.

> During the vacuum of the larger of the databases a few hours in it
> failed, it's filled up the 18GB pg_xlog partition with over 1000 wal
> files. Due to running out of space the vacuum failed.

Ouch. Are you running PITR archiving by setting archive_command? Did you set checkpoint_segments to some very high value? 1000 WAL files is not normal; I'm curious how you ended up with so many of them.

> When I came in this morning I attempted to start postgres using the
> normal init script, and now it's stuck. The startup process is thrashing
> the disks and working hard, pg_controldata says it's in recovery, but
> it's been going for over two hours.

It takes a long time to sort through 1000 WAL files and figure out whether the database is consistent for every transaction mentioned there. If your VACUUM FULL ran for several hours and kicked out 1000 of them, it would be reasonable to expect the cleanup to also take many hours.

> My question is where I should go from here? Should I kill the startup
> script, clear out the excess WAL files, start the standalone server and
> try vacuum again?

Deleting the WAL files like that will leave your database completely trashed. The utility that's provided to do the job you're considering is pg_resetxlog:

http://www.postgresql.org/docs/8.1/static/app-pgresetxlog.html

which is slightly safer, but note the dire warnings there.
You are very likely to get some sort of data corruption if you do that, and you won't know where it is. You'll be facing a dump and restore to sort that out, and if you think the server startup is taking a while on a 156GB database, you're really not going to be happy with how long a restore takes.

The safest thing you can do here is just wait for the server to finish recovery so it starts up. Watch the system activity with something like vmstat. If the server process is busy using the CPU and doing things with the disks, and you have evidence it's making progress, you'll be hard pressed to execute any manual recovery that's safer or more efficient than that.

Someone else may be able to point you toward a better way of estimating how far it has left to go; I haven't ever been stuck in your position long enough to figure that out myself. Good luck.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Hi Greg,

I appreciate the reply. Fortunately, within the last 10 minutes it has finished the recovery... and then promptly shut itself down again. The exact error is in fact:

FATAL: database is not accepting commands to avoid wraparound data loss in database "aim"
2008-12-10 06:00:02 CST [21392]: [4-1] HINT: Stop the postmaster and use a standalone backend to vacuum database "aim".

So I'm back to where I was last night, which is a slight relief. The difference is that pg_xlog is still full, and when I attempt to start Postgres in single-user mode it fails because of that.

I'm not running PITR and checkpoint_segments is set to 100, as this is home to a very write-intensive app. The db has been running for over a year and in that time has been vacuumed many times without issue.

So now I guess I have two questions:

1. Will copying the pg_xlog dir onto a larger partition cause any problems?

2. Is there any reason that vacuuming would create so many WAL files? I've looked around and can't find any explanation, though it's a problem that seems to have happened before.

Actually, make that three questions...

3. I installed pgbouncer yesterday for connection pooling; does the fact that it holds connections open have any bearing on how rapidly transaction IDs are used up? As in, are transaction IDs in any way related to connections? It's a pretty interesting coincidence.

Cheers
Phil

On Thursday 11 December 2008 15:20:21 you wrote:
> On Thu, 11 Dec 2008, Phillip Berry wrote:
> > I've got a bit of a problem. It started last night when postgres
> > (8.1.9) went down citing the need for a vacuum full to be done due to
> > the transaction log needing to wraparound.
>
> Not exactly. What it said was "To avoid a database shutdown, execute a
> full-database VACUUM". In that context, "full" means you vacuum
> everything in the database, but only regular VACUUM is needed.
> VACUUM FULL, as you learned the hard way, is a more intensive operation,
> and it's not needed to resolve the problem you started with. It's a
> somewhat unfortunately worded HINT.
>
> > During the vacuum of the larger of the databases a few hours in it
> > failed, it's filled up the 18GB pg_xlog partition with over 1000 wal
> > files. Due to running out of space the vacuum failed.
>
> Ouch. Are you running PITR recovery by setting archive_command? Did you
> set checkpoint_segments to some very high value? 1000 WAL files is not
> normal, curious how you ended up with so many of them.
>
> > When I came in this morning I attempted to start postgres using the
> > normal init script, and now it's stuck. The startup process is thrashing
> > the disks and working hard, pg_controldata says it's in recovery, but
> > it's been going for over two hours.
>
> It takes a long time to sort through 1000 WAL files and figure out if the
> database is consistent for every transaction mentioned there. If your
> VACUUM FULL ran for several hours and kicked out 1000 of them, it would be
> reasonable to expect the cleanup to also take many hours.
>
> > My question is where I should go from here? Should i kill the startup
> > script, clear out the excess wal files, start the standalone server and
> > try vacuum again?
>
> Deleting the WAL files like that will leave your database completely
> trashed. The utility that's provided to do the job you're considering is
> pg_resetxlog:
>
> http://www.postgresql.org/docs/8.1/static/app-pgresetxlog.html
>
> Which is slightly safer, but note the dire warnings there. You are very
> likely to get some sort of data corruption if you do that, and you won't
> know where it is. You'll be facing a dump and restore to sort that out,
> and if you think the server startup is taking a while on a 156GB database
> you're really not going to be happy with how long a restore takes.
>
> The safest thing you can do here is just wait for the server to finish
> recovery so it starts up. Watch the system activity with something like
> vmstat. If the server process is busy using the CPU and it's doing stuff
> with the disks, if you have evidence it's making progress, you'll be hard
> pressed to execute any manual recovery that's any safer or more efficient
> than that.
>
> Someone else may be able to point you toward better estimating how far
> it's got left to go, I haven't ever been stuck in your position for long
> enough before to figure that out myself. Good luck.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thu, 11 Dec 2008, Phillip Berry wrote:

> I'm not running PITR and checkpoint_segments is set to 100 as this is
> home to a very write intensive app.

That's weird then. It shouldn't ever keep around more than 201 WAL segments. I've heard one report of a similarly mysterious excess of them, from Robert Treat, but that was probably caused by a hardware failure.

> 1. Will copying the pg_xlog dir onto a larger partition cause any
> problems?

Ah, I didn't realize the disk it's on was still 100% full from what you said. You're not going to get anywhere until that's sorted out. You absolutely can move them all to another disk; you just can't delete them altogether or split them across logical disks. The usual technique is something like this:

cd $PGDATA
mv pg_xlog $BIGGER
ln -s $BIGGER/pg_xlog

Then fire the server up in single-user mode to issue the VACUUM it's begging for. Once it's back to running in regular mode again, issue a "CHECKPOINT" command, and it should erase most of those WAL files. Then you could stop the server and reverse the above:

cd $PGDATA
unlink pg_xlog
mv $BIGGER/pg_xlog .

While filling that disk and grinding to a halt is bad, it's not necessarily any worse than the alternative if you didn't have it on a dedicated filesystem: this out-of-control xlog creation might have created an even greater backlog of segments.

> 2. Is there any reason that vacuuming would create so many WAL files?
> I've looked around and can't find any explanation, though it's a
> problem that seems to have happened before.

Nothing should make that many WAL files. Normally the only way you can get into that position is if you're running PITR but not successfully archiving the segments, so they just keep piling up and the server can't recycle them. What's supposed to happen is that every time there is a checkpoint, it deletes anything beyond 2*checkpoint_segments+1 worth of segments; that's where the 201 figure I mentioned comes from.

> 3. I installed pgbouncer yesterday for connection pooling, does the fact
> that it holds connections open have any bearing on how rapidly the
> transaction IDs are used up? As in are transaction IDs in anyway related
> to connections? It's a pretty interesting coincidence.

I'm not familiar enough with pgbouncer to comment on the specifics of what it does. In general, keeping transactions open longer will impact the pg_clog section of the database, but that shouldn't cause the WAL to grow out of control.

Once you get this back into production, you certainly should carefully monitor how much disk space is being taken up on the WAL disk going forward. If the count of segments there gets back over 201 again, that's an indication something that shouldn't be happening is going on, and the sort of thing you can get more information about here.

This isn't a very popular time of day for this list, so some more ideas might show up once our many European members start posting in volume. As a general aside, if you ever find yourself in this position again, where you've got an urgent database problem, something you might do in parallel with posting here is trying the IRC channel:

http://wiki.postgresql.org/wiki/IRC2RWNames

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
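[Editor's note: the 201-segment ceiling and the expected pg_xlog footprint are easy to sanity-check with a little arithmetic. A sketch: the 16MB segment size comes from the pg_controldata output earlier in the thread, and the file-counting helper is a hypothetical monitoring aid, not part of anything discussed above.]

```python
import os

def wal_segment_ceiling(checkpoint_segments):
    """Expected upper bound on recycled WAL segments:
    2 * checkpoint_segments + 1, per the explanation above."""
    return 2 * checkpoint_segments + 1

SEGMENT_BYTES = 16 * 1024 * 1024  # "Bytes per WAL segment: 16777216"

ceiling = wal_segment_ceiling(100)             # checkpoint_segments = 100
expected_gib = ceiling * SEGMENT_BYTES / 2**30

print(ceiling)        # 201 segments
print(expected_gib)   # 3.140625 GiB -- nowhere near an 18GB partition

def count_wal_segments(pg_xlog_dir):
    """Hypothetical monitoring helper: count WAL segment files in pg_xlog.
    Segment file names are 24 hex characters; a count persistently above
    the ceiling suggests segments are not being recycled."""
    return sum(1 for name in os.listdir(pg_xlog_dir)
               if len(name) == 24 and all(c in "0123456789ABCDEF" for c in name))
```

With the reported settings, over 1000 segments is roughly five times what checkpointing should ever leave behind, which is why the figure stands out.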
Greg Smith <gsmith@gregsmith.com> writes:
> Not exactly. What it said was "To avoid a database shutdown, execute a
> full-database VACUUM". In that context, "full" means you vacuum
> everything in the database, but only regular VACUUM is needed. VACUUM
> FULL, as you learned the hard way, is a more intensive operation, and it's
> not needed to resolve the problem you started with. It's a somewhat
> unfortunately worded HINT.

Maybe we could rephrase it as "whole-database VACUUM"?

			regards, tom lane
Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
> > Not exactly. What it said was "To avoid a database shutdown, execute a
> > full-database VACUUM". In that context, "full" means you vacuum
> > everything in the database, but only regular VACUUM is needed. VACUUM
> > FULL, as you learned the hard way, is a more intensive operation, and it's
> > not needed to resolve the problem you started with. It's a somewhat
> > unfortunately worded HINT.
>
> Maybe we could rephrase it as "whole-database VACUUM"?

"database-wide VACUUM"?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> Maybe we could rephrase it as "whole-database VACUUM"?

> "database-wide VACUUM"?

Yeah, that's probably better, because I think we use that phrase in the documentation already.

			regards, tom lane
Greg Smith <gsmith@gregsmith.com> writes:
> On Thu, 11 Dec 2008, Phillip Berry wrote:
>> I'm not running PITR and checkpoint_segments is set to 100 as this is
>> home to a very write intensive app.

> That's weird then. It shouldn't ever keep around more than 201 WAL
> segments. I've heard one report of a similarly mysterious excess of them,
> from Robert Treat, but that was probably caused by a hardware failure.

AFAIK the only non-PITR reason for WAL files to not get recycled is if checkpoints were failing. Do you still have the postmaster log from before the original crash, and if so, is there anything in there about checkpoint failures?

			regards, tom lane
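[Editor's note: if the old postmaster logs were available, the check Tom suggests could be scripted as a quick scan. A hypothetical sketch: the function name is made up for illustration, the severity keywords are PostgreSQL's standard log levels, and treating any such line that also mentions "checkpoint" as suspect is only a heuristic, not a match for exact 8.1 message text.]

```python
import re

def find_checkpoint_problems(log_path):
    """Return log lines mentioning a checkpoint together with an
    error-level severity keyword. A crude filter, not an exact match
    for any particular PostgreSQL message."""
    pattern = re.compile(r"(WARNING|ERROR|FATAL|PANIC).*checkpoint")
    with open(log_path) as log:
        return [line.rstrip("\n") for line in log if pattern.search(line)]
```

Anything this turns up around the time the segment count started climbing would support the failing-checkpoints theory.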
On Thu, Dec 11, 2008 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
>> On Thu, 11 Dec 2008, Phillip Berry wrote:
>>> I'm not running PITR and checkpoint_segments is set to 100 as this is
>>> home to a very write intensive app.
>
>> That's weird then. It shouldn't ever keep around more than 201 WAL
>> segments. I've heard one report of a similarly mysterious excess of them,
>> from Robert Treat, but that was probably caused by a hardware failure.
>
> AFAIK the only non-PITR reason for WAL files to not get recycled is if
> checkpoints were failing. Do you still have the postmaster log from
> before the original crash, and if so is there anything in there about
> checkpoint failures?

Don't forget that the OP mentioned earlier that he had very long help open connections with possible long help open transactions.
On Thu, Dec 11, 2008 at 10:09 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Don't forget that the OP mentioned earlier that he had very long help
> open connections with possible long help open transactions.

Long held. held. not help.
"Scott Marlowe" <scott.marlowe@gmail.com> writes: > On Thu, Dec 11, 2008 at 9:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> AFAIK the only non-PITR reason for WAL files to not get recycled is if >> checkpoints were failing. Do you still have the postmaster log from >> before the original crash, and if so is there anything in there about >> checkpoint failures? > Don't forget that the OP mentioned earlier that he had very long help > open connections with possible long help open transactions. Open transactions don't prevent checkpoints and thus not WAL truncation either. You're confusing this with vacuuming. regards, tom lane
On Friday 12 December 2008 03:59:42 Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
> > On Thu, 11 Dec 2008, Phillip Berry wrote:
> >> I'm not running PITR and checkpoint_segments is set to 100 as this is
> >> home to a very write intensive app.
> >
> > That's weird then. It shouldn't ever keep around more than 201 WAL
> > segments. I've heard one report of a similarly mysterious excess of
> > them, from Robert Treat, but that was probably caused by a hardware
> > failure.
>
> AFAIK the only non-PITR reason for WAL files to not get recycled is if
> checkpoints were failing. Do you still have the postmaster log from
> before the original crash, and if so is there anything in there about
> checkpoint failures?

No, the crash happened in single-user mode, so there were (AFAIK) no logs of what happened outside of the console. I've grepped all the recent logs anyway and there's no mention of checkpoint problems.

Cheers
Phil