Обсуждение: Disc space usage
One of our build servers recently ran out of disc space while trying to copy an entire database. This led me to investigate the database cluster, which is stored on a RAID array with a total size of 1TB. Running a query to list all databases and their sizes did not add up to the amount of space being used by Postgres, so I had a look at the pgsql/base directory. It appears that there are a few large directories that do not correspond to any database. I wonder if these have been left behind accidentally by Postgres. Here are the database directories: Size (kB) Directory Database 32 pgsql_tmp 4352 11510 template0 4368 1 template1 4464 11511 postgres 5368 30103627 xav-userprofile-test 6096 8088167 outerjoins-userprofile-12.0-copy 8676 30103406 xav-test 10052 31313164 common-tgt-items-kmr-modmine 19956 1108178 modmine-3-preview-18-feb-2008 89452 14578911 common-tgt-items-kmr 118940 9952565 production-xav-13 201192 1257481 common-tgt-items-gtocmine-rns 296552 7040137 common-tgt-items-flyminebuild 1557160 9843085 1699624 18456655 common-src-items-flyminebuild 3376096 278561 3995276 9064702 production-unimine-pride-beta5 8528136 1257482 gtocmine-rns 40815456 29233051 42278196 27473906 47112412 28110832 47913532 32728815 production-flyminebuild:ensembl-anopheles 60519524 32841289 production-flyminebuild:go 67626328 27377902 69513844 32856736 production-flyminebuild:flybase-dmel-gene-fasta 74289908 32938724 production-flyminebuild:pubmed-gene 75786720 32941684 production-flyminebuild:biogrid 77361800 32944072 production-flyminebuild:update-publications 80160256 32947141 production-flyminebuild:create-references 81333908 32574190 flybasemine-production 86356140 12110825 87544200 33049747 production-flyminebuild So on this server, the wasted space takes up 276GB, which is not acceptable. I believe that if we re-initialise the cluster and re-create the databases, these directories would disappear. Taking a look at the directory 12110825, all the files inside were last accessed several months ago. So, I have a few questions: 1. Is this space used for anything, or is it just abandoned? Is this a bug? 2. How do I reclaim this wasted space in a safe manner? 3. How do I prevent this happening again? Matthew -- An ant doesn't have a lot of processing power available to it. I'm not trying to be speciesist - I wouldn't want to detract you from such a wonderful creature, but, well, there isn't a lot there, is there? -- Computer Science Lecturer
Matthew Wakeling <matthew@flymine.org> writes: > One of our build servers recently ran out of disc space while trying to > copy an entire database. This led me to investigate the database cluster, > which is stored on a RAID array with a total size of 1TB. Running a query > to list all databases and their sizes did not add up to the amount of > space being used by Postgres, so I had a look at the pgsql/base directory. > It appears that there are a few large directories that do not correspond > to any database. I wonder if these have been left behind accidentally by > Postgres. Anything under $PGDATA/base that doesn't correspond to a live row in pg_database is junk. The interesting question is how it got that way, and in particular how you seem to have managed to have repeated instances of it. I gather that you're in the habit of using CREATE DATABASE to copy large existing databases, so the most likely theory is that these are leftovers from previous failed copy attempts. Now CREATE DATABASE does attempt to clean up if its copying fails, but there are various ways to break that, for instance hitting control-C partway through the cleanup phase. So I'm wondering if maybe that's been done a few times. What PG version is this, anyway? regards, tom lane
On Wed, 8 Oct 2008, Tom Lane wrote: >> It appears that there are a few large directories that do not correspond >> to any database. I wonder if these have been left behind accidentally by >> Postgres. > > Anything under $PGDATA/base that doesn't correspond to a live row in > pg_database is junk. So I can delete it? Might be safer to stop the db server while I do that though. > The interesting question is how it got that way, and in particular how > you seem to have managed to have repeated instances of it. > > I gather that you're in the habit of using CREATE DATABASE to copy > large existing databases, so the most likely theory is that these are > leftovers from previous failed copy attempts. Now CREATE DATABASE > does attempt to clean up if its copying fails, but there are various > ways to break that, for instance hitting control-C partway through the > cleanup phase. So I'm wondering if maybe that's been done a few times. Yes, we do copy large databases quite often, and drop them again. The database cluster was initialised back in March. > What PG version is this, anyway? Postgres 8.3.0 Matthew -- Unfortunately, university regulations probably prohibit me from eating small children in front of the lecture class. -- Computer Science Lecturer
On Wed, 8 Oct 2008, Tom Lane wrote: > The interesting question is how it got that way, and in particular how > you seem to have managed to have repeated instances of it. Speaking to some of my colleagues, sometimes the createdb process fails with a very specific error message. If we wait five seconds and try again, then it succeeds. So, maybe the duff directories are from those failures. The error message is always something like this: createdb: database creation failed: ERROR: could not stat file "base/32285287/32687035": No such file or directory Just before running createdb, we always have some quite heavy write traffic. Is it possible that the changes that we just wrote haven't been checkpointed properly yet, resulting in some of those files being missing from the template database, and therefore the createdb to fail? Matthew -- Now, you would have thought these coefficients would be integers, given that we're working out integer results. Using a fraction would seem really stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to use complex numbers. -- Computer Science Lecturer
Matthew Wakeling <matthew@flymine.org> writes: > On Wed, 8 Oct 2008, Tom Lane wrote: >> Anything under $PGDATA/base that doesn't correspond to a live row in >> pg_database is junk. > So I can delete it? Might be safer to stop the db server while I do that > though. In principle, at least, you shouldn't need to --- there shouldn't be any buffers representing such files. >> What PG version is this, anyway? > Postgres 8.3.0 You should consider an update to 8.3.4. A quick look in the post-8.3.0 CVS logs shows a couple of possibly relevant fixes: 2008-04-18 13:05 tgl * src/: backend/commands/dbcommands.c, include/port.h, port/dirmod.c (REL8_3_STABLE): Fix rmtree() so that it keeps going after failure to remove any individual file; the idea is that we should clean up as much as we can, even if there's some problem removing one file. Make the error messages a bit less misleading, too. In passing, const-ify function arguments. 2008-04-16 19:59 tgl * src/: backend/access/nbtree/nbtree.c, backend/access/nbtree/nbtutils.c, backend/access/transam/xlog.c, backend/commands/dbcommands.c, backend/port/ipc_test.c, backend/storage/ipc/ipc.c, include/access/nbtree.h, include/storage/ipc.h, include/utils/elog.h (REL8_3_STABLE): Repair two places where SIGTERM exit could leave shared memory state corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. regards, tom lane
Matthew Wakeling <matthew@flymine.org> writes: > Speaking to some of my colleagues, sometimes the createdb process fails > with a very specific error message. If we wait five seconds and try again, > then it succeeds. So, maybe the duff directories are from those failures. > The error message is always something like this: > createdb: database creation failed: ERROR: could not stat file "base/32285287/32687035": No such file or directory > Just before running createdb, we always have some quite heavy write > traffic. Hmm, would that include dropping tables in the database you are about to copy? If so, this error is fairly readily explainable as a side effect of the delayed dropping of physical files in recent PG versions. (As noted in the manual, CREATE DATABASE isn't really intended as a COPY DATABASE operation --- it is expecting the source database to be pretty static. I think you could make this more reliable if you do a manual checkpoint between modifying the source database and copying it.) However, that still leaves me wondering why the leftover copied directories stick around. If the copying step failed that way, CREATE DATABASE *should* try to clean up the target tree before exiting. And AFAICS it wouldn't even report the error until after completing that cleanup. So there's still some piece of the puzzle that's missing. Do you have some specific examples of this error message at hand? Can you try to confirm whether the reported path corresponds to something in the CREATE's source database? If it's actually complaining about a stat failure in the target tree, then there's something else going on altogether. I don't see anything in that path that would give this message, but I might be missing it. regards, tom lane
On Wed, Oct 8, 2008 at 8:00 AM, Matthew Wakeling <matthew@flymine.org> wrote: > The error message is always something like this: > > createdb: database creation failed: ERROR: could not stat file > "base/32285287/32687035": No such file or directory By any chance are you running on windows with virus protection software on the server?
On Wed, 8 Oct 2008, Scott Marlowe wrote: >> The error message is always something like this: >> >> createdb: database creation failed: ERROR: could not stat file >> "base/32285287/32687035": No such file or directory > > By any chance are you running on windows with virus protection > software on the server? You insult me, sir! ;) No, it's Linux. Matthew -- For every complex problem, there is a solution that is simple, neat, and wrong. -- H. L. Mencken
On Wed, 8 Oct 2008, Tom Lane wrote: > Hmm, would that include dropping tables in the database you are about to > copy? If so, this error is fairly readily explainable as a side effect > of the delayed dropping of physical files in recent PG versions. It could quite possibly include dropping tables. We're running quite a complex system with lots going on all the time. > (As noted in the manual, CREATE DATABASE isn't really intended as a COPY > DATABASE operation --- it is expecting the source database to be pretty > static. I think you could make this more reliable if you do a manual > checkpoint between modifying the source database and copying it.) I gather this. However, I think it would be sensible to make sure it can never "corrupt the database" as it were. It's fine for it to lock everyone out of the database while the copying is happening though. The only reason for it to fail should be if someone is logged into the template database. > Do you have some specific examples of this error message at hand? > Can you try to confirm whether the reported path corresponds to > something in the CREATE's source database? If it's actually > complaining about a stat failure in the target tree, then there's > something else going on altogether. I don't see anything in that > path that would give this message, but I might be missing it. The oid in the error message is of a database that no longer exists, which indicates that it is *probably* referring to the template database. Unfortunately my colleagues just wrote the script so that it retries, so we don't have a decent log of the failures, which were a while back. However, I have now altered the script so that it fails with a message saying "Report this to Matthew", so if it happens again I'll be able to give you some more detail. Matthew -- You will see this is a 3-blackboard lecture. This is the closest you are going to get from me to high-tech teaching aids. Hey, if they put nooses on this, it would be fun! -- Computer Science Lecturer
Matthew Wakeling <matthew@flymine.org> writes: > The oid in the error message is of a database that no longer exists, which > indicates that it is *probably* referring to the template database. > Unfortunately my colleagues just wrote the script so that it retries, so > we don't have a decent log of the failures, which were a while back. > However, I have now altered the script so that it fails with a message > saying "Report this to Matthew", so if it happens again I'll be able to > give you some more detail. One other bit of possibly useful data would be to eyeball the file mod times in the orphaned subdirectories. If they were from failed CREATE DATABASEs then I'd expect every file in a given directory to have the same mod time (modulo the amount of time it takes to copy the DB, which is probably not trivial for the DB sizes you're dealing with). If you could also correlate that to the times you saw CREATE failures then it'd be pretty convincing that we know failed CREATEs are the issue. Also, I would definitely urge you to update to 8.3.4. Although I'm not seeing a mechanism for CREATE to fail to clean up like this, I'm looking at the 8.3 branch tip code, not 8.3.0 ... regards, tom lane
On Wed, 8 Oct 2008, Tom Lane wrote: > One other bit of possibly useful data would be to eyeball the file mod > times in the orphaned subdirectories. If they were from failed CREATE > DATABASEs then I'd expect every file in a given directory to have the > same mod time (modulo the amount of time it takes to copy the DB, which > is probably not trivial for the DB sizes you're dealing with). Yes, I did that, and the file modification times were in such a pattern. > If you could also correlate that to the times you saw CREATE failures > then it'd be pretty convincing that we know failed CREATEs are the > issue. Can't do that until next time it happens, because we don't have the logs from when it did happen any more. Matthew -- Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business. Quark: That's the 35th. Jadzia: Oh yes, that's right. What's the 34th again? Quark: War is good for business. It's easy to get them mixed up.