Обсуждение: Disc space usage

Поиск
Список
Период
Сортировка

Disc space usage

От
Matthew Wakeling
Дата:
One of our build servers recently ran out of disc space while trying to
copy an entire database. This led me to investigate the database cluster,
which is stored on a RAID array with a total size of 1TB. Running a query
to list all databases and their sizes did not add up to the amount of
space being used by Postgres, so I had a look at the pgsql/base directory.
It appears that there are a few large directories that do not correspond
to any database. I wonder if these have been left behind accidentally by
Postgres.

Here are the database directories:

  Size (kB)    Directory    Database

  32        pgsql_tmp
  4352        11510        template0
  4368        1        template1
  4464        11511        postgres
  5368        30103627    xav-userprofile-test
  6096        8088167        outerjoins-userprofile-12.0-copy
  8676        30103406    xav-test
  10052        31313164    common-tgt-items-kmr-modmine
  19956        1108178        modmine-3-preview-18-feb-2008
  89452        14578911    common-tgt-items-kmr
  118940        9952565        production-xav-13
  201192        1257481        common-tgt-items-gtocmine-rns
  296552        7040137        common-tgt-items-flyminebuild
  1557160    9843085
  1699624    18456655    common-src-items-flyminebuild
  3376096    278561
  3995276    9064702        production-unimine-pride-beta5
  8528136    1257482        gtocmine-rns
  40815456    29233051
  42278196    27473906
  47112412    28110832
  47913532    32728815    production-flyminebuild:ensembl-anopheles
  60519524    32841289    production-flyminebuild:go
  67626328    27377902
  69513844    32856736    production-flyminebuild:flybase-dmel-gene-fasta
  74289908    32938724    production-flyminebuild:pubmed-gene
  75786720    32941684    production-flyminebuild:biogrid
  77361800    32944072    production-flyminebuild:update-publications
  80160256    32947141    production-flyminebuild:create-references
  81333908    32574190    flybasemine-production
  86356140    12110825
  87544200    33049747    production-flyminebuild

So on this server, the wasted space takes up 276GB, which is not
acceptable. I believe that if we re-initialise the cluster and re-create
the databases, these directories would disappear. Taking a look at the
directory 12110825, all the files inside were last accessed several
months ago. So, I have a few questions:

1. Is this space used for anything, or is it just abandoned? Is this a
    bug?
2. How do I reclaim this wasted space in a safe manner?
3. How do I prevent this happening again?

Matthew

--
An ant doesn't have a lot of processing power available to it. I'm not trying
to be speciesist - I wouldn't want to detract you from such a wonderful
creature, but, well, there isn't a lot there, is there?
                                        -- Computer Science Lecturer

Re: Disc space usage

От
Tom Lane
Дата:
Matthew Wakeling <matthew@flymine.org> writes:
> One of our build servers recently ran out of disc space while trying to
> copy an entire database. This led me to investigate the database cluster,
> which is stored on a RAID array with a total size of 1TB. Running a query
> to list all databases and their sizes did not add up to the amount of
> space being used by Postgres, so I had a look at the pgsql/base directory.
> It appears that there are a few large directories that do not correspond
> to any database. I wonder if these have been left behind accidentally by
> Postgres.

Anything under $PGDATA/base that doesn't correspond to a live row in
pg_database is junk.  The interesting question is how it got that way,
and in particular how you seem to have managed to have repeated
instances of it.

I gather that you're in the habit of using CREATE DATABASE to copy
large existing databases, so the most likely theory is that these are
leftovers from previous failed copy attempts.  Now CREATE DATABASE
does attempt to clean up if its copying fails, but there are various
ways to break that, for instance hitting control-C partway through the
cleanup phase.  So I'm wondering if maybe that's been done a few times.

What PG version is this, anyway?

            regards, tom lane

Re: Disc space usage

От
Matthew Wakeling
Дата:
On Wed, 8 Oct 2008, Tom Lane wrote:
>> It appears that there are a few large directories that do not correspond
>> to any database. I wonder if these have been left behind accidentally by
>> Postgres.
>
> Anything under $PGDATA/base that doesn't correspond to a live row in
> pg_database is junk.

So I can delete it? Might be safer to stop the db server while I do that
though.

> The interesting question is how it got that way, and in particular how
> you seem to have managed to have repeated instances of it.
>
> I gather that you're in the habit of using CREATE DATABASE to copy
> large existing databases, so the most likely theory is that these are
> leftovers from previous failed copy attempts.  Now CREATE DATABASE
> does attempt to clean up if its copying fails, but there are various
> ways to break that, for instance hitting control-C partway through the
> cleanup phase.  So I'm wondering if maybe that's been done a few times.

Yes, we do copy large databases quite often, and drop them again. The
database cluster was initialised back in March.

> What PG version is this, anyway?

Postgres 8.3.0

Matthew

--
Unfortunately, university regulations probably prohibit me from eating
small children in front of the lecture class.
                                        -- Computer Science Lecturer

Re: Disc space usage

От
Matthew Wakeling
Дата:
On Wed, 8 Oct 2008, Tom Lane wrote:
> The interesting question is how it got that way, and in particular how
> you seem to have managed to have repeated instances of it.

Speaking to some of my colleagues, sometimes the createdb process fails
with a very specific error message. If we wait five seconds and try again,
then it succeeds. So, maybe the duff directories are from those failures.

The error message is always something like this:

createdb: database creation failed: ERROR:  could not stat file "base/32285287/32687035": No such file or directory

Just before running createdb, we always have some quite heavy write
traffic. Is it possible that the changes that we just wrote haven't been
checkpointed properly yet, resulting in some of those files being missing
from the template database, and therefore the createdb to fail?

Matthew

--
Now, you would have thought these coefficients would be integers, given that
we're working out integer results. Using a fraction would seem really
stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to
use complex numbers.                    -- Computer Science Lecturer

Re: Disc space usage

От
Tom Lane
Дата:
Matthew Wakeling <matthew@flymine.org> writes:
> On Wed, 8 Oct 2008, Tom Lane wrote:
>> Anything under $PGDATA/base that doesn't correspond to a live row in
>> pg_database is junk.

> So I can delete it? Might be safer to stop the db server while I do that
> though.

In principle, at least, you shouldn't need to --- there shouldn't be any
buffers representing such files.

>> What PG version is this, anyway?

> Postgres 8.3.0

You should consider an update to 8.3.4.  A quick look in the post-8.3.0
CVS logs shows a couple of possibly relevant fixes:

2008-04-18 13:05  tgl

    * src/: backend/commands/dbcommands.c, include/port.h,
    port/dirmod.c (REL8_3_STABLE): Fix rmtree() so that it keeps going
    after failure to remove any individual file; the idea is that we
    should clean up as much as we can, even if there's some problem
    removing one file.  Make the error messages a bit less misleading,
    too.  In passing, const-ify function arguments.

2008-04-16 19:59  tgl

    * src/: backend/access/nbtree/nbtree.c,
    backend/access/nbtree/nbtutils.c, backend/access/transam/xlog.c,
    backend/commands/dbcommands.c, backend/port/ipc_test.c,
    backend/storage/ipc/ipc.c, include/access/nbtree.h,
    include/storage/ipc.h, include/utils/elog.h (REL8_3_STABLE): Repair
    two places where SIGTERM exit could leave shared memory state
    corrupted.  (Neither is very important if SIGTERM is used to shut
    down the whole database cluster together, but there's a problem if
    someone tries to SIGTERM individual backends.)    To do this,
    introduce new infrastructure macros
    PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
    of transiently pushing an on_shmem_exit cleanup hook.  Also use
    this method for createdb cleanup --- that wasn't a
    shared-memory-corruption problem, but SIGTERM abort of createdb
    could leave orphaned files lying around.

            regards, tom lane

Re: Disc space usage

От
Tom Lane
Дата:
Matthew Wakeling <matthew@flymine.org> writes:
> Speaking to some of my colleagues, sometimes the createdb process fails
> with a very specific error message. If we wait five seconds and try again,
> then it succeeds. So, maybe the duff directories are from those failures.

> The error message is always something like this:

> createdb: database creation failed: ERROR:  could not stat file "base/32285287/32687035": No such file or directory

> Just before running createdb, we always have some quite heavy write
> traffic.

Hmm, would that include dropping tables in the database you are about to
copy?  If so, this error is fairly readily explainable as a side effect
of the delayed dropping of physical files in recent PG versions.

(As noted in the manual, CREATE DATABASE isn't really intended as a COPY
DATABASE operation --- it is expecting the source database to be pretty
static.  I think you could make this more reliable if you do a manual
checkpoint between modifying the source database and copying it.)

However, that still leaves me wondering why the leftover copied
directories stick around.  If the copying step failed that way,
CREATE DATABASE *should* try to clean up the target tree before
exiting.  And AFAICS it wouldn't even report the error until after
completing that cleanup.  So there's still some piece of the puzzle
that's missing.

Do you have some specific examples of this error message at hand?
Can you try to confirm whether the reported path corresponds to
something in the CREATE's source database?  If it's actually
complaining about a stat failure in the target tree, then there's
something else going on altogether.  I don't see anything in that
path that would give this message, but I might be missing it.

            regards, tom lane

Re: Disc space usage

От
"Scott Marlowe"
Дата:
On Wed, Oct 8, 2008 at 8:00 AM, Matthew Wakeling <matthew@flymine.org> wrote:
> The error message is always something like this:
>
> createdb: database creation failed: ERROR:  could not stat file
> "base/32285287/32687035": No such file or directory

By any chance are you running on windows with virus protection
software on the server?

Re: Disc space usage

От
Matthew Wakeling
Дата:
On Wed, 8 Oct 2008, Scott Marlowe wrote:
>> The error message is always something like this:
>>
>> createdb: database creation failed: ERROR:  could not stat file
>> "base/32285287/32687035": No such file or directory
>
> By any chance are you running on windows with virus protection
> software on the server?

You insult me, sir! ;)

No, it's Linux.

Matthew

--
For every complex problem, there is a solution that is simple, neat, and wrong.
                                                      -- H. L. Mencken

Re: Disc space usage

От
Matthew Wakeling
Дата:
On Wed, 8 Oct 2008, Tom Lane wrote:
> Hmm, would that include dropping tables in the database you are about to
> copy?  If so, this error is fairly readily explainable as a side effect
> of the delayed dropping of physical files in recent PG versions.

It could quite possibly include dropping tables. We're running quite a
complex system with lots going on all the time.

> (As noted in the manual, CREATE DATABASE isn't really intended as a COPY
> DATABASE operation --- it is expecting the source database to be pretty
> static.  I think you could make this more reliable if you do a manual
> checkpoint between modifying the source database and copying it.)

I gather this. However, I think it would be sensible to make sure it can
never "corrupt the database" as it were. It's fine for it to lock everyone
out of the database while the copying is happening though. The only reason
for it to fail should be if someone is logged into the template database.

> Do you have some specific examples of this error message at hand?
> Can you try to confirm whether the reported path corresponds to
> something in the CREATE's source database?  If it's actually
> complaining about a stat failure in the target tree, then there's
> something else going on altogether.  I don't see anything in that
> path that would give this message, but I might be missing it.

The oid in the error message is of a database that no longer exists, which
indicates that it is *probably* referring to the template database.
Unfortunately my colleagues just wrote the script so that it retries, so
we don't have a decent log of the failures, which were a while back.
However, I have now altered the script so that it fails with a message
saying "Report this to Matthew", so if it happens again I'll be able to
give you some more detail.

Matthew

--
You will see this is a 3-blackboard lecture. This is the closest you are going
to get from me to high-tech teaching aids. Hey, if they put nooses on this, it
would be fun!                           -- Computer Science Lecturer

Re: Disc space usage

От
Tom Lane
Дата:
Matthew Wakeling <matthew@flymine.org> writes:
> The oid in the error message is of a database that no longer exists, which
> indicates that it is *probably* referring to the template database.
> Unfortunately my colleagues just wrote the script so that it retries, so
> we don't have a decent log of the failures, which were a while back.
> However, I have now altered the script so that it fails with a message
> saying "Report this to Matthew", so if it happens again I'll be able to
> give you some more detail.

One other bit of possibly useful data would be to eyeball the file mod
times in the orphaned subdirectories.  If they were from failed CREATE
DATABASEs then I'd expect every file in a given directory to have the
same mod time (modulo the amount of time it takes to copy the DB, which
is probably not trivial for the DB sizes you're dealing with).  If you
could also correlate that to the times you saw CREATE failures then it'd
be pretty convincing that we know failed CREATEs are the issue.

Also, I would definitely urge you to update to 8.3.4.  Although I'm not
seeing a mechanism for CREATE to fail to clean up like this, I'm looking
at the 8.3 branch tip code, not 8.3.0 ...

            regards, tom lane

Re: Disc space usage

От
Matthew Wakeling
Дата:
On Wed, 8 Oct 2008, Tom Lane wrote:
> One other bit of possibly useful data would be to eyeball the file mod
> times in the orphaned subdirectories.  If they were from failed CREATE
> DATABASEs then I'd expect every file in a given directory to have the
> same mod time (modulo the amount of time it takes to copy the DB, which
> is probably not trivial for the DB sizes you're dealing with).

Yes, I did that, and the file modification times were in such a pattern.

> If you could also correlate that to the times you saw CREATE failures
> then it'd be pretty convincing that we know failed CREATEs are the
> issue.

Can't do that until next time it happens, because we don't have the logs
from when it did happen any more.

Matthew

--
Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business.
Quark:  That's the 35th.
Jadzia: Oh yes, that's right. What's the 34th again?
Quark:  War is good for business. It's easy to get them mixed up.