Thread: VACUUM FULL memory requirements


VACUUM FULL memory requirements

From
David Schnur
Date:
Hello,

I have a weekly task set up to VACUUM FULL a fairly large (~300M rows, ~50GB) database.  The intent is to free up disk space from especially-large inserts that aren't covered by the regular reclamation from a daily VACUUM.  Recently, I've been getting the following error:

(OperationalError) out of memory
DETAIL:  Failed on request of size 330.

The PG version is 8.3.5; we can't upgrade to 8.4 at this time.  The machine is running a reasonably patched RHEL5, and should have at least 1-2GB of RAM free at the time VACUUM is run, plus a few additional GB of swap.

Some googling reveals others with the same problem, including one earlier mailing list discussion that recommended using CLUSTER instead.  As I understand it, though, CLUSTER requires at least as much free disk space as the size of the largest table, which occupies most of the total size of the database.

Is there any documentation on the memory & disk usage requirements of the more intensive commands, like VACUUM, VACUUM FULL, CLUSTER, etc.?  I would find this very useful, since, regarding my VACUUM vs. CLUSTER decision, for example, I seem to be groping around without really understanding the situation.  I've always been under the impression that VACUUM FULL required few resources besides time.

I realize that the long-term solution is probably to partition the table, but in the short-term, is there anything I can do about this?  Thanks very much,
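For reference, the weekly job boils down to something like the following (the table name is illustrative, not our actual schema):

```sql
-- Hypothetical sketch of the weekly maintenance job.
-- On 8.3, VACUUM FULL compacts the heap but tends to bloat indexes,
-- so a REINDEX follows it.
VACUUM FULL events;
REINDEX TABLE events;
ANALYZE events;
```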

David

Re: VACUUM FULL memory requirements

From
"Kevin Grittner"
Date:
David Schnur <dnschnur@gmail.com> wrote:

> I have a weekly task set up to VACUUM FULL a fairly large (~300M
> row ~50GB) database.

I hope you've been following that with a REINDEX every time;
otherwise you're causing index bloat.

> The intent is to free up disk space from especially-large inserts
> that aren't covered by the regular reclamation from a daily
> VACUUM.

Are these inserts happening in the same table(s) each time?  If so,
what are you gaining by moving the space from the database's free
space manager to the file system's free space management (and back
again) each time?

>  Recently, I've been getting the following error:
>
> (OperationalError) out of memory
> DETAIL:  Failed on request of size 330.

You might want to reduce maintenance_work_mem.  See this thread:

http://archives.postgresql.org/pgsql-performance/2009-12/msg00120.php
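For example, it can be lowered for just the maintenance session, without touching postgresql.conf (the value and table name are illustrative):

```sql
-- Lower maintenance_work_mem for this session only.
SET maintenance_work_mem = '32MB';
VACUUM FULL events;  -- hypothetical target table
RESET maintenance_work_mem;
```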

-Kevin

Re: VACUUM FULL memory requirements

From
David Schnur
Date:

On Mon, Dec 14, 2009 at 12:04 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
I hope you've been following that with a REINDEX every time;
otherwise you're causing index bloat.

Yes, it REINDEXes afterwards.

Are these inserts happening in the same table(s) each time?  If so,
what are you gaining by moving the space from the database's free
space manager to the file system's free space management (and back
again) each time?

Normally the database will see inserts throughout each day, with older data deleted at the end of the day.  It's not a sharp cutoff, which is why moving over to partitions is a little trickier than in most such cases.

Regular VACUUM is fine most of the time; it frees up space for re-use, the space gets re-used, and the disk size stays constant.  But at certain non-predictable points in time, the database may expire several times more rows than usual, and in that case I want to reclaim the space for the OS, since it may not be used by the database again for some time.

Is my understanding of VACUUM vs. VACUUM FULL correct, actually?  It appears to work that way, but the docs are a little vague on whether plain VACUUM ever frees actual disk space, or just reclaims it for the DB.

You might want to reduce maintenance_work_mem.  See this thread:
http://archives.postgresql.org/pgsql-performance/2009-12/msg00120.php

I saw that, but I think it's unlikely to be what's happening.  That case had many databases running VACUUM simultaneously, with a maintenance_work_mem of 256MB.  I have just a single database and maintenance_work_mem is at default, which is something like 16 or 32 MB I believe.

The thread below appears to be the same question asked on the PG-hackers list:

It ends with a post by Simon Riggs noting that VACUUM FULL doesn't limit itself to maintenance_work_mem anyway.

David

Re: VACUUM FULL memory requirements

From
Greg Stark
Date:
On Mon, Dec 14, 2009 at 5:53 PM, David Schnur <dnschnur@gmail.com> wrote:
> Is my understanding of VACUUM vs. VACUUM FULL correct, actually?  It
> appears to work that way, but the docs are a little vague on whether plain
> VACUUM ever frees actual disk space, or just reclaims it for the DB.

Basically. VACUUM does check to see if there just happens to be some
space at the end of the file it can truncate off but it doesn't go out
of its way to create such space. VACUUM FULL actually moves all the
tuples at the end of the file to earlier free spots. Moving tuples in
Postgres is really really hard because it has to be transaction-safe
and requires exclusive locking.
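One way to see the difference is to watch the relation size across both commands (the table name is hypothetical):

```sql
-- Illustrative check: plain VACUUM usually leaves the file size
-- unchanged, while VACUUM FULL shrinks it.
SELECT pg_size_pretty(pg_relation_size('events'));
VACUUM events;       -- space becomes reusable; the file rarely shrinks
SELECT pg_size_pretty(pg_relation_size('events'));
VACUUM FULL events;  -- tuples compacted toward the front; tail truncated
SELECT pg_size_pretty(pg_relation_size('events'));
```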

--
greg

Re: VACUUM FULL memory requirements

From
"Kevin Grittner"
Date:
David Schnur <dnschnur@gmail.com> wrote:

> Regular VACUUM is fine most of the time; it frees up space for
> re-use, the space gets re-used, and the disk size stays constant.
> But at certain non-predictable points in time, the database may
> expire several times more rows than usual, and in that case I want
> to reclaim the space for the OS, since it may not be used by the
> database again for some time.

If you actually expect it to be re-used by the database sometime
later, I would just stick with normal VACUUM (with adequate fsm
settings).  The only counter-argument that jumps out at me is that
you have some actual need to use that space in the interim and you
can ensure that you're done with it and free it up before the
database needs it again.
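On 8.3, "adequate fsm settings" means sizing the free space map in postgresql.conf; the values below are illustrative and would need to cover your peak number of pages with dead tuples:

```
# postgresql.conf (8.3) -- illustrative values, not a recommendation
max_fsm_pages = 2000000     # must exceed pages with reusable space
max_fsm_relations = 1000    # tables + indexes tracked in the map
```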

Well, I guess there could be another reason: is there a
performance improvement from the VACUUM FULL and REINDEX which makes
it worth the cost of such aggressive maintenance?  Even if so,
CLUSTER might help more (because of ordering the data) if you have
the disk space to support it.  Otherwise, you might want to try
pg_dump and a restore to see if you can get the equivalent of the
VACUUM FULL faster.
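A minimal sketch of the dump-and-restore approach; the database name and dump path are hypothetical, and it requires downtime plus enough disk for the dump file:

```shell
# Hypothetical sketch: rewrite the whole database via dump and restore.
pg_dump -Fc mydb > /backup/mydb.dump
dropdb mydb && createdb mydb
pg_restore -d mydb /backup/mydb.dump
```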

-Kevin

Re: VACUUM FULL memory requirements

From
David Schnur
Date:
If you actually expect it to be re-used by the database sometime
later, I would just stick with normal VACUUM (with adequate fsm
settings).

It may or may not be used again.  Although disk is cheap, it is a substantial amount of space, and I'd prefer it wasn't locked up forever.

For a bit of extra context, the database is embedded within a piece of software, which is the source of the uncertainty.  If it was something I could manage myself, I would just run the VACUUM FULL by hand at those times when I knew it was necessary.  As it stands, it's not predictable, and users also tend to be less accepting of having that space locked up.

That's why I'm interested in more information on the memory & disk requirements of the various maintenance functions.  The main reason I've stayed away from CLUSTER is because of the disk requirement.  But I had no idea it was possible for VACUUM FULL to run out of memory under normal conditions, so I may need to reconsider the options.

David

Re: VACUUM FULL memory requirements

From
"Kevin Grittner"
Date:
David Schnur <dnschnur@gmail.com> wrote:

> For a bit of extra context, the database is embedded within a
> piece of software, which is the source of the uncertainty.  If it
> was something I could manage myself, I would just run the VACUUM
> FULL by hand at those times when I knew it was necessary.  As it
> stands, it's not predictable, and users also tend to be less
> accepting of having that space locked up.
>
> That's why I'm interested in more information on the memory & disk
> requirements of the various maintenance functions.  The main
> reason I've stayed away from CLUSTER is because of the disk
> requirement.  But I had no idea it was possible for VACUUM FULL to
> run out of memory under normal conditions, so I may need to
> reconsider the options.

If you had a utility which could be run when desired, which would
incrementally move data from the end of a table to free space within
the table, would that do?  It could be run while the table was in
use for other purposes, but could cause occasional serialization
failures for updates -- so you might want to schedule it for
maintenance windows when possible.  Unlike the current VACUUM
ANALYZE, stopping it would not lose the work done up to that point.

(The above is being considered as a replacement for the current
VACUUM FULL techniques where you don't have room for a second copy
of the table.  If there's some reason that *wouldn't* work for you,
now would be a good time to weigh in.)

-Kevin

Re: VACUUM FULL memory requirements

From
Scott Marlowe
Date:
On Mon, Dec 14, 2009 at 1:55 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> David Schnur <dnschnur@gmail.com> wrote:
>
>> For a bit of extra context, the database is embedded within a
>> piece of software, which is the source of the uncertainty.  If it
>> was something I could manage myself, I would just run the VACUUM
>> FULL by hand at those times when I knew it was necessary.  As it
>> stands, it's not predictable, and users also tend to be less
>> accepting of having that space locked up.
>>
>> That's why I'm interested in more information on the memory & disk
>> requirements of the various maintenance functions.  The main
>> reason I've stayed away from CLUSTER is because of the disk
>> requirement.  But I had no idea it was possible for VACUUM FULL to
>> run out of memory under normal conditions, so I may need to
>> reconsider the options.
>
> If you had a utility which could be run when desired, which would
> incrementally move data from the end of a table to free space within
> the table, would that do?  It could be run while the table was in
> use for other purposes, but could cause occasional serialization
> failures for updates -- so you might want to schedule it for
> maintenance windows when possible.  Unlike the current VACUUM
> ANALYZE, stopping it would not lose the work done up to that point.
>
> (The above is being considered as a replacement for the current
> VACUUM FULL techniques where you don't have room for a second copy
> of the table.  If there's some reason that *wouldn't* work for you,
> now would be a good time to weigh in.)

My usage is dictated by using slony, which means I can't do the things
I'd like, like just copying the data out of the table, truncating it,
and copying it back in / renaming the newly created table as the old
one etc.  So for me, and other slony users who can't do the wild
westish stuff we'd otherwise resort to, vacuum full is very useful in
some circumstances.  So, an improved vacuum full would be much
appreciated.  The proposed replacement, would it still be able to
recover lost space like the old one?

Re: VACUUM FULL memory requirements

From
"Kevin Grittner"
Date:
Scott Marlowe <scott.marlowe@gmail.com> wrote:

> My usage is dictated by using slony, which means I can't do the
> things I'd like, like just copying the data out of the table,
> truncating it, and copying it back in / renaming the newly created
> table as the old one etc.  So for me, and other slony users who
> can't do the wild westish stuff we'd otherwise resort to, vacuum
> full is very useful in some circumstances.  So, an improved vacuum
> full would be much appreciated.  The proposed replacement, would
> it still be able to recover lost space like the old one?

As I understand it (with some confidence that others will correct
any errors), the default behavior of VACUUM FULL in the new
arrangement will be to copy the table -- similar to CLUSTER but
without using an index.  The thing still up in the air is whether to
support the old style VACUUM FULL behavior as an INPLACE option.  In
any event, there will probably be a utility to move tuples from the
end of the table to free space, which will allow a normal VACUUM to
free the space to the OS.  This utility is proposed as a client app,
which you will need to schedule on your own as needed.

-Kevin

Internal fragmentation statistics Was: VACUUM FULL memory requirements

From
Evan Rempel
Date:
Is there a command/tool that will report on how FULL a table is getting?
If there is, how intrusive is it? How computationally heavy is it?

We have a database that is approx 100 million rows with
approx 2 million insert/updates per day. Each day old data
is purged from the database. The end result is a mostly static
footprint with regards to disk space used, but I would like to
know how much room is usable inside the tables as well as the
OS file system (that part is easy).

--
Evan Rempel

Re: Internal fragmentation statistics Was: VACUUM FULL memory requirements

From
Guillaume Lelarge
Date:
On Tuesday 15 December 2009 at 00:04:47, Evan Rempel wrote:
> Is there a command/tool that will report on how FULL a table is getting?
> If there is, how intrusive is it? How computationally heavy is it?
>
> We have a database that is approx 100 million rows with
> approx 2 million insert/updates per day. Each day old data
> is purged from the database. The end result is a mostly static
> footprint with regards to disk space used, but I would like to
> know how much room is usable inside the tables as well as the
> OS file system (that part is easy).
>

The pgstattuple contrib module gives this information.  But it requires an
exclusive lock on the table it's looking at, so it's quite intrusive.  For
more details, the 8.4 documentation is interesting:
http://www.postgresql.org/docs/8.4/interactive/pgstattuple.html
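A typical call looks like this, once the contrib module is installed (the table name is hypothetical):

```sql
-- Per-table bloat statistics from the pgstattuple contrib module.
SELECT table_len, tuple_percent, dead_tuple_percent, free_percent
FROM pgstattuple('events');
```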


--
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

Re: Internal fragmentation statistics Was: VACUUM FULL memory requirements

From
Gurjeet Singh
Date:
2009/12/15 Guillaume Lelarge <guillaume@lelarge.info>
On Tuesday 15 December 2009 at 00:04:47, Evan Rempel wrote:
> Is there a command/tool that will report on how FULL a table is getting?
> If there is, how intrusive is it? How computationally heavy is it?
>
> We have a database that is approx 100 million rows with
> approx 2 million insert/updates per day. Each day old data
> is purged from the database. The end result is a mostly static
> footprint with regards to disk space used, but I would like to
> know how much room is usable inside the tables as well as the
> OS file system (that part is easy).
>

The pgstattuple contrib module gives this information.  But it requires an
exclusive lock on the table it's looking at, so it's quite intrusive.  For
more details, the 8.4 documentation is interesting:
http://www.postgresql.org/docs/8.4/interactive/pgstattuple.html


That doc specifically says that it takes only a read lock.

Best regards,
--
Lets call it Postgres

EnterpriseDB      http://www.enterprisedb.com

gurjeet.singh@EnterpriseDB.com

singh.gurjeet@{ gmail | hotmail | yahoo }.com
Twitter: singh_gurjeet
Skype: singh_gurjeet

Mail sent from my BlackLaptop device

Re: Internal fragmentation statistics Was: VACUUM FULL memory requirements

From
Guillaume Lelarge
Date:
On Wednesday 16 December 2009 at 06:18:45, Gurjeet Singh wrote:
> 2009/12/15 Guillaume Lelarge <guillaume@lelarge.info>
>
> > On Tuesday 15 December 2009 at 00:04:47, Evan Rempel wrote:
> > > Is there a command/tool that will report on how FULL a table is
> > > getting? If there is, how intrusive is it? How computationally heavy is
> > > it?
> > >
> > > We have a database that is approx 100 million rows with
> > > approx 2 million insert/updates per day. Each day old data
> > > is purged from the database. The end result is a mostly static
> > > footprint with regards to disk space used, but I would like to
> > > know how much room is usable inside the tables as well as the
> > > OS file system (that part is easy).
> >
> > The pgstattuple contrib module gives this information.  But it requires
> > an exclusive lock on the table it's looking at, so it's quite intrusive.
> > For more details, the 8.4 documentation is interesting:
> > http://www.postgresql.org/docs/8.4/interactive/pgstattuple.html
>
> That doc specifically says that it takes only a read lock.
>

Ouch, I always thought it took an exclusive lock. Sorry.


--
Guillaume.
 http://www.postgresqlfr.org
 http://dalibo.com

Re: VACUUM FULL memory requirements

From
David Schnur
Date:
I've been doing more testing on several different machines, but still haven't found a solution to my problem where VACUUM FULL is running out of memory.  Besides the original case on RHEL4, I've been able to reproduce it on both Windows and OS X, with 3GB and 5GB of RAM, respectively.  Interestingly, it always fails on a request size of 330, even across differently-sized databases running on machines with different amounts of memory.

As far as that new tool you're describing, for my purposes it doesn't make much difference whether it's a stand-alone application or a command, though a command is a little more convenient.  The ability to free space concurrent with other database operations would be nice, but I feel like the introduction of serialization failures would negate that advantage, since I at least would end up running it non-concurrently to avoid dealing with them.

I actually find the current design of VACUUM FULL reasonable enough, if I could figure out what's causing this memory issue and prevent it from happening.

David