Обсуждение: Disk space consumed by pk not returned after vacuum or reindex

Поиск
Список
Период
Сортировка

Disk space consumed by pk not returned after vacuum or reindex

От
Kevin Johnson
Дата:
Hello,

I am running into a problem on a RHEL3 systems, running PostgreSQL 7.4

We have a database, which consistently consumes more and more of the disk space in it's lvol until it reaches 100%.  So far, we have tried to run a full vacuum on the database, with limited success.  Eventually, we had to drop and reload the database with the same data inside.  It brought the disk usage down to 73%.  It then began to creep once more toward 100%.  After some research, I was able to use the pg_class catalog to find that the items which are expanding the quickest are primary key (btree) indexes.  I attempted to run a REINDEX on one of the tables with the pk taking up the largest amount of space.  The usage according to pg_class dropped dramatically, however the disk space was not returned to the system.  So I attempted another full vacuum afterwards, and still nothing was returned to the system.  These tables are updated extremely frequently (although their total number of rows is close to constant), which is my guess as to why the pk indexes increase so rapidly in terms of their disk usage.  Unfortunately, PostgreSQL knowledge is limited, and I was wondering if anyone had experienced something similar / knows what else we can do to return this disk space back to the system?

Thank you in advance for any/all help!

Kevin

Вложения

Re: Disk space consumed by pk not returned after vacuum or reindex

От
Bruno Wolff III
Дата:
On Wed, Sep 13, 2006 at 09:53:16 -0400,
  Kevin Johnson <Kevin.Johnson@noaa.gov> wrote:
>
> We have a database, which consistently consumes more and more of the
> disk space in it's lvol until it reaches 100%.  So far, we have tried to
> run a full vacuum on the database, with limited success.  Eventually, we
> had to drop and reload the database with the same data inside.  It
> brought the disk usage down to 73%.  It then began to creep once more
> toward 100%.  After some research, I was able to use the pg_class
> catalog to find that the items which are expanding the quickest are
> primary key (btree) indexes.  I attempted to run a REINDEX on one of the
> tables with the pk taking up the largest amount of space.  The usage
> according to pg_class dropped dramatically, however the disk space was
> not returned to the system.  So I attempted another full vacuum
> afterwards, and still nothing was returned to the system.  These tables
> are updated extremely frequently (although their total number of rows is
> close to constant), which is my guess as to why the pk indexes increase
> so rapidly in terms of their disk usage.  Unfortunately, PostgreSQL
> knowledge is limited, and I was wondering if anyone had experienced
> something similar / knows what else we can do to return this disk space
> back to the system?

This is possibly index bloat due to new keys always being larger than existing
keys. This was fixed in later releases. There is still some potential for
bloat due to fragmentation, but I believe that has a constant bound.
You might try using the cluster command. I think that will both clean up
the indexes and remove dead rows and do it faster than using a vacuum full
and reindexing. The downside is that the table will be unavailable during
the cluster which might be a deal breaker for you.

Re: Disk space consumed by pk not returned after vacuum or reindex

От
Kevin Johnson
Дата:
Thank you for the suggestion, Bruno.  The clustering did the trick in reducing the current disk usage, however eventually the disk space get consumed once more.  I fear that we may just need to update the version of Postgres to help alleviate index bloat!


Bruno Wolff III wrote:
On Wed, Sep 13, 2006 at 09:53:16 -0400, Kevin Johnson <Kevin.Johnson@noaa.gov> wrote: 
We have a database, which consistently consumes more and more of the 
disk space in it's lvol until it reaches 100%.  So far, we have tried to 
run a full vacuum on the database, with limited success.  Eventually, we 
had to drop and reload the database with the same data inside.  It 
brought the disk usage down to 73%.  It then began to creep once more 
toward 100%.  After some research, I was able to use the pg_class 
catalog to find that the items which are expanding the quickest are 
primary key (btree) indexes.  I attempted to run a REINDEX on one of the 
tables with the pk taking up the largest amount of space.  The usage 
according to pg_class dropped dramatically, however the disk space was 
not returned to the system.  So I attempted another full vacuum 
afterwards, and still nothing was returned to the system.  These tables 
are updated extremely frequently (although their total number of rows is 
close to constant), which is my guess as to why the pk indexes increase 
so rapidly in terms of their disk usage.  Unfortunately, PostgreSQL 
knowledge is limited, and I was wondering if anyone had experienced 
something similar / knows what else we can do to return this disk space 
back to the system?   
This is possibly index bloat due to new keys always being larger than existing
keys. This was fixed in later releases. There is still some potential for
bloat due to fragmentation, but I believe that has a constant bound.
You might try using the cluster command. I think that will both clean up
the indexes and remove dead rows and do it faster than using a vacuum full
and reindexing. The downside is that the table will be unavailable during
the cluster which might be a deal breaker for you. 

-- 
Kevin Johnson
Raytheon
AWIPS Sr. Systems Engineer
NWS Network Control Facility
p: 301.713.9362x325
f: 301.713.1905

Re: Disk space consumed by pk not returned after vacuum or reindex

От
Jim Nasby
Дата:
I didn't see you mention what version you're running; index bloat
shouldn't be a big issue in 7.4 and above. You also didn't mention
how often you're vacuuming the table. If you don't vacuum the table
frequently enough, you're going to get bloat, plain and simple.

On Oct 5, 2006, at 11:24 AM, Kevin Johnson wrote:
> Thank you for the suggestion, Bruno.  The clustering did the trick
> in reducing the current disk usage, however eventually the disk
> space get consumed once more.  I fear that we may just need to
> update the version of Postgres to help alleviate index bloat!
>
> Bruno Wolff III wrote:
>> On Wed, Sep 13, 2006 at 09:53:16 -0400, Kevin Johnson
>> <Kevin.Johnson@noaa.gov> wrote:
>>> We have a database, which consistently consumes more and more of
>>> the disk space in it's lvol until it reaches 100%. So far, we
>>> have tried to run a full vacuum on the database, with limited
>>> success. Eventually, we had to drop and reload the database with
>>> the same data inside. It brought the disk usage down to 73%. It
>>> then began to creep once more toward 100%. After some research, I
>>> was able to use the pg_class catalog to find that the items which
>>> are expanding the quickest are  primary key (btree) indexes. I
>>> attempted to run a REINDEX on one of the tables with the pk
>>> taking up the largest amount of space. The usage according to
>>> pg_class dropped dramatically, however the disk space was not
>>> returned to the system. So I attempted another full vacuum
>>> afterwards, and still nothing was returned to the system. These
>>> tables are updated extremely frequently (although their total
>>> number of rows is close to constant), which is my guess as to why
>>> the pk indexes increase so rapidly in terms of their disk usage.
>>> Unfortunately, PostgreSQL knowledge is limited, and I was
>>> wondering if anyone had experienced something similar / knows
>>> what else we can do to return this disk space back to the system?
>> This is possibly index bloat due to new keys always being larger
>> than existing keys. This was fixed in later releases. There is
>> still some potential for bloat due to fragmentation, but I believe
>> that has a constant bound. You might try using the cluster
>> command. I think that will both clean up the indexes and remove
>> dead rows and do it faster than using a vacuum full and
>> reindexing. The downside is that the table will be unavailable
>> during the cluster which might be a deal breaker for you.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: Disk space consumed by pk not returned after vacuum or

От
Kevin Johnson
Дата:
You are correct, it is 7.4 we are running on a RHEL3 system.
The database itself is vacuumed via cron 6 times a day. 

Jim Nasby wrote:
I didn't see you mention what version you're running; index bloat shouldn't be a big issue in 7.4 and above. You also didn't mention how often you're vacuuming the table. If you don't vacuum the table frequently enough, you're going to get bloat, plain and simple.

On Oct 5, 2006, at 11:24 AM, Kevin Johnson wrote:
Thank you for the suggestion, Bruno.  The clustering did the trick in reducing the current disk usage, however eventually the disk space get consumed once more.  I fear that we may just need to update the version of Postgres to help alleviate index bloat!

Bruno Wolff III wrote:
On Wed, Sep 13, 2006 at 09:53:16 -0400, Kevin Johnson <Kevin.Johnson@noaa.gov> wrote:
We have a database, which consistently consumes more and more of the disk space in it's lvol until it reaches 100%. So far, we have tried to run a full vacuum on the database, with limited success. Eventually, we had to drop and reload the database with the same data inside. It brought the disk usage down to 73%. It then began to creep once more toward 100%. After some research, I was able to use the pg_class catalog to find that the items which are expanding the quickest are  primary key (btree) indexes. I attempted to run a REINDEX on one of the tables with the pk taking up the largest amount of space. The usage according to pg_class dropped dramatically, however the disk space was not returned to the system. So I attempted another full vacuum afterwards, and still nothing was returned to the system. These tables are updated extremely frequently (although their total number of rows is close to constant), which is my guess as to why the pk indexes increase so rapidly in terms of their disk usage. Unfortunately, PostgreSQL knowledge is limited, and I was wondering if anyone had experienced something similar / knows what else we can do to return this disk space back to the system?
This is possibly index bloat due to new keys always being larger than existing keys. This was fixed in later releases. There is still some potential for bloat due to fragmentation, but I believe that has a constant bound. You might try using the cluster command. I think that will both clean up the indexes and remove dead rows and do it faster than using a vacuum full and reindexing. The downside is that the table will be unavailable during the cluster which might be a deal breaker for you.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



-- 
Kevin Johnson
Raytheon
AWIPS Sr. Systems Engineer
NWS Network Control Facility
p: 301.713.9362x325
f: 301.713.1905
Вложения

Re: Disk space consumed by pk not returned after vacuum or reindex

От
Jim Nasby
Дата:
On Oct 6, 2006, at 5:39 AM, Kevin Johnson wrote:
> You are correct, it is 7.4 we are running on a RHEL3 system.
> The database itself is vacuumed via cron 6 times a day.
>
> Jim Nasby wrote:
>> I didn't see you mention what version you're running; index bloat
>> shouldn't be a big issue in 7.4 and above. You also didn't mention
>> how often you're vacuuming the table. If you don't vacuum the
>> table frequently enough, you're going to get bloat, plain and simple.

For a heavy update load, even 6 times a day may not be enough.

Also, remember that the size you get after a REINDEX will be
unreasonably low, because there will be very little free space in the
index. That's fine if you won't be updating the table much, but
otherwise the index will have no choice but to grow.
--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)



Re: Disk space consumed by pk not returned after vacuum or

От
Bruno Wolff III
Дата:
On Fri, Oct 06, 2006 at 06:39:40 -0400,
  Kevin Johnson <Kevin.Johnson@noaa.gov> wrote:
> You are correct, it is 7.4 we are running on a RHEL3 system.
> The database itself is vacuumed via cron 6 times a day.

The other thing you need to watch out for is that your FSM setting is
large enough to recover all of the dead space. If your vacuum runs aren't
recovering as much dead space as is being created on average, then you are
going to have dead space accumalate.