Thread: Index space growing even after cleanup via autovacuum in Postgres 9.2


Index space growing even after cleanup via autovacuum in Postgres 9.2

From
Tirthankar Barari
Date:
Hi,

We have a table where we insert about 10 million rows every day. We keep
14 days' worth of entries (i.e. 140 million). A scheduled task wakes up
every day and deletes all entries past the 14-day window (i.e. deletes
entries from the 15th day in the past).

We have autovacuum set to trigger when 1% of rows are deleted and that
is running fine consistently.

However, we are noticing that after autovacuum, our disk space
consumption is still increasing, and the increase is in the index size
(measured by querying pg_total_relation_size('mytable') and
pg_indexes_size('mytable')).

In Postgres 9.2.2, doesn't autovacuum clean up dead index entries and reuse
that space too? So why don't we see that happening?

Thanks

- tirthankar



Re: Index space growing even after cleanup via autovacuum in Postgres 9.2

From
Tom Lane
Date:
Tirthankar Barari <tbarari@verizon.com> writes:
> However, we are noticing that after autovacuum, our disk space
> consumption is still increasing, and the increase is in the index size
> (measured by querying pg_total_relation_size('mytable') and
> pg_indexes_size('mytable')).

> In Postgres 9.2.2, doesn't autovacuum clean up dead index entries and reuse
> that space too? So why don't we see that happening?

You've not provided enough info to speak in more than generalities, but:
whether index space is reclaimable depends a whole lot on your usage
patterns.  There's provision to recycle pages that've become completely
empty, but as long as there's at least one key left on a leaf page,
it won't be recycled.  So for instance if you have a time-ordered index
and you delete all but one out of every hundred entries in time sequence,
you're not going to get that space back short of a REINDEX.
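
For illustration only (using the placeholder name from the original post; adjust to your schema), you can watch the heap-vs-index split and rebuild when it gets out of hand:

    -- compare heap vs. index disk usage over time
    SELECT pg_size_pretty(pg_relation_size('mytable')) AS heap_size,
           pg_size_pretty(pg_indexes_size('mytable'))  AS index_size;

    -- rebuild every index on the table; note this takes an
    -- exclusive lock on the table for the duration
    REINDEX TABLE mytable;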

            regards, tom lane


Re: Index space growing even after cleanup via autovacuum in Postgres 9.2

From
Francisco Olarte
Date:
Hi:

On Thu, Jan 9, 2014 at 7:50 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
> We have a table where we insert about 10 million rows every day. We keep 14
> days' worth of entries (i.e. 140 million). A scheduled task wakes up every day
> and deletes all entries past the 14-day window (i.e. deletes entries from
> the 15th day in the past).

Not related to your vacuum problem, but if your pattern is something
like deleting everything inserted 15 days ago you may want to think of
using partitioning or simple inheritance. Make the scheduled task
create  a new partition/child table, redirect insertions to it, drop
the oldest partition.
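
A minimal sketch of that idea with 9.2-style inheritance (table and column names here are invented; insert redirection can live in the application or in a trigger on the parent):

    -- the parent "mytable" stays empty; one child table per day
    CREATE TABLE mytable_20140110 (
        CHECK (ts >= DATE '2014-01-10' AND ts < DATE '2014-01-11')
    ) INHERITS (mytable);
    CREATE INDEX ON mytable_20140110 (ts);  -- indexes are not inherited

    -- expiring a day is then a catalog operation, not millions of row deletes
    DROP TABLE mytable_20131227;

With constraint_exclusion set to "partition" (the default), queries that filter on the timestamp column only touch the relevant children.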

Also, if the pattern is insert-only all the time and a whole day is
deleted at once, it may be easier to exclude the table from autovacuum
and have the scheduled task vacuum the table after deleting.
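
For example (just a sketch; mytable is again a placeholder):

    -- stop autovacuum from handling this table
    -- (anti-wraparound vacuums will still run when required)
    ALTER TABLE mytable SET (autovacuum_enabled = false);

    -- then, in the scheduled task, right after the daily DELETE:
    VACUUM ANALYZE mytable;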

Francisco Olarte.


Re: Index space growing even after cleanup via autovacuum in Postgres 9.2

From
Bill Moran
Date:
On Fri, 10 Jan 2014 13:06:21 +0100 Francisco Olarte <folarte@peoplecall.com> wrote:
> Hi:
>
> On Thu, Jan 9, 2014 at 7:50 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
> > We have a table where we insert about 10 million rows every day. We keep 14
> > days' worth of entries (i.e. 140 million). A scheduled task wakes up every day
> > and deletes all entries past the 14-day window (i.e. deletes entries from
> > the 15th day in the past).

I missed the early part of this thread, so I apologize if I'm repeating anything
that was previously stated.

In my experience, rolling datasets like this are far better maintained
with more frequent purging.  Most database systems seem to struggle
when loading up large amounts of data and then removing large amounts of
data.  What has helped immensely for me is to increase the frequency of
the purging process, i.e. instead of purging once a day, purge once an
hour, or possibly even more frequently.  Even though the purge happens
more often, it's less overhead each time since it affects fewer rows.
I have one system that collects about 60 GPS coordinates per second and
keeps 2 weeks' worth of data -- trial and error has found that the most
efficient ongoing maintenance is to purge about every 10 seconds.  (In
the case of this particular system, I actually wrote a program that
estimates the current load on the database and purges more or less
frequently based on what else is going on, but over the course of a day
it averages out to about once every 10 seconds.)
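
As a rough sketch of that approach (table and column names are invented; the interval matches the 14-day window discussed above):

    -- run hourly (or more often) from the scheduler instead of one
    -- huge daily delete; each pass only removes the rows that have
    -- just fallen out of the window
    DELETE FROM mytable
     WHERE ts < now() - interval '14 days';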

YMMV, and the suggestion about partitioning is something that might work in
your case as well.

--
Bill Moran <wmoran@potentialtech.com>


Re: Index space growing even after cleanup via autovacuum in Postgres 9.2

From
Francisco Olarte
Date:
Hi:

On Mon, Jan 13, 2014 at 5:26 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
> On 01/10/2014 07:06 AM, Francisco Olarte wrote:
>> Not related to your vacuum problem, but if your pattern is something
>> like deleting everything inserted 15 days ago you may want to think of
>> using partitioning or simple inheritance. Make the scheduled task
>> create  a new partition/child table, redirect insertions to it, drop
>> the oldest partition.
...
> Thanks for your input. The retention window was supposed to be variable and
> dynamically changeable. So, partitioning is our last resort. Will try the
> vacuum after delete instead of autovacuum.

The ability to use partitioning/inheritance does not depend on a static
window, but on whether your deletion pattern is as described. Suppose
you do it daily. You can name your partitions/child tables
child_YYYYMMDD. Then, to delete data that is more than N days old, you
build the cutoff date, select from the system catalog the relations
whose name matches child_\d\d\d\d\d\d\d\d, whose name is less than
child_12345678 (substitute the cutoff date), and whose parent is the
appropriate table, and drop all of them. If the retention window just
grew (say from 15 to 20), for the first 5 days you'll find no child
table to drop (same as when starting, which looks like the window
growing from 0 to N). If it shrank from 15 to 10, on the first day
you'll drop 5 tables at once. Depending on how you change the retention
window, you can also delete just the appropriate partition, ignoring
the error in case it does not exist (to accommodate the window-growing
case you can even use DROP TABLE IF EXISTS), and when the window
shrinks you can zap the extra tables manually or in the procedure that
shrinks the window. The advantage of that is avoiding the system
catalog query, but I would personally use the first alternative. The
logic is much the same as with deletes, just using partition drops.
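
A sketch of that catalog lookup and drop (the parent table name and the child_ prefix are assumptions), runnable as a single DO block from the retention job:

    DO $$
    DECLARE
        child text;
    BEGIN
        FOR child IN
            SELECT c.relname
              FROM pg_inherits i
              JOIN pg_class c ON c.oid = i.inhrelid
              JOIN pg_class p ON p.oid = i.inhparent
             WHERE p.relname = 'mytable'
               AND c.relname ~ '^child_\d{8}$'
               AND c.relname < ('child_' ||
                                to_char(now() - interval '14 days', 'YYYYMMDD'))
        LOOP
            EXECUTE 'DROP TABLE ' || quote_ident(child);
        END LOOP;
    END
    $$;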

Regards.

Francisco Olarte.


Re: Index space growing even after cleanup via autovacuum in Postgres 9.2

From
Tirthankar Barari
Date:
On 01/10/2014 07:06 AM, Francisco Olarte wrote:
> Hi:
>
> On Thu, Jan 9, 2014 at 7:50 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
>> We have a table where we insert about 10 million rows every day. We keep 14
>> days' worth of entries (i.e. 140 million). A scheduled task wakes up every day
>> and deletes all entries past the 14-day window (i.e. deletes entries from
>> the 15th day in the past).
> Not related to your vacuum problem, but if your pattern is something
> like deleting everything inserted 15 days ago you may want to think of
> using partitioning or simple inheritance. Make the scheduled task
> create  a new partition/child table, redirect insertions to it, drop
> the oldest partition.
>
> Also, if the pattern is insert-only all the time and a whole day is
> deleted at once, it may be easier to exclude the table from autovacuum
> and have the scheduled task vacuum the table after deleting.
>
> Francisco Olarte.

Thanks for your input. The retention window was supposed to be variable and dynamically changeable, so partitioning
is our last resort. Will try vacuum after delete instead of autovacuum.



Re: Index space growing even after cleanup via autovacuum in Postgres 9.2

From
Tirthankar Barari
Date:
On 01/13/2014 01:38 PM, Francisco Olarte wrote:
> Hi:
>
> On Mon, Jan 13, 2014 at 5:26 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
>> On 01/10/2014 07:06 AM, Francisco Olarte wrote:
>>> Not related to your vacuum problem, but if your pattern is something
>>> like deleting everything inserted 15 days ago you may want to think of
>>> using partitioning or simple inheritance. Make the scheduled task
>>> create  a new partition/child table, redirect insertions to it, drop
>>> the oldest partition.
> ...
>> Thanks for your input. The retention window was supposed to be variable and
>> dynamically changeable. So, partitioning is our last resort. Will try the
>> vacuum after delete instead of autovacuum.
> The ability to use partitioning/inheritance does not depend on a static
> window, but on whether your deletion pattern is as described. Suppose
> you do it daily. You can name your partitions/child tables
> child_YYYYMMDD. Then, to delete data that is more than N days old, you
> build the cutoff date, select from the system catalog the relations
> whose name matches child_\d\d\d\d\d\d\d\d, whose name is less than
> child_12345678 (substitute the cutoff date), and whose parent is the
> appropriate table, and drop all of them. If the retention window just
> grew (say from 15 to 20), for the first 5 days you'll find no child
> table to drop (same as when starting, which looks like the window
> growing from 0 to N). If it shrank from 15 to 10, on the first day
> you'll drop 5 tables at once. Depending on how you change the
> retention window, you can also delete just the appropriate partition,
> ignoring the error in case it does not exist (to accommodate the
> window-growing case you can even use DROP TABLE IF EXISTS), and when
> the window shrinks you can zap the extra tables manually or in the
> procedure that shrinks the window. The advantage of that is avoiding
> the system catalog query, but I would personally use the first
> alternative. The logic is much the same as with deletes, just using
> partition drops.
>
> Regards.
>
> Francisco Olarte.
>
>
My tables are:

table test_metric (
        id varchar(255) not null, -- just an auto-generated uuid from the app
         timestamp timestamp not null,
         version int4,
         entity_id varchar(255) not null,
         primary key (id, timestamp)
     );
Indexes:
     "test_metric_pkey1" PRIMARY KEY, btree (id)
     "test_metric_entity_id_timestamp_key" UNIQUE CONSTRAINT, btree (entity_id, "timestamp")

AND

table test_metric_metrics (
         metric_id varchar(255) not null,
         metrics float8,
         metrics_key varchar(255) not null,
         primary key (metric_id, metrics_key)
     );
Indexes:
     "test_metric_metrics_pkey" PRIMARY KEY, btree (metric_id, metrics_key)
Foreign-key constraints:
     "fk3b8e13abb63406d5" FOREIGN KEY (metric_id) REFERENCES test_metric(id)

Basically, test_metric holds the timestamp and some metadata, and test_metric_metrics holds the set of key/value pairs
for the given entity and timestamp in the parent table.

Is it possible to partition the second table by the timestamp field from the first table?

We are using Postgres 9.2.2.

Thanks,

- Tirthankar



Re: Index space growing even after cleanup via autovacuum in Postgres 9.2

From
Francisco Olarte
Date:
On Wed, Jan 15, 2014 at 6:07 PM, Tirthankar Barari <tbarari@verizon.com> wrote:
> My tables are:
>
> table test_metric (
>         id varchar(255) not null, -- just an auto-generated uuid from the app
>         timestamp timestamp not null,
>         version int4,
>         entity_id varchar(255) not null,
>         primary key (id, timestamp)
>     );
> Indexes:
>     "test_metric_pkey1" PRIMARY KEY, btree (id)
>     "test_metric_entity_id_timestamp_key" UNIQUE CONSTRAINT, btree
> (entity_id, "timestamp")
>
> AND
>
> table test_metric_metrics (
>         metric_id varchar(255) not null,
>         metrics float8,
>         metrics_key varchar(255) not null,
>         primary key (metric_id, metrics_key)
>     );
> Indexes:
>     "test_metric_metrics_pkey" PRIMARY KEY, btree (metric_id, metrics_key)
> Foreign-key constraints:
>     "fk3b8e13abb63406d5" FOREIGN KEY (metric_id) REFERENCES test_metric(id)
>
> Basically, test_metric holds the timestamp and some metadata and
> test_metric_metrics holds the set of key/value pairs for the given entity and
> timestamp in the parent table.
>
> Is it possible to partition the second table by timestamp field from first
> table?


Mmmm... I think not. You could copy the timestamp field into the
second table, but then things begin to get hairy, as you'll need to
maintain redirection rules for insertions on both tables, and I do not
remember off-hand what the rules are when mixing foreign keys and
inheritance; I would need to test them anyway to convince myself it
works.

To design a solution for this, knowledge of the application and the
access patterns is needed. I just wanted to point out that for
maintaining rolling logs of data, inheritance plus some logic can
greatly improve your performance, as you do not need deletes, just
table drops, which are normally much faster. I do it to maintain call
records, but I have logic for my particular access pattern, which
creates partitions on demand and just uses COPY against the relevant
partition, exploiting timestamp locality and invariance (CDRs are
normally inserted for recent calls, and timestamps do not normally
change), and it has made maintenance much easier (I just coalesce old
partitions into non-inheriting archive tables and drop them). You may
be able to design something like this, but if you begin to have complex
relationships, the way to go until you have explored them is probably
to just crank up the delete/vacuum frequency (many small delete/vacuum
cycles normally add up to more total time but less impact on database
performance, as has been pointed out previously). And if the tables
follow the classic log-table pattern (insert at one end, delete at the
other, extremely rare updates), the table will probably perform better
if it is excluded from autovacuum and vacuumed explicitly after
deletion batches, or just periodically (inserting alone does not leave
many opportunities to recover space, but you may want to ANALYZE if
autovacuum's analyze would otherwise run too infrequently).
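
For what it's worth, the on-demand creation I mean looks roughly like this (every name and path below is invented, and it deliberately leaves the foreign-key question aside):

    -- the loader creates the day's partition if it is not there yet...
    CREATE TABLE IF NOT EXISTS cdr_20140115 (
        CHECK (call_ts >= DATE '2014-01-15' AND call_ts < DATE '2014-01-16')
    ) INHERITS (cdr);

    -- ...and bulk-loads straight into that child, bypassing the parent
    COPY cdr_20140115 FROM '/path/to/cdr_batch.csv' WITH (FORMAT csv);

Note the catch hinted at above: a foreign key that references the parent does not see rows stored in the children, which is exactly why the test_metric / test_metric_metrics pair needs careful testing before going this way.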

Francisco Olarte.