Discussion: database size growing continuously

database size growing continuously

From
Peter Meszaros
Date:
Hi All,

I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
with six columns and two indices, and about 6 million records are
written into it every day, continuously committed every 10 seconds from
8 clients. The table stores approximately 120 million records, because a
cron job daily deletes those that are older than 20 days. Autovacuum is
on and every setting is at its factory default except some unrelated ones
(listen address, authorization). But my database is growing,
characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
or even 0!).

I've also tried a test on another server running the same PostgreSQL,
where 300 million records were loaded into a freshly created database
and 25 million were deleted with a single DELETE command.  The 'vacuum
verbose phaseangle;' command seems to run forever; it has been going for hours:

phasor=# vacuum VERBOSE phaseangle;
INFO:  vacuuming "public.phaseangle"
INFO:  scanned index "i" to remove 2796006 row versions
DETAIL:  CPU 9.49s/120.30u sec elapsed 224.20 sec.
INFO:  scanned index "t" to remove 2796006 row versions
DETAIL:  CPU 13.57s/105.70u sec elapsed 192.71 sec.
INFO:  "phaseangle": removed 2796006 row versions in 24748 pages
DETAIL:  CPU 0.65s/0.30u sec elapsed 39.97 sec.
INFO:  scanned index "i" to remove 2795924 row versions
DETAIL:  CPU 9.58s/121.63u sec elapsed 239.06 sec.
INFO:  scanned index "t" to remove 2795924 row versions
DETAIL:  CPU 13.10s/103.59u sec elapsed 190.84 sec.
INFO:  "phaseangle": removed 2795924 row versions in 24743 pages
DETAIL:  CPU 0.68s/0.28u sec elapsed 40.21 sec.
INFO:  scanned index "i" to remove 2796014 row versions
DETAIL:  CPU 9.65s/117.28u sec elapsed 231.92 sec.
INFO:  scanned index "t" to remove 2796014 row versions
DETAIL:  CPU 13.48s/103.59u sec elapsed 194.49 sec.
INFO:  "phaseangle": removed 2796014 row versions in 24774 pages
DETAIL:  CPU 0.69s/0.28u sec elapsed 40.26 sec.
INFO:  scanned index "i" to remove 2795935 row versions
DETAIL:  CPU 9.55s/119.02u sec elapsed 226.85 sec.
INFO:  scanned index "t" to remove 2795935 row versions
DETAIL:  CPU 13.09s/102.84u sec elapsed 194.74 sec.
INFO:  "phaseangle": removed 2795935 row versions in 25097 pages
DETAIL:  CPU 0.67s/0.28u sec elapsed 41.21 sec.

still running...

Are these the very same problem?
Should I delete more frequently, in smaller chunks? There seems to be a
limit...

Thanks

Peter

--

Re: database size growing continuously

From
Ludwik Dylag
Date:
I would recommend increasing max_fsm_pages and shared_buffers.
These changes sped up vacuum full on my database.
With shared_buffers, remember to also raise the kernel's maximum shared memory (shmmax).
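
To see what you are starting from, both values can be checked from psql (a minimal sketch; in 8.3 both settings live in postgresql.conf, and changing them requires a server restart):

  -- Current values; the 8.3 defaults are modest for a 120M-row table:
  SHOW max_fsm_pages;
  SHOW shared_buffers;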

Ludwik

2009/10/29 Peter Meszaros <pme@prolan.hu>
Hi All,

I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
with six columns and two indices, and about 6 million records are
written into it every day, continuously committed every 10 seconds from
8 clients. The table stores approximately 120 million records, because a
cron job daily deletes those that are older than 20 days. Autovacuum is
on and every setting is at its factory default except some unrelated ones
(listen address, authorization). But my database is growing,
characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
or even 0!).

I've also tried a test on another server running the same PostgreSQL,
where 300 million records were loaded into a freshly created database
and 25 million were deleted with a single DELETE command.  The 'vacuum
verbose phaseangle;' command seems to run forever; it has been going for hours:

[... vacuum VERBOSE output snipped; identical to the original message above ...]

still running...

Are these the very same problem?
Should I delete more frequently, in smaller chunks? There seems to be a
limit...

Thanks

Peter

--




--
Ludwik Dyląg

Re: database size growing continuously

From
Josh Rovero
Date:
On Thu, 2009-10-29 at 15:44 +0100, Peter Meszaros wrote:
> Hi All,
>
> I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
> with six columns and two indices, and about 6 million records are
> written into it every day, continuously committed every 10 seconds from
> 8 clients. The table stores approximately 120 million records, because a
> cron job daily deletes those that are older than 20 days. Autovacuum is
> on and every setting is at its factory default except some unrelated ones
> (listen address, authorization). But my database is growing,
> characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
> or even 0!).

Do you ever "vacuum full" to reclaim empty record space?

--
P.J. "Josh" Rovero            Vice President     Sonalysts, Inc.
Email:  rovero@sonalysts.com  www.sonalysts.com  215 Parkway North
Work:   (860)326-3671                            Waterford, CT 06385



Re: database size growing continuously

From
Matthew Wakeling
Date:
On Thu, 29 Oct 2009, Josh Rovero wrote:
> Do you ever "vacuum full" to reclaim empty record space?

Unless you expect the size of the database to permanently decrease by a
significant amount, that is a waste of time, and may cause bloat in
indexes. In this case, since the space will be used again fairly soon, it
is better to just VACUUM, or autovacuum. Just make sure the free space map
can cope with it.

Matthew

--
 import oz.wizards.Magic;
   if (Magic.guessRight())...           -- Computer Science Lecturer

Re: database size growing continuously

From
Ludwik Dylag
Date:
2009/10/29 Peter Meszaros <pme@prolan.hu>
Hi All,

I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
with six columns and two indices, and about 6 million records are
written into it every day, continuously committed every 10 seconds from
8 clients. The table stores approximately 120 million records, because a
cron job daily deletes those that are older than 20 days. Autovacuum is
on and every setting is at its factory default except some unrelated ones
(listen address, authorization). But my database is growing,
characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
or even 0!).

I've also tried a test on another server running the same PostgreSQL,
where 300 million records were loaded into a freshly created database
and 25 million were deleted with a single DELETE command.  The 'vacuum
verbose phaseangle;' command seems to run forever; it has been going for hours:

Try increasing max_fsm_pages and shared_buffers.
These changes sped up vacuum full on my database.
With shared_buffers, remember to also raise the kernel's maximum shared memory (shmmax).

Ludwik
 

[... vacuum VERBOSE output snipped; identical to the original message above ...]

still running...

Are these the very same problem?
Should I delete more frequently, in smaller chunks? There seems to be a
limit...

Thanks

Peter

--




--
Ludwik Dyląg

Re: database size growing continuously

From
Chris Ernst
Date:
Hi Peter,

Sounds like you're experiencing index bloat, and plain vacuums do nothing to
help that.  You can do one of two things to remedy this:

1) The fastest and simplest (but most disruptive) way is to use REINDEX.
 But this will exclusively lock the table while rebuilding the indexes:

  REINDEX TABLE phaseangle;

2) The slower but less disruptive way is to do a concurrent build of
each index and then drop the old ones. For example, to rebuild the "i"
index:

  CREATE INDEX CONCURRENTLY i_new ON phaseangle (<indexed columns>);
  DROP INDEX i;
  ALTER INDEX i_new RENAME TO i;
  ANALYZE phaseangle (<indexed columns>);

Do this regularly to keep the index sizes in check.
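
To confirm that it really is the indexes that are growing, their on-disk size can be tracked over time (a minimal sketch; pg_relation_size() and pg_size_pretty() are built in on 8.3):

  -- Report the on-disk size of the table and each index:
  SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
  FROM pg_class
  WHERE relname IN ('phaseangle', 'i', 't');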

    - Chris

Peter Meszaros wrote:
> Hi All,
>
> I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
> with six columns and two indices, and about 6 million records are
> written into it every day, continuously committed every 10 seconds from
> 8 clients. The table stores approximately 120 million records, because a
> cron job daily deletes those that are older than 20 days. Autovacuum is
> on and every setting is at its factory default except some unrelated ones
> (listen address, authorization). But my database is growing,
> characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
> or even 0!).
>
> I've also tried a test on another server running the same PostgreSQL,
> where 300 million records were loaded into a freshly created database
> and 25 million were deleted with a single DELETE command.  The 'vacuum
> verbose phaseangle;' command seems to run forever; it has been going for hours:
>
> [... vacuum VERBOSE output snipped; identical to the original message above ...]
>
> still running...
>
> Are these the very same problem?
> Should I delete more frequently, in smaller chunks? There seems to be a
> limit...
>
> Thanks
>
> Peter
>


Re: database size growing continuously

From
"Joshua D. Drake"
Date:
On Thu, 2009-10-29 at 17:00 +0100, Ludwik Dylag wrote:
> 2009/10/29 Peter Meszaros <pme@prolan.hu>
>         Hi All,
>
>         I use PostgreSQL 8.3.7 as a huge queue. There is a very simple
>         table with six columns and two indices, and about 6 million
>         records are written into it every day, continuously committed
>         every 10 seconds from 8 clients. The table stores approximately
>         120 million records, because a cron job daily deletes those
>         that are older than 20 days. Autovacuum is on and every setting
>         is at its factory default except some unrelated ones (listen
>         address, authorization). But my database is growing,
>         characteristically by ~600 MB/day, though sometimes much slower
>         (e.g. 10 MB, or even 0!).
>
>         I've also tried a test on another server running the same
>         PostgreSQL, where 300 million records were loaded into a
>         freshly created database and 25 million were deleted with a
>         single DELETE command.  The 'vacuum verbose phaseangle;'
>         command seems to run forever; it has been going for hours:
>
>
> Try increasing max_fsm_pages and shared_buffers
> These changes sped up vacuum full on my database.
> With shared_buffers, remember to also raise the kernel's maximum shared memory (shmmax).

If you have overrun max_fsm_pages, you are going to have indexes that
are not properly cleaned up, even after a vacuum full. You will need to
CLUSTER or REINDEX.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander

Re: database size growing continuously

From
Steve Crawford
Date:
Peter Meszaros wrote:
> Hi All,
>
> I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
> with six columns and two indices, and about 6 million records are
> written into it every day, continuously committed every 10 seconds from
> 8 clients. The table stores approximately 120 million records, because a
> cron job daily deletes those that are older than 20 days. Autovacuum is
> on and every setting is at its factory default except some unrelated ones
> (listen address, authorization). But my database is growing,
> characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
> or even 0!)...

Can you try running against 8.4.1? I believe there are a number of
improvements that should help in your case. For one thing, the
max_fsm_pages and max_fsm_relations "knobs" are gone - it happens
automagically. I believe there are some substantial improvements in
space reuse, along with numerous improvements not directly related to
your question.

Cheers,
Steve


Re: database size growing continuously

From
Steve Crawford
Date:
Peter Meszaros wrote:
> Hi All,
>
> I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
> with six columns and two indices, and about 6 million records are
> written into it every day, continuously committed every 10 seconds from
> 8 clients. The table stores approximately 120 million records, because a
> cron job daily deletes those that are older than 20 days.
You may be an ideal candidate for table partitioning - this is
frequently used for rotating-log table maintenance.

Use a parent table and 20 child tables. Create a new child every day and
drop the 20-day-old table. Table drops are far faster and lower-impact
than a delete-from on a 120-million-row table. Index bloat is limited to
one day of inserts and will be eliminated within 20 days. No deletes means
no vacuum requirement on the affected tables. Each child table is limited
to about 6 million records (one day of inserts). A clever backup scheme
can ignore prior days' static child tables (and you could keep
historical-data dumps off-line for later use if desired). A minimal
sketch follows the documentation link below.

Read up on it here:
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html
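
A sketch of the daily rotation under 8.3-style inheritance partitioning
(the timestamp column name "ts" and the routing of inserts are assumptions
here; clients must insert into the current child, either directly or via a
trigger on the parent):

  -- Create tomorrow's child with a CHECK constraint the planner can use:
  CREATE TABLE phaseangle_20091031 (
      CHECK (ts >= DATE '2009-10-31' AND ts < DATE '2009-11-01')
  ) INHERITS (phaseangle);
  CREATE INDEX phaseangle_20091031_ts_idx ON phaseangle_20091031 (ts);

  -- Retire the 20-day-old child: instantaneous, and no vacuum needed.
  DROP TABLE phaseangle_20091011;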

Cheers,
Steve


Re: database size growing continuously

From
Scott Marlowe
Date:
On Thu, Oct 29, 2009 at 8:44 AM, Peter Meszaros <pme@prolan.hu> wrote:
> Hi All,
>
> I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
> with six columns and two indices, and about 6 million records are
> written into it every day, continuously committed every 10 seconds from
> 8 clients. The table stores approximately 120 million records, because a
> cron job daily deletes those that are older than 20 days. Autovacuum is
> on and every setting is at its factory default except some unrelated ones
> (listen address, authorization). But my database is growing,
> characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
> or even 0!).

Sounds like you're blowing out your free space map.  Things to try:

1: Delete your rows in smaller batches.  For example, every hour delete
everything over 20 days old, so you don't delete it all at once, once a
day (a sketch follows this list).
2: Crank up max_fsm_pages large enough to hold all the dead tuples.
3: Lower the autovacuum cost delay.
4: Get faster hard drives, so that vacuum can keep up without causing
your system to slow to a crawl while it is running.
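
A sketch of item 1, assuming the timestamp column is called "ts" (a
hypothetical name):

  -- Run hourly from cron instead of daily; each run then removes only
  -- roughly 250k rows (one hour's worth) instead of ~6 million at once:
  DELETE FROM phaseangle WHERE ts < now() - interval '20 days';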


Re: database size growing continuously

From
Merlin Moncure
Date:
On Thu, Oct 29, 2009 at 11:40 AM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:
> Peter Meszaros wrote:
>>
>> Hi All,
>>
>> I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
>> with six columns and two indices, and about 6 million records are
>> written into it every day, continuously committed every 10 seconds from
>> 8 clients. The table stores approximately 120 million records, because a
>> cron job daily deletes those that are older than 20 days.
>
> You may be an ideal candidate for table partitioning - this is frequently
> used for rotating-log table maintenance.
>
> Use a parent table and 20 child tables. Create a new child every day and
> drop the 20-day-old table. Table drops are far faster and lower-impact than
> a delete-from on a 120-million-row table. Index bloat is limited to one day
> of inserts and will be eliminated within 20 days. No deletes means no vacuum
> requirement on the affected tables. Each child table is limited to about
> 6 million records (one day of inserts). A clever backup scheme can ignore
> prior days' static child tables (and you could keep historical-data dumps
> off-line for later use if desired).
>
> Read up on it here:
> http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html

From a performance point of view, this is going to be the best option.
It might push some complexity into his queries, though, to invoke
constraint exclusion or deal directly with the child partitions.

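For example (a sketch, again assuming a hypothetical timestamp column
called "ts"): with constraint exclusion enabled, the planner skips child
tables whose CHECK constraints rule them out:

  SET constraint_exclusion = on;  -- off by default in 8.3
  -- Only the children covering this one day are scanned:
  SELECT count(*) FROM phaseangle
  WHERE ts >= DATE '2009-10-29' AND ts < DATE '2009-10-30';
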
merlin

Re: database size growing continuously

From
Jeremy Harris
Date:
On 10/30/2009 12:43 PM, Merlin Moncure wrote:
> On Thu, Oct 29, 2009 at 11:40 AM, Steve Crawford
> <scrawford@pinpointresearch.com>  wrote:
>> Use a parent table and 20 child tables. Create a new child every day and
>> drop the 20-day-old table. Table drops are far faster and lower-impact than
>> a delete-from on a 120-million-row table. Index bloat is limited to one day
>> of inserts and will be eliminated within 20 days.
[...]
>> Read up on it here:
>> http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html
>
> From a performance point of view, this is going to be the best option.
> It might push some complexity into his queries, though, to invoke
> constraint exclusion or deal directly with the child partitions.

Seeking to understand.... is the use of partitions and constraint-exclusion
pretty much a hack to get around poor performance, which really ought
to be done invisibly and automatically by a DBMS?

Much as indexes per se are, in the SQL/Codd worldview?

Or, is there more to it?


I appreciate the "Simple Matter Of Programming" problem.

Thanks,
     Jeremy

Re: database size growing continuously

From
Anj Adu
Date:
Any relational database worth its salt has partitioning for a reason.

1. Maintenance.  You will need to delete data at some point (cleanup).
Partitions are the only way to do it effectively.
2. Performance.  Partitioning offers a way to query smaller slices of
data automatically (i.e. the query optimizer will choose the partition
for you). Very large tables are a no-no in any relational
database (sheer size has limitations).


On Fri, Oct 30, 2009 at 11:57 AM, Jeremy Harris <jgh@wizmail.org> wrote:
> On 10/30/2009 12:43 PM, Merlin Moncure wrote:
>>
>> On Thu, Oct 29, 2009 at 11:40 AM, Steve Crawford
>> <scrawford@pinpointresearch.com>  wrote:
>>>
>>> Use a parent table and 20 child tables. Create a new child every day and
>>> drop the 20-day-old table. Table drops are far faster and lower-impact
>>> than a delete-from on a 120-million-row table. Index bloat is limited to
>>> one day of inserts and will be eliminated within 20 days.
>
> [...]
>>>
>>> Read up on it here:
>>> http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html
>>
>> From a performance point of view, this is going to be the best option.
>> It might push some complexity into his queries, though, to invoke
>> constraint exclusion or deal directly with the child partitions.
>
> Seeking to understand.... is the use of partitions and constraint-exclusion
> pretty much a hack to get around poor performance, which really ought
> to be done invisibly and automatically by a DBMS?
>
> Much as indexes per se are, in the SQL/Codd worldview?
>
> Or, is there more to it?
>
>
> I appreciate the "Simple Matter Of Programming" problem.
>
> Thanks,
>    Jeremy
>

Re: database size growing continuously

From
Greg Stark
Date:
On Fri, Oct 30, 2009 at 12:53 PM, Anj Adu <fotographs@gmail.com> wrote:
> Any relational database worth its salt has partitioning for a reason.
>
> 1. Maintenance.  You will need to delete data at some point (cleanup).
> Partitions are the only way to do it effectively.

This is true and it's unavoidably a manual process. The database will
not know what segments of the data you intend to load and unload en
masse.

> 2. Performance.  Partitioning offers a way to query smaller slices of
> data automatically (i.e. the query optimizer will choose the partition
> for you). Very large tables are a no-no in any relational
> database (sheer size has limitations).

This I dispute. Databases are designed to be scalable and very large
tables should perform just as well as smaller tables.

Where partitions win for performance is when you know something about
how your data is accessed and you can optimize the access by
partitioning along the same keys. For example if you're doing a
sequential scan of just one partition or doing a merge join of two
equivalently partitioned tables and the partitions can be sorted in
memory.

However in these cases it is possible the database will become more
intelligent and be able to achieve the same performance gains
automatically. Bitmap index scans should perform comparably to the
sequential scan of individual partitions for example.

--
greg

Re: database size growing continuously

From
Anj Adu
Date:
Databases are designed to handle very large tables, but effectiveness
is always in question. A full table scan on a partitioned table is
always preferable to an FTS on a super-large table. The nature of the
query will of course dictate performance, but you run into definite
limitations with very large tables.

On Fri, Oct 30, 2009 at 1:01 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Fri, Oct 30, 2009 at 12:53 PM, Anj Adu <fotographs@gmail.com> wrote:
>> Any relational database worth its salt has partitioning for a reason.
>>
>> 1. Maintenance.  You will need to delete data at some point (cleanup).
>> Partitions are the only way to do it effectively.
>
> This is true and it's unavoidably a manual process. The database will
> not know what segments of the data you intend to load and unload en
> masse.
>
>> 2. Performance.  Partitioning offers a way to query smaller slices of
>> data automatically (i.e. the query optimizer will choose the partition
>> for you). Very large tables are a no-no in any relational
>> database (sheer size has limitations).
>
> This I dispute. Databases are designed to be scalable and very large
> tables should perform just as well as smaller tables.
>
> Where partitions win for performance is when you know something about
> how your data is accessed and you can optimize the access by
> partitioning along the same keys. For example if you're doing a
> sequential scan of just one partition or doing a merge join of two
> equivalently partitioned tables and the partitions can be sorted in
> memory.
>
> However in these cases it is possible the database will become more
> intelligent and be able to achieve the same performance gains
> automatically. Bitmap index scans should perform comparably to the
> sequential scan of individual partitions for example.
>
> --
> greg
>

Re: database size growing continuously

From
Jeremy Harris
Date:
On 10/30/2009 08:01 PM, Greg Stark wrote:
> On Fri, Oct 30, 2009 at 12:53 PM, Anj Adu <fotographs@gmail.com> wrote:
>> Any relational database worth its salt has partitioning for a reason.
>>
>> 1. Maintenance.  You will need to delete data at some point (cleanup).
>> Partitions are the only way to do it effectively.
>
> This is true and it's unavoidably a manual process. The database will
> not know what segments of the data you intend to load and unload en
> masse.
>
>> 2. Performance.  Partitioning offers a way to query smaller slices of
>> data automatically (i.e. the query optimizer will choose the partition
>> for you). Very large tables are a no-no in any relational
>> database (sheer size has limitations).
>
> This I dispute. Databases are designed to be scalable and very large
> tables should perform just as well as smaller tables.
>
> Where partitions win for performance is when you know something about
> how your data is accessed and you can optimize the access by
> partitioning along the same keys. For example if you're doing a
> sequential scan of just one partition or doing a merge join of two
> equivalently partitioned tables and the partitions can be sorted in
> memory.
>
> However in these cases it is possible the database will become more
> intelligent and be able to achieve the same performance gains
> automatically. Bitmap index scans should perform comparably to the
> sequential scan of individual partitions for example.
>

So, on the becoming more intelligent front:  PostgreSQL already does
some operations as background maintenance (autovacuum).  Extending
this to de-bloat indices does not seem conceptually impossible, nor for
the collection of table-data statistics for planner guidance (also, why
could a full-table-scan not collect stats as a side-effect?).  Further out,
how about the gathering of statistics on queries to guide the automatic
creation of indices?  Or to set up a partitioning scheme on a previously
monolithic table?

- Jeremy


Re: database size growing continuously

From
Greg Stark
Date:
On Fri, Oct 30, 2009 at 1:18 PM, Jeremy Harris <jgh@wizmail.org> wrote:
> So, on the becoming more intelligent front:  PostgreSQL already does
> some operations as background maintenance (autovacuum).  Extending
> this to de-bloat indices does not seem conceptually impossible

It could be done but it's not easy because there will be people
concurrently scanning the index. Vacuum is limited to operations it
can do without blocking other jobs.

>, nor for the collection of table-data statistics for planner guidance

Well autovacuum already does this.


> (also, why
> could a full-table-scan not collect stats as a side-effect?).

That's a good idea, but there are difficulties with it. The full table
scan might not run to completion, meaning you may have done a lot of
work for nothing. Also, gathering and processing that data is fairly
expensive, especially for higher stats targets. It requires sorting
the data by each column, which takes some CPU time that we wouldn't
want SQL queries to wait for.

>  Further out, how about the gathering of statistics on queries to guide the automatic
> creation of indices?

I think we do need more run-time stats. How to make use of them would
be a complex question. We could automatically tune the cost
parameters, we could automatically try other plans and see if they run
faster, we could even automatically build indexes. Not all of these
would be appropriate in every environment though.

>  Or to set up a partitioning scheme on a previously monolithic table?

Well that involves moving data that other users might be busy
accessing. Again we wouldn't want an automatic background job blocking
user queries.

--
greg

Re: database size growing continuously

From
Peter Meszaros
Date:
Thank you all for the fast responses!

I changed the delete's schedule from daily to hourly and I will let you
know the result. This seems to be the most promising step.

The next one is tuning 'max_fsm_pages'.
Increasing max_fsm_pages may also be helpful, but I've read that
'vacuum verbose ...' will issue warnings if max_fsm_pages is too small.
I've never seen such a message; this command either runs and finishes,
or seems to loop endlessly, as I wrote in my initial e-mail.
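
For what it's worth, the free-space-map summary is printed by a
database-wide VACUUM VERBOSE (run with no table name), at the very end of
its output (a sketch; the exact wording varies by version):

  -- Run database-wide; the final lines summarize FSM usage, roughly:
  --   INFO:  free space map contains ... pages in ... relations
  -- plus a NOTICE if the page slots needed exceed max_fsm_pages.
  VACUUM VERBOSE;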


On Thu, Oct 29, 2009 at 10:59:48AM -0600, Scott Marlowe wrote:
> On Thu, Oct 29, 2009 at 8:44 AM, Peter Meszaros <pme@prolan.hu> wrote:
> > Hi All,
> >
> > I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
> > with six columns and two indices, and about 6 million records are
> > written into it every day, continuously committed every 10 seconds from
> > 8 clients. The table stores approximately 120 million records, because a
> > cron job daily deletes those that are older than 20 days. Autovacuum is
> > on and every setting is at its factory default except some unrelated ones
> > (listen address, authorization). But my database is growing,
> > characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
> > or even 0!).
>
> Sounds like you're blowing out your free space map.  Things to try:
>
> 1: Delete your rows in smaller batches.  For example, every hour delete
> everything over 20 days old, so you don't delete it all at once, once a
> day.
> 2: Crank up max_fsm_pages large enough to hold all the dead tuples.
> 3: Lower the autovacuum cost delay.
> 4: Get faster hard drives, so that vacuum can keep up without causing
> your system to slow to a crawl while it is running.

--
E-mail: pmeATprolanDOThu
Phone: +36-20-954-3100/8139
Mobile: +36-20-9543139
Fax: +36-26-540420
http://www.prolan.hu
Mon Nov  2 13:20:39 CET 2009

Re: database size growing continuously

From
Anj Adu
Date:
I would recommend (if at all possible) partitioning the table and
dropping the old partitions when no longer needed. This guarantees that
the space is freed up without VACUUM overhead. Deletes will kill you at
some point, and you don't want too much VACUUM I/O overhead impacting
your performance.

On Mon, Nov 2, 2009 at 4:50 AM, Peter Meszaros <pme@prolan.hu> wrote:
> Thank you all for the fast responses!
>
> I changed the delete's schedule from daily to hourly and I will let you
> know the result. This seems to be the most promising step.
>
> The next one is tuning 'max_fsm_pages'.
> Increasing max_fsm_pages may also be helpful, but I've read that
> 'vacuum verbose ...' will issue warnings if max_fsm_pages is too small.
> I've never seen such a message; this command either runs and finishes,
> or seems to loop endlessly, as I wrote in my initial e-mail.
>
>
> On Thu, Oct 29, 2009 at 10:59:48AM -0600, Scott Marlowe wrote:
>> On Thu, Oct 29, 2009 at 8:44 AM, Peter Meszaros <pme@prolan.hu> wrote:
>> > Hi All,
>> >
>> > I use PostgreSQL 8.3.7 as a huge queue. There is a very simple table
>> > with six columns and two indices, and about 6 million records are
>> > written into it every day, continuously committed every 10 seconds from
>> > 8 clients. The table stores approximately 120 million records, because a
>> > cron job daily deletes those that are older than 20 days. Autovacuum is
>> > on and every setting is at its factory default except some unrelated ones
>> > (listen address, authorization). But my database is growing,
>> > characteristically by ~600 MB/day, though sometimes much slower (e.g. 10 MB,
>> > or even 0!).
>>
>> Sounds like you're blowing out your free space map.  Things to try:
>>
>> 1: Delete your rows in smaller batches.  For example, every hour delete
>> everything over 20 days old, so you don't delete it all at once, once a
>> day.
>> 2: Crank up max_fsm_pages large enough to hold all the dead tuples.
>> 3: Lower the autovacuum cost delay.
>> 4: Get faster hard drives, so that vacuum can keep up without causing
>> your system to slow to a crawl while it is running.
>
> --
> E-mail: pmeATprolanDOThu
> Phone: +36-20-954-3100/8139
> Mobile: +36-20-9543139
> Fax: +36-26-540420
> http://www.prolan.hu
> Mon Nov  2 13:20:39 CET 2009
>

Re: database size growing continuously

From
Robert Haas
Date:
On Mon, Nov 2, 2009 at 7:50 AM, Peter Meszaros <pme@prolan.hu> wrote:
> Increasing max_fsm_pages may also be helpful, but I've read that
> 'vacuum verbose ...' will issue warnings if max_fsm_pages is too small.
> I've never seen such a message; this command either runs and finishes,
> or seems to loop endlessly, as I wrote in my initial e-mail.

I don't think it goes into an endless loop.  I think it runs for a
really long time because your database is bloated, in need of
vacuuming, and has probably blown out the free space map.  But since
you haven't let it run to completion, you haven't seen the message.

...Robert