Discussion: database size growing continuously
Hi All,

I use postgresql 8.3.7 as a huge queue. There is a very simple table
with six columns and two indices, and about 6 million records are
written into it every day, continuously committed every 10 seconds from
8 clients. The table stores approximately 120 million records, because a
cron job daily deletes those that are older than 20 days. Autovacuum is
on and every setting is the factory default except some unrelated ones
(listen address, authorization). But my database is growing,
characteristically ~600MByte/day, but sometimes much slower (eg. 10MB,
or even 0!!!).

I've also tried a test on another server running the same postgresql,
where 300 million records were loaded into a freshly created database,
and 25 million were deleted with a single DELETE command. The 'vacuum
verbose phaseangle;' command seems to be running forever, for hours:

phasor=# vacuum VERBOSE phaseangle;
INFO:  vacuuming "public.phaseangle"
INFO:  scanned index "i" to remove 2796006 row versions
DETAIL:  CPU 9.49s/120.30u sec elapsed 224.20 sec.
INFO:  scanned index "t" to remove 2796006 row versions
DETAIL:  CPU 13.57s/105.70u sec elapsed 192.71 sec.
INFO:  "phaseangle": removed 2796006 row versions in 24748 pages
DETAIL:  CPU 0.65s/0.30u sec elapsed 39.97 sec.
INFO:  scanned index "i" to remove 2795924 row versions
DETAIL:  CPU 9.58s/121.63u sec elapsed 239.06 sec.
INFO:  scanned index "t" to remove 2795924 row versions
DETAIL:  CPU 13.10s/103.59u sec elapsed 190.84 sec.
INFO:  "phaseangle": removed 2795924 row versions in 24743 pages
DETAIL:  CPU 0.68s/0.28u sec elapsed 40.21 sec.
INFO:  scanned index "i" to remove 2796014 row versions
DETAIL:  CPU 9.65s/117.28u sec elapsed 231.92 sec.
INFO:  scanned index "t" to remove 2796014 row versions
DETAIL:  CPU 13.48s/103.59u sec elapsed 194.49 sec.
INFO:  "phaseangle": removed 2796014 row versions in 24774 pages
DETAIL:  CPU 0.69s/0.28u sec elapsed 40.26 sec.
INFO:  scanned index "i" to remove 2795935 row versions
DETAIL:  CPU 9.55s/119.02u sec elapsed 226.85 sec.
INFO:  scanned index "t" to remove 2795935 row versions
DETAIL:  CPU 13.09s/102.84u sec elapsed 194.74 sec.
INFO:  "phaseangle": removed 2795935 row versions in 25097 pages
DETAIL:  CPU 0.67s/0.28u sec elapsed 41.21 sec.

still running...

Are these the very same problem? Should I delete more frequently in
smaller chunks? It seems to have a limit...

Thanks

Peter
I would recommend increasing max_fsm_pages and shared_buffers.
These changes sped up vacuum full on my database.
With shared_buffers remember to increase max shm in your OS.
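On Linux, for example, "max shm" means the kernel's SHMMAX/SHMALL limits; a sketch of raising them follows (the figures are illustrative only, not recommendations; size them to comfortably exceed the intended shared_buffers):

```
# /etc/sysctl.conf (Linux); figures are illustrative
kernel.shmmax = 4294967296    # largest single shared memory segment, in bytes
kernel.shmall = 1048576       # total shared memory allowed, in 4 kB pages

# apply without a reboot:
#   sysctl -p
```

PostgreSQL will refuse to start if shared_buffers requests a segment larger than SHMMAX, so raise the kernel limit first.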
Ludwik
--
Ludwik Dyląg
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
On Thu, 2009-10-29 at 15:44 +0100, Peter Meszaros wrote:
> I use postgresql 8.3.7 as a huge queue. [...] But my database is
> growing, characteristically ~600MByte/day, but sometimes much slower
> (eg. 10MB, or even 0!!!).

Do you ever "vacuum full" to reclaim empty record space?

--
P.J. "Josh" Rovero                    Vice President
Sonalysts, Inc.
Email: rovero@sonalysts.com           www.sonalysts.com
215 Parkway North                     Work: (860)326-3671
Waterford, CT 06385
On Thu, 29 Oct 2009, Josh Rovero wrote:
> Do you ever "vacuum full" to reclaim empty record space?

Unless you expect the size of the database to permanently decrease by a
significant amount, that is a waste of time, and may cause bloat in
indexes. In this case, since the space will be used again fairly soon,
it is better to just VACUUM, or autovacuum. Just make sure the free
space map can cope with it.

Matthew

--
import oz.wizards.Magic;
  if (Magic.guessRight())...           -- Computer Science Lecturer
Hi Peter,

Sounds like you're experiencing index bloat, and vacuums do nothing to
help that. You can do one of two things to remedy this:

1) The fastest and simplest (but most disruptive) way is to use REINDEX.
But this will exclusively lock the table while rebuilding the indexes:

    REINDEX TABLE phaseangle;

2) The slower but less disruptive way is to do a concurrent build of
each index and then drop the old ones. For example, to rebuild the "i"
index:

    CREATE INDEX CONCURRENTLY i_new ON phaseangle (<indexed columns>);
    DROP INDEX i;
    ALTER INDEX i_new RENAME TO i;
    ANALYZE phaseangle (<indexed columns>);

Do this regularly to keep the index sizes in check.

- Chris

Peter Meszaros wrote:
> I use postgresql 8.3.7 as a huge queue. [...] The 'vacuum
> verbose phaseangle;' command seems to be running forever for hours:
> [...]
> still running...
On Thu, 2009-10-29 at 17:00 +0100, Ludwik Dylag wrote:
> Try increasing max_fsm_pages and shared_buffers
> These changes did speed up vacuum full on my database.
> With shared_buffers remember to increase max shm in your OS.

If you overran your max_fsm_pages you are going to have indexes that
are not properly cleaned up, even after a vacuum full. You will need to
cluster or reindex.

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 503.667.4564
Consulting, Training, Support, Custom Development, Engineering
If the world pushes look it in the eye and GRR. Then push back harder. - Salamander
Peter Meszaros wrote:
> I use postgresql 8.3.7 as a huge queue. [...] But my database is
> growing, characteristically ~600MByte/day, but sometimes much slower
> (eg. 10MB, or even 0!!!)...

Can you try running against 8.4.1? I believe there are a number of
improvements that should help in your case. For one thing, the
max_fsm_pages and max_fsm_relations "knobs" are gone - it happens
automagically. I believe there are some substantial improvements in
space reuse, along with numerous improvements not directly related to
your question.

Cheers,
Steve
Peter Meszaros wrote:
> I use postgresql 8.3.7 as a huge queue. There is a very simple table
> with six columns and two indices, and about 6 million records are
> written into it every day [...] The table stores approximately 120
> million records, because a cron job daily deletes those that are older
> than 20 days.

You may be an ideal candidate for table partitioning - this is
frequently used for rotating log table maintenance.

Use a parent table and 20 child tables. Create a new child every day
and drop the 20-day-old table. Table drops are far faster and
lower-impact than a delete-from on a 120-million-row table. Index bloat
is limited to one day of inserts and will be eliminated in 20 days. No
deletes means no vacuum requirement on the affected tables. Individual
child tables are limited to about 6 million records (one day of
inserts). A clever backup scheme can ignore prior days' static child
tables (and you could keep historical data dumps off-line for later use
if desired).

Read up on it here:
http://www.postgresql.org/docs/8.4/interactive/ddl-partitioning.html

Cheers,
Steve
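A minimal sketch of the inheritance-based scheme Steve describes, as it would look on 8.3 (table and column names here are invented for illustration; the thread never shows the real six columns):

```sql
-- Parent table; queries go against this. Children inherit its structure.
CREATE TABLE phaseangle (
    ts    timestamptz NOT NULL,
    value double precision
    -- ... remaining columns ...
);

-- One child per day, with a CHECK constraint that lets the planner use
-- constraint exclusion to skip partitions that cannot match a query.
CREATE TABLE phaseangle_20091029 (
    CHECK (ts >= '2009-10-29' AND ts < '2009-10-30')
) INHERITS (phaseangle);

CREATE INDEX phaseangle_20091029_ts ON phaseangle_20091029 (ts);

-- Clients insert directly into the current day's child (or a trigger
-- on the parent routes rows there).

-- Retiring old data becomes a near-instant DROP instead of a
-- multi-million-row DELETE plus vacuum:
DROP TABLE phaseangle_20091009;
```

A nightly cron job would create tomorrow's child and drop the 20-day-old one.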
On Thu, Oct 29, 2009 at 8:44 AM, Peter Meszaros <pme@prolan.hu> wrote:
> I use postgresql 8.3.7 as a huge queue. [...] But my database is
> growing, characteristically ~600MByte/day, but sometimes much slower
> (eg. 10MB, or even 0!!!).

Sounds like you're blowing out your free space map. Things to try:

1: delete your rows in smaller batches. Like every hour delete
   everything over 20 days, so you don't delete them all at once one
   time a day.
2: crank up max_fsm_pages large enough to hold all the dead tuples.
3: lower the autovacuum cost delay.
4: get faster hard drives so that vacuum can keep up without causing
   your system to slow to a crawl while vacuum is running.
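Scott's point 1 might look roughly like this (the table and column names are invented for illustration):

```sql
-- Run hourly from cron instead of one big daily DELETE, so each pass
-- leaves a much smaller batch of dead tuples for autovacuum to reclaim:
DELETE FROM phaseangle WHERE ts < now() - interval '20 days';
```

Points 2 and 3 are postgresql.conf changes, e.g. `max_fsm_pages = 2000000` (requires a restart on 8.3) and `autovacuum_vacuum_cost_delay = 10ms`; both figures are illustrative, and the right max_fsm_pages value is best taken from the "page slots needed" number a database-wide VACUUM VERBOSE reports.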
On Thu, Oct 29, 2009 at 11:40 AM, Steve Crawford
<scrawford@pinpointresearch.com> wrote:
> You may be an ideal candidate for table partitioning - this is
> frequently used for rotating log table maintenance.
>
> Use a parent table and 20 child tables. Create a new child every day
> and drop the 20-day-old table. [...]

From a performance point of view, this is going to be the best option.
It might push some complexity though into his queries to invoke
constraint exclusion or deal directly with the child partitions.

merlin
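To illustrate Merlin's caveat, assuming a hypothetical phaseangle parent partitioned by day on a ts column:

```sql
-- constraint_exclusion is off by default in 8.3 and must be enabled
-- (per session or in postgresql.conf) for the planner to skip children:
SET constraint_exclusion = on;

-- A plan-time-constant predicate lets the planner prove, from each
-- child's CHECK constraint, which partitions cannot match:
SELECT count(*) FROM phaseangle
WHERE ts >= '2009-10-28' AND ts < '2009-10-29';

-- A predicate the planner cannot evaluate at plan time, such as
--   WHERE ts >= now() - interval '1 day'
-- gets no exclusion and scans every child.
```

This is the query-side complexity he mentions: predicates must be written so the planner can match them against the partition constraints, or the application must target child tables directly.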
On 10/30/2009 12:43 PM, Merlin Moncure wrote:
> From a performance point of view, this is going to be the best option.
> It might push some complexity though into his queries to invoke
> constraint exclusion or deal directly with the child partitions.

Seeking to understand.... is the use of partitions and constraint
exclusion pretty much a hack to get around poor performance, which
really ought to be done invisibly and automatically by a DBMS?

Much as indexes per se are, in the SQL/Codd worldview?

Or, is there more to it?

I appreciate the "Simple Matter Of Programming" problem.

Thanks,
Jeremy
On Fri, Oct 30, 2009 at 11:57 AM, Jeremy Harris <jgh@wizmail.org> wrote:
> Seeking to understand.... is the use of partitions and constraint
> exclusion pretty much a hack to get around poor performance, which
> really ought to be done invisibly and automatically by a DBMS?

Any relational database worth its salt has partitioning for a reason.

1. Maintenance. You will need to delete data at some point (cleanup).
   Partitions are the only way to do it effectively.
2. Performance. Partitioning offers a way to query smaller slices of
   data automatically (i.e. the query optimizer will choose the
   partition for you). Very large tables are a no-no in any relational
   database (sheer size has limitations).
On Fri, Oct 30, 2009 at 12:53 PM, Anj Adu <fotographs@gmail.com> wrote:
> 1. Maintenance. You will need to delete data at some point (cleanup).
>    Partitions are the only way to do it effectively.

This is true, and it's unavoidably a manual process. The database will
not know what segments of the data you intend to load and unload en
masse.

> 2. Performance. Partitioning offers a way to query smaller slices of
>    data automatically (i.e. the query optimizer will choose the
>    partition for you). Very large tables are a no-no in any relational
>    database (sheer size has limitations).

This I dispute. Databases are designed to be scalable, and very large
tables should perform just as well as smaller tables.

Where partitions win for performance is when you know something about
how your data is accessed and you can optimize the access by
partitioning along the same keys. For example, if you're doing a
sequential scan of just one partition, or doing a merge join of two
equivalently partitioned tables and the partitions can be sorted in
memory.

However, in these cases it is possible the database will become more
intelligent and be able to achieve the same performance gains
automatically. Bitmap index scans should perform comparably to the
sequential scan of individual partitions, for example.

--
greg
On Fri, Oct 30, 2009 at 1:01 PM, Greg Stark <gsstark@mit.edu> wrote:
> This I dispute. Databases are designed to be scalable, and very large
> tables should perform just as well as smaller tables. [...]

Databases are designed to handle very large tables, but effectiveness
is always in question. A full table scan on a partitioned table is
always preferable to an FTS on a super-large table. The nature of the
query will of course dictate performance, but you run into definite
limitations with very large tables.
On 10/30/2009 08:01 PM, Greg Stark wrote:
> However, in these cases it is possible the database will become more
> intelligent and be able to achieve the same performance gains
> automatically. Bitmap index scans should perform comparably to the
> sequential scan of individual partitions, for example.

So, on the becoming-more-intelligent front: PostgreSQL already does
some operations as background maintenance (autovacuum). Extending this
to de-bloat indices does not seem conceptually impossible, nor does the
collection of table-data statistics for planner guidance (also, why
could a full-table scan not collect stats as a side effect?).

Further out, how about gathering statistics on queries to guide the
automatic creation of indices? Or to set up a partitioning scheme on a
previously monolithic table?

- Jeremy
On Fri, Oct 30, 2009 at 1:18 PM, Jeremy Harris <jgh@wizmail.org> wrote:
> So, on the becoming-more-intelligent front: PostgreSQL already does
> some operations as background maintenance (autovacuum). Extending
> this to de-bloat indices does not seem conceptually impossible

It could be done, but it's not easy, because there will be people
concurrently scanning the index. Vacuum is limited to operations it can
do without blocking other jobs.

> nor does the collection of table-data statistics for planner guidance

Well, autovacuum already does this.

> (also, why could a full-table scan not collect stats as a side effect?)

That's a good idea, but there are difficulties with it. The full table
scan might not run to completion, meaning you may have done a lot of
work for nothing. Also, gathering and processing that data is fairly
expensive, especially for higher stats targets. It requires sorting the
data by each column, which takes some cpu time which we wouldn't want
to make sql queries wait for.

> Further out, how about gathering statistics on queries to guide the
> automatic creation of indices?

I think we do need more run-time stats. How to make use of them would
be a complex question. We could automatically tune the cost parameters,
we could automatically try other plans and see if they run faster, we
could even automatically build indexes. Not all of these would be
appropriate in every environment though.

> Or to set up a partitioning scheme on a previously monolithic table?

Well, that involves moving data that other users might be busy
accessing. Again, we wouldn't want an automatic background job blocking
user queries.

--
greg
Thank you all for the fast responses!

I changed the delete's schedule from daily to hourly and I will let you
know the result. This seems to be the most promising step.

The next one is tuning 'max_fsm_pages'. Increasing max_fsm_pages can
also be helpful, but I've read that 'vacuum verbose ...' will issue
warnings if max_fsm_pages is too small. I've never seen such a message;
this command either runs and finishes, or goes into an endless loop as
described in my initial e-mail.

On Thu, Oct 29, 2009 at 10:59:48AM -0600, Scott Marlowe wrote:
> Sounds like you're blowing out your free space map. Things to try:
>
> 1: delete your rows in smaller batches. Like every hour delete
>    everything over 20 days, so you don't delete them all at once one
>    time a day.
> 2: crank up max_fsm_pages large enough to hold all the dead tuples.
> 3: lower the autovacuum cost delay.
> 4: get faster hard drives so that vacuum can keep up without causing
>    your system to slow to a crawl while vacuum is running.

--
E-mail: pmeATprolanDOThu
Phone: +36-20-954-3100/8139
Mobile: +36-20-9543139
Fax: +36-26-540420
http://www.prolan.hu
Mon Nov 2 13:20:39 CET 2009
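One hedged note on why the warning may never have appeared: if memory serves, on 8.3 the free-space-map summary (and the too-small-max_fsm_pages hint) is printed only at the end of a database-wide vacuum, not when a single table is named:

```sql
-- Whole-database vacuum, no table name. After the per-table output,
-- 8.3 ends with an FSM summary along the lines of
--   INFO:  free space map contains ... pages in ... relations
-- plus a NOTICE/HINT to raise max_fsm_pages when the map is too small.
VACUUM VERBOSE;
```

So `vacuum VERBOSE phaseangle;` alone would never show the warning, however undersized the map is.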
I would recommend (if at all possible) partitioning the table and
dropping the old partitions when not needed. This will guarantee the
space free-up without VACUUM overhead. Deletes will kill you at some
point, and you don't want too much of the VACUUM I/O overhead impacting
your performance.

On Mon, Nov 2, 2009 at 4:50 AM, Peter Meszaros <pme@prolan.hu> wrote:
> Thank you all for the fast responses!
>
> I changed the delete's schedule from daily to hourly and I will let
> you know the result. This seems to be the most promising step.
>
> The next one is tuning 'max_fsm_pages'. [...]
On Mon, Nov 2, 2009 at 7:50 AM, Peter Meszaros <pme@prolan.hu> wrote:
> Increasing max_fsm_pages can also be helpful, but I've read that
> 'vacuum verbose ...' will issue warnings if max_fsm_pages is too
> small. I've never seen such a message; this command either runs and
> finishes, or goes into an endless loop as described in my initial
> e-mail.

I don't think it goes into an endless loop. I think it runs for a
really long time because your database is bloated, in need of
vacuuming, and probably has blown out the free space map. But since you
haven't let it run to completion, you haven't seen the message.

...Robert