Обсуждение: autovacuum on a -mostly- r/o table
Hello, we are running a 7.3 postgres db with only a big table (avg 500.000records) and 7 indexes for a search engine. we have 2 of this databases and we can switch from one to another. Last week we decided to give a try to 8.1 on one of them and everything went fine, db is faster (about 2 or 3 times in our case) and the server load is higher - which should mean that faster response time is achieved by taking a better use of the server. We also activated the autovacuum feature to give it a try and that's were our problems started. I left the standard autovacuum configuration just to wait and see, pg decided to start a vacuum on the table just midday when users were launching search queries on the table and server load reached a very high value so that in a couple of minutes the db was unusable With pg7.3 we use to vacuum the db night time, mostly because the insert and updates in this table is made in a batch way: a single task that puts 100.000 records in the db in 10/20minutes, so the best time to actually vacuum the db would be after this batch. I have read that autovacuum cannot check to see pg load before launching vacuum but is there any patch about it? that would sort out the problem in a good and simple way. Otherwise, which kind of set of parameters I should put in autovacuum configuration? I am stuck because in our case the table gets mostly read and if I set up things as to vacuum the table after a specific amount of insert/updates, I cannot foresee whether this could happen during daytime when server is under high load. How can I configure the vacuum to run after the daily batch insert/update? Any help appreciated Thank you very much Edoardo
On Wednesday 27 September 2006 09:08, Edoardo Ceccarelli <eddy@axa.it> wrote: > > How can I configure the vacuum to run after the daily batch > insert/update? > If you really only want it to run then, you should disable autovacuum and continue to run the vacuum manually. You might also investigate the vacuum cost delay options, which will make vacuum take longer but will have less of an impact on your database while running. -- "If a nation values anything more than freedom, it will lose its freedom; and the irony of it is that if it is comfort or money that it values more, it will lose that too." -- Somerset Maugham, Author
[Edoardo Ceccarelli - Wed at 06:08:30PM +0200] > We also activated the autovacuum feature to give it a try and that's > were our problems started. (...) > How can I configure the vacuum to run after the daily batch insert/update? I think you shouldn't use autovacuum in your case. We haven't dared testing out autovacuum yet even though we probably should, so we're running vacuum at fixed times of the day. We have a very simple script to do this, the most important part of it reads: echo "vacuum verbose analyze;" | psql $DB_NAME > $logdir/$filename 2>&1
On Wed, 2006-09-27 at 18:08, Edoardo Ceccarelli wrote: > How can I configure the vacuum to run after the daily batch insert/update? Check out this: http://www.postgresql.org/docs/8.1/static/catalog-pg-autovacuum.html By inserting the right row you can disable autovacuum to vacuum your big tables, and then you can schedule vacuum nightly for those just as before. There's still a benefit in that you don't need to care about vacuuming the rest of the tables, which will be done just in time. Cheers, Csaba.
In response to Edoardo Ceccarelli <eddy@axa.it>: > Hello, > > we are running a 7.3 postgres db with only a big table (avg > 500.000records) and 7 indexes for a search engine. > we have 2 of this databases and we can switch from one to another. > Last week we decided to give a try to 8.1 on one of them and everything > went fine, db is faster (about 2 or 3 times in our case) and the server > load is higher - which should mean that faster response time is achieved > by taking a better use of the server. > > We also activated the autovacuum feature to give it a try and that's > were our problems started. > I left the standard autovacuum configuration just to wait and see, pg > decided to start a vacuum on the table just midday when users were > launching search queries on the table and server load reached a very > high value so that in a couple of minutes the db was unusable > > With pg7.3 we use to vacuum the db night time, mostly because the insert > and updates in this table is made in a batch way: a single task that > puts 100.000 records in the db in 10/20minutes, so the best time to > actually vacuum the db would be after this batch. > > I have read that autovacuum cannot check to see pg load before launching > vacuum but is there any patch about it? that would sort out the problem > in a good and simple way. > Otherwise, which kind of set of parameters I should put in autovacuum > configuration? I am stuck because in our case the table gets mostly read > and if I set up things as to vacuum the table after a specific amount of > insert/updates, I cannot foresee whether this could happen during > daytime when server is under high load. > How can I configure the vacuum to run after the daily batch insert/update? It doesn't sound as if your setup is a good match for autovacuum. You might be better off going back to the cron vacuums. That's the beauty of Postgres -- it gives you the choice. If you want to continue with autovac, you may want to experiment with vacuum_cost_delay and associated parameters, which can lessen the impact of vacuuming. -- Bill Moran Collaborative Fusion Inc.
Bill Moran wrote:
The db is constantly monitored during high peak so that we can switch to a backup pg7.3 database that is being vacuumed every night.In response to Edoardo Ceccarelli <eddy@axa.it>:I have read that autovacuum cannot check to see pg load before launching vacuum but is there any patch about it? that would sort out the problem in a good and simple way. Otherwise, which kind of set of parameters I should put in autovacuum configuration? I am stuck because in our case the table gets mostly read and if I set up things as to vacuum the table after a specific amount of insert/updates, I cannot foresee whether this could happen during daytime when server is under high load. How can I configure the vacuum to run after the daily batch insert/update?It doesn't sound as if your setup is a good match for autovacuum. You might be better off going back to the cron vacuums. That's the beauty of Postgres -- it gives you the choice. If you want to continue with autovac, you may want to experiment with vacuum_cost_delay and associated parameters, which can lessen the impact of vacuuming.
This is giving me the opportunity to try it so I tried this:
vacuum_cost_delay = 200
vacuum_cost_page_hit = 5
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 100
I know these values affect the normal vacuum process but apparently this means setting
#autovacuum_vacuum_cost_delay = -1 # default vacuum cost delay for
# autovac, -1 means use
# vacuum_cost_delay
and
#autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for
# autovac, -1 means use
# vacuum_cost_limit
for the rest of them I am currently trying the deafults:
#autovacuum_naptime = 60 # time between autovacuum runs, in secs
#autovacuum_vacuum_threshold = 1000 # min # of tuple updates before vacuum
#autovacuum_analyze_threshold = 500 # min # of tuple updates before analyze
#autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before vacuum
#autovacuum_analyze_scale_factor = 0.2 # fraction of rel size before analyze
Does anybody know which process is actually AUTO-vacuum-ing the db?
So that I can check when is running...
Rod Taylor wrote:
but autovacuum could check load -before- and -during- it's execution and it could adjust himself automatically to perform more or less aggressively depending on the difference between those two values.
Maybe with a parameter like: maximum-autovacuum-load=0.2
that would mean: "never load the machine more than 20% for the autovacuum"
...another thing is, how could autovacuum check for machine load, this is something I cannot imagine right now...
True,On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote:I have read that autovacuum cannot check to see pg load before launching vacuum but is there any patch about it? that would sort out the problem in a good and simple way.In some cases the solution to high load is to vacuum the tables being hit the heaviest -- meaning that simply checking machine load isn't enough to make that decision. In fact, that high load problem is exactly why autovacuum was created in the first place.
but autovacuum could check load -before- and -during- it's execution and it could adjust himself automatically to perform more or less aggressively depending on the difference between those two values.
Maybe with a parameter like: maximum-autovacuum-load=0.2
that would mean: "never load the machine more than 20% for the autovacuum"
...another thing is, how could autovacuum check for machine load, this is something I cannot imagine right now...
[Edoardo Ceccarelli - Wed at 06:49:23PM +0200] > ...another thing is, how could autovacuum check for machine load, this > is something I cannot imagine right now... One solution I made for our application, is to check the pg_stats_activity view. It requires some config to get the stats available in that view, though. When the application is to start a low-priority transaction, it will first do: select count(*) from pg_stat_activity where current_query not like '<IDL%' and query_start+?<now(); if the returned value is high, the application will sleep a bit and try again later.
On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote: > > I have read that autovacuum cannot check to see pg load before > launching > vacuum but is there any patch about it? that would sort out the > problem > in a good and simple way. In some cases the solution to high load is to vacuum the tables being hit the heaviest -- meaning that simply checking machine load isn't enough to make that decision. In fact, that high load problem is exactly why autovacuum was created in the first place. --
Csaba Nagy wrote: > On Wed, 2006-09-27 at 18:08, Edoardo Ceccarelli wrote: > >> How can I configure the vacuum to run after the daily batch insert/update? >> > > Check out this: > http://www.postgresql.org/docs/8.1/static/catalog-pg-autovacuum.html > > By inserting the right row you can disable autovacuum to vacuum your big > tables, and then you can schedule vacuum nightly for those just as > before. There's still a benefit in that you don't need to care about > vacuuming the rest of the tables, which will be done just in time. In addition autovacuum respects the work of manual or cron based vacuums, so if you issue a vacuum right after a daily batch insert / update, autovacuum won't repeat the work of that manual vacuum.
In response to Edoardo Ceccarelli <eddy@axa.it>: > Rod Taylor wrote: > > On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote: > > > >> I have read that autovacuum cannot check to see pg load before > >> launching > >> vacuum but is there any patch about it? that would sort out the > >> problem > >> in a good and simple way. > >> > > > > In some cases the solution to high load is to vacuum the tables being > > hit the heaviest -- meaning that simply checking machine load isn't > > enough to make that decision. > > > > In fact, that high load problem is exactly why autovacuum was created in > > the first place. > > > True, > but autovacuum could check load -before- and -during- it's execution and > it could adjust himself automatically to perform more or less > aggressively depending on the difference between those two values. > Maybe with a parameter like: maximum-autovacuum-load=0.2 > that would mean: "never load the machine more than 20% for the autovacuum" This is pretty non-trivial. How do you define 20% load? 20% of the CPU? Does that mean that it's OK for autovac to use 3% cpu and 100% of your IO? Ok, so we need to calculate an average of IO and CPU -- which disks? If your WAL logs are on one disk, and you've used tablespaces to spread the rest of your DB across different partitions, it can be pretty difficult to determine which IO parameters you want to take into consideration. -- Bill Moran Collaborative Fusion Inc. **************************************************************** IMPORTANT: This message contains confidential information and is intended only for the individual named. If the reader of this message is not an intended recipient (or the individual responsible for the delivery of this message to an intended recipient), please be advised that any re-use, dissemination, distribution or copying of this message is prohibited. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. ****************************************************************
As I said before, it could be done, the main requirement is to find a way for pg to check for a value of the system load; of course it has to be an average value between disk and cpu, of course the daemon would have to collect sample of this values continuously, and of course everything would be better if the server it's only running PostgresTrue, but autovacuum could check load -before- and -during- it's execution and it could adjust himself automatically to perform more or less aggressively depending on the difference between those two values. Maybe with a parameter like: maximum-autovacuum-load=0.2 that would mean: "never load the machine more than 20% for the autovacuum"This is pretty non-trivial. How do you define 20% load? 20% of the CPU? Does that mean that it's OK for autovac to use 3% cpu and 100% of your IO? Ok, so we need to calculate an average of IO and CPU -- which disks? If your WAL logs are on one disk, and you've used tablespaces to spread the rest of your DB across different partitions, it can be pretty difficult to determine which IO parameters you want to take into consideration.
Still I think you are right, it wouldn't suit exactly every situations but it could be an "emergency" feature:
What happened to me was very clear:
server running pg8 under heavy load, cpu's were 90% idle as usual.
At some point the vacuum started, the server reached 50 of overall load and cpu's were 1% idle
I think any test can detect such a situation, regardles if load it's more I/O based or CPU based