Обсуждение: autovacuum on a -mostly- r/o table

Поиск
Список
Период
Сортировка

autovacuum on a -mostly- r/o table

От
Edoardo Ceccarelli
Дата:
Hello,

we are running a 7.3 postgres db with only a big table (avg
500.000records) and 7 indexes for a search engine.
we have 2 of this databases and we can switch from one to another.
Last week we decided to give a try to 8.1 on one of them and everything
went fine, db is faster (about 2 or 3 times in our case) and the server
load is higher - which should mean that faster response time is achieved
by taking a better use of the server.

We also activated the autovacuum feature to give it a try and that's
were our problems started.
I left the standard autovacuum configuration just to wait and see, pg
decided to start a vacuum on the table just midday when users were
launching search queries on the table and server load reached a very
high value so that in a couple of minutes the db was unusable

With pg7.3 we use to vacuum the db night time, mostly because the insert
and updates in this table is made in a batch way: a single task that
puts 100.000 records in the db in 10/20minutes, so the best time to
actually vacuum the db would be after this batch.

I have read that autovacuum cannot check to see pg load before launching
vacuum but is there any patch about it? that would sort out the problem
in a good and simple way.
Otherwise, which kind of set of parameters I should put in autovacuum
configuration? I am stuck because in our case the table gets mostly read
and if I set up things as to vacuum the table after a specific amount of
insert/updates, I cannot foresee whether this could happen during
daytime when server is under high load.
How can I configure the vacuum to run after the daily batch insert/update?

Any help appreciated
Thank you very much
Edoardo



Re: autovacuum on a -mostly- r/o table

От
Alan Hodgson
Дата:
On Wednesday 27 September 2006 09:08, Edoardo Ceccarelli <eddy@axa.it>
wrote:
>
> How can I configure the vacuum to run after the daily batch
> insert/update?
>

If you really only want it to run then, you should disable autovacuum and
continue to run the vacuum manually.

You might also investigate the vacuum cost delay options, which will make
vacuum take longer but will have less of an impact on your database while
running.


--
"If a nation values anything more than freedom, it will lose its freedom;
and the irony of it is that if it is comfort or money that it values more,
it will lose that too." -- Somerset Maugham, Author


Re: autovacuum on a -mostly- r/o table

От
Tobias Brox
Дата:
[Edoardo Ceccarelli - Wed at 06:08:30PM +0200]
> We also activated the autovacuum feature to give it a try and that's
> were our problems started.
(...)
> How can I configure the vacuum to run after the daily batch insert/update?

I think you shouldn't use autovacuum in your case.

We haven't dared testing out autovacuum yet even though we probably
should, so we're running vacuum at fixed times of the day.  We have a
very simple script to do this, the most important part of it reads:

echo "vacuum verbose analyze;" | psql $DB_NAME > $logdir/$filename 2>&1


Re: autovacuum on a -mostly- r/o table

От
Csaba Nagy
Дата:
On Wed, 2006-09-27 at 18:08, Edoardo Ceccarelli wrote:
> How can I configure the vacuum to run after the daily batch insert/update?

Check out this:
http://www.postgresql.org/docs/8.1/static/catalog-pg-autovacuum.html

By inserting the right row you can disable autovacuum to vacuum your big
tables, and then you can schedule vacuum nightly for those just as
before. There's still a benefit in that you don't need to care about
vacuuming the rest of the tables, which will be done just in time.

Cheers,
Csaba.


Re: autovacuum on a -mostly- r/o table

От
Bill Moran
Дата:
In response to Edoardo Ceccarelli <eddy@axa.it>:

> Hello,
>
> we are running a 7.3 postgres db with only a big table (avg
> 500.000records) and 7 indexes for a search engine.
> we have 2 of this databases and we can switch from one to another.
> Last week we decided to give a try to 8.1 on one of them and everything
> went fine, db is faster (about 2 or 3 times in our case) and the server
> load is higher - which should mean that faster response time is achieved
> by taking a better use of the server.
>
> We also activated the autovacuum feature to give it a try and that's
> were our problems started.
> I left the standard autovacuum configuration just to wait and see, pg
> decided to start a vacuum on the table just midday when users were
> launching search queries on the table and server load reached a very
> high value so that in a couple of minutes the db was unusable
>
> With pg7.3 we use to vacuum the db night time, mostly because the insert
> and updates in this table is made in a batch way: a single task that
> puts 100.000 records in the db in 10/20minutes, so the best time to
> actually vacuum the db would be after this batch.
>
> I have read that autovacuum cannot check to see pg load before launching
> vacuum but is there any patch about it? that would sort out the problem
> in a good and simple way.
> Otherwise, which kind of set of parameters I should put in autovacuum
> configuration? I am stuck because in our case the table gets mostly read
> and if I set up things as to vacuum the table after a specific amount of
> insert/updates, I cannot foresee whether this could happen during
> daytime when server is under high load.
> How can I configure the vacuum to run after the daily batch insert/update?

It doesn't sound as if your setup is a good match for autovacuum.  You
might be better off going back to the cron vacuums.  That's the
beauty of Postgres -- it gives you the choice.

If you want to continue with autovac, you may want to experiment with
vacuum_cost_delay and associated parameters, which can lessen the
impact of vacuuming.

--
Bill Moran
Collaborative Fusion Inc.

Re: autovacuum on a -mostly- r/o table

От
Edoardo Ceccarelli
Дата:


Bill Moran wrote:
In response to Edoardo Ceccarelli <eddy@axa.it>:

I have read that autovacuum cannot check to see pg load before launching 
vacuum but is there any patch about it? that would sort out the problem 
in a good and simple way.
Otherwise, which kind of set of parameters I should put in autovacuum 
configuration? I am stuck because in our case the table gets mostly read 
and if I set up things as to vacuum the table after a specific amount of 
insert/updates, I cannot foresee whether this could happen during 
daytime when server is under high load.
How can I configure the vacuum to run after the daily batch insert/update?   
It doesn't sound as if your setup is a good match for autovacuum.  You
might be better off going back to the cron vacuums.  That's the
beauty of Postgres -- it gives you the choice.

If you want to continue with autovac, you may want to experiment with
vacuum_cost_delay and associated parameters, which can lessen the
impact of vacuuming.
 
The db is constantly monitored during high peak so that we can switch to a backup pg7.3 database that is being vacuumed every night.
This is giving me the opportunity to try it so I tried this:

vacuum_cost_delay = 200
vacuum_cost_page_hit = 5
vacuum_cost_page_miss = 10
vacuum_cost_page_dirty = 20
vacuum_cost_limit = 100

I know these values affect the normal vacuum process but apparently this means setting
#autovacuum_vacuum_cost_delay = -1      # default vacuum cost delay for
                                        # autovac, -1 means use
                                        # vacuum_cost_delay

and
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # autovac, -1 means use
                                        # vacuum_cost_limit


for the rest of them I am currently trying the deafults:

#autovacuum_naptime = 60                # time between autovacuum runs, in secs
#autovacuum_vacuum_threshold = 1000     # min # of tuple updates before vacuum
#autovacuum_analyze_threshold = 500     # min # of tuple updates before analyze
#autovacuum_vacuum_scale_factor = 0.4   # fraction of rel size before vacuum
#autovacuum_analyze_scale_factor = 0.2  # fraction of rel size before analyze

Does anybody know which process is actually AUTO-vacuum-ing the db?
So that I can check when is running...




Re: autovacuum on a -mostly- r/o table

От
Edoardo Ceccarelli
Дата:
Rod Taylor wrote:
On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote: 
I have read that autovacuum cannot check to see pg load before
launching 
vacuum but is there any patch about it? that would sort out the
problem 
in a good and simple way.    
In some cases the solution to high load is to vacuum the tables being
hit the heaviest -- meaning that simply checking machine load isn't
enough to make that decision.

In fact, that high load problem is exactly why autovacuum was created in
the first place. 
True,
but autovacuum could check load -before- and -during- it's execution and it could adjust himself automatically to perform more or less aggressively depending on the difference between those two values.
Maybe with a parameter like: maximum-autovacuum-load=0.2
that would mean: "never load the machine more than 20% for the autovacuum"

...another thing is, how could autovacuum check for machine load, this is something I cannot imagine right now...

Re: autovacuum on a -mostly- r/o table

От
Tobias Brox
Дата:
[Edoardo Ceccarelli - Wed at 06:49:23PM +0200]
> ...another thing is, how could autovacuum check for machine load, this
> is something I cannot imagine right now...

One solution I made for our application, is to check the
pg_stats_activity view.  It requires some config to get the stats
available in that view, though.  When the application is to start a
low-priority transaction, it will first do:

  select count(*) from pg_stat_activity where current_query not like
  '<IDL%' and query_start+?<now();

if the returned value is high, the application will sleep a bit and try
again later.


Re: autovacuum on a -mostly- r/o table

От
Rod Taylor
Дата:
On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote:
>
> I have read that autovacuum cannot check to see pg load before
> launching
> vacuum but is there any patch about it? that would sort out the
> problem
> in a good and simple way.

In some cases the solution to high load is to vacuum the tables being
hit the heaviest -- meaning that simply checking machine load isn't
enough to make that decision.

In fact, that high load problem is exactly why autovacuum was created in
the first place.
--


Re: [ADMIN] autovacuum on a -mostly- r/o table

От
"Matthew T. O'Connor"
Дата:
Csaba Nagy wrote:
> On Wed, 2006-09-27 at 18:08, Edoardo Ceccarelli wrote:
>
>> How can I configure the vacuum to run after the daily batch insert/update?
>>
>
> Check out this:
> http://www.postgresql.org/docs/8.1/static/catalog-pg-autovacuum.html
>
> By inserting the right row you can disable autovacuum to vacuum your big
> tables, and then you can schedule vacuum nightly for those just as
> before. There's still a benefit in that you don't need to care about
> vacuuming the rest of the tables, which will be done just in time.

In addition autovacuum respects the work of manual or cron based
vacuums, so if you issue a vacuum right after a daily batch insert /
update, autovacuum won't repeat the work of that manual vacuum.



Re: autovacuum on a -mostly- r/o table

От
Bill Moran
Дата:
In response to Edoardo Ceccarelli <eddy@axa.it>:

> Rod Taylor wrote:
> > On Wed, 2006-09-27 at 18:08 +0200, Edoardo Ceccarelli wrote:
> >
> >> I have read that autovacuum cannot check to see pg load before
> >> launching
> >> vacuum but is there any patch about it? that would sort out the
> >> problem
> >> in a good and simple way.
> >>
> >
> > In some cases the solution to high load is to vacuum the tables being
> > hit the heaviest -- meaning that simply checking machine load isn't
> > enough to make that decision.
> >
> > In fact, that high load problem is exactly why autovacuum was created in
> > the first place.
> >
> True,
> but autovacuum could check load -before- and -during- it's execution and
> it could adjust himself automatically to perform more or less
> aggressively depending on the difference between those two values.
> Maybe with a parameter like: maximum-autovacuum-load=0.2
> that would mean: "never load the machine more than 20% for the autovacuum"

This is pretty non-trivial.  How do you define 20% load?  20% of the
CPU?  Does that mean that it's OK for autovac to use 3% cpu and 100% of
your IO?  Ok, so we need to calculate an average of IO and CPU -- which
disks?  If your WAL logs are on one disk, and you've used tablespaces
to spread the rest of your DB across different partitions, it can
be pretty difficult to determine which IO parameters you want to take
into consideration.

--
Bill Moran
Collaborative Fusion Inc.

****************************************************************
IMPORTANT: This message contains confidential information and is
intended only for the individual named. If the reader of this
message is not an intended recipient (or the individual
responsible for the delivery of this message to an intended
recipient), please be advised that any re-use, dissemination,
distribution or copying of this message is prohibited. Please
notify the sender immediately by e-mail if you have received
this e-mail by mistake and delete this e-mail from your system.
E-mail transmission cannot be guaranteed to be secure or
error-free as information could be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
sender therefore does not accept liability for any errors or
omissions in the contents of this message, which arise as a
result of e-mail transmission.
****************************************************************

Re: autovacuum on a -mostly- r/o table

От
Edoardo Ceccarelli
Дата:

True,
but autovacuum could check load -before- and -during- it's execution and 
it could adjust himself automatically to perform more or less 
aggressively depending on the difference between those two values.
Maybe with a parameter like: maximum-autovacuum-load=0.2
that would mean: "never load the machine more than 20% for the autovacuum"   
This is pretty non-trivial.  How do you define 20% load?  20% of the
CPU?  Does that mean that it's OK for autovac to use 3% cpu and 100% of
your IO?  Ok, so we need to calculate an average of IO and CPU -- which
disks?  If your WAL logs are on one disk, and you've used tablespaces
to spread the rest of your DB across different partitions, it can
be pretty difficult to determine which IO parameters you want to take
into consideration.
 
As I said before, it could be done, the main requirement is to find a way for pg to check for a value of the system load; of course it has to be an average value between disk and cpu, of course the daemon would have to collect sample of this values continuously, and of course everything would be better if the server it's only running Postgres
Still I think you are right, it wouldn't suit exactly every situations but it could be an "emergency" feature:
What happened to me was very clear:
server running pg8 under heavy load, cpu's were 90% idle as usual.
At some point the vacuum started, the server reached 50 of overall load and cpu's were 1% idle
I think any test can detect such a situation, regardles if load it's more I/O based or CPU based