Discussion: vacuuming problems continued
Hi,

We just don't seem to be getting much benefit from autovacuum. Running
a manual vacuum seems to still be doing a LOT, which suggests to me
that I should either run a cron job and disable autovacuum, or just
run a cron job on top of autovacuum.

The problem is that if I run the same query (an update query) on the
db it takes 4-6 times longer than on a fresh copy (dumped then
restored to a different name on the same machine/postgres). There is
clearly an issue here...

I have been thinking about strategies and am still a bit lost. Our
apps are up 24/7 and we didn't code for the eventuality of having the
db go offline for maintenance... we live and learn! Would it be wise
to, every week or so, dump then restore the db (closing all our apps
and then restarting them)? The dump is only about 270MB, and the
restore takes about 10 minutes (quite a few large indexes).

It seems that we have no real need for vacuum full (I am clutching at
straws here...), so in theory I could just vacuum/analyse/reindex and
things would be OK. Will a fresh restore be much more performant than
a fully vacuumed/analysed/reindexed db? Probably? Possibly? I believe
I understand the autovacuum docs but... Help! 8-]

Cheers
Antoine

--
This is where I should put some witty comment.
Antoine <melser.anton@gmail.com> writes:
> We just don't seem to be getting much benefit from autovacuum. Running
> a manual vacuum seems to still be doing a LOT, which suggests to me
> that I should either run a cron job and disable autovacuum, or just
> run a cron job on top of autovacuum.

The default autovac parameters are very unaggressive --- have you
experimented with changing them? Do you have the FSM set large enough?

			regards, tom lane
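[For reference: both knobs Tom mentions live in postgresql.conf. On 8.1's
integrated autovacuum, a more aggressive starting point might look like the
sketch below -- the values are illustrative only, not a tuned recommendation:]

```ini
# postgresql.conf -- illustrative values only, not a recommendation
autovacuum = on
autovacuum_naptime = 60                 # seconds between autovacuum checks
autovacuum_vacuum_threshold = 250       # 8.1 default: 1000
autovacuum_vacuum_scale_factor = 0.2    # 8.1 default: 0.4
autovacuum_analyze_threshold = 125      # 8.1 default: 500
autovacuum_analyze_scale_factor = 0.1   # 8.1 default: 0.2

# The free space map must be large enough to remember all the free
# space that VACUUM finds, or the space is lost and tables bloat.
max_fsm_pages = 200000                  # default 20000 is easy to outgrow
max_fsm_relations = 1000
```

[A restart is needed for the FSM settings to take effect; `VACUUM VERBOSE`
reports whether the current FSM settings are big enough.]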
On Thu, Jun 01, 2006 at 11:05:55AM -0400, Tom Lane wrote:
> Antoine <melser.anton@gmail.com> writes:
> > We just don't seem to be getting much benefit from autovacuum. Running
> > a manual vacuum seems to still be doing a LOT, which suggests to me
> > that I should either run a cron job and disable autovacuum, or just
> > run a cron job on top of autovacuum.
>
> The default autovac parameters are very unaggressive --- have you
> experimented with changing them? Do you have the FSM set large enough?

Do any of the tables have a high 'churn rate' (a lot of updates) but
are supposed to be small? In cases like that autovacuum may not be
enough.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
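[A common workaround for the small-but-hot-table case is a targeted cron
job on just that table, on top of whatever autovacuum does. A sketch --
the database name `mydb` and table `public.hot_table` are placeholders:]

```crontab
# crontab fragment -- db and table names are placeholders for illustration
# Vacuum the high-churn table every 5 minutes; a small table vacuums fast.
*/5 * * * *  psql -d mydb -c "VACUUM ANALYZE public.hot_table" >/dev/null 2>&1
```

[Frequent plain VACUUM on a small table is cheap and keeps it from
bloating between autovacuum passes; VACUUM FULL is not needed if the
table never gets far out of shape.]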
On Thu, Jun 01, 2006 at 01:54:08PM +0200, Antoine wrote:
> Hi,
> We just don't seem to be getting much benefit from autovacuum. Running
> a manual vacuum seems to still be doing a LOT, which suggests to me
> that I should either run a cron job and disable autovacuum, or just
> run a cron job on top of autovacuum.

What the others said; but also, which version of autovacuum (== which
version of the database) is this? Because the early versions had a
number of missing bits to them that tended to mean the whole thing
didn't hang together very well.

> I have been thinking about strategies and am still a bit lost. Our
> apps are up 24/7 and we didn't code for the eventuality of having the
> db going offline for maintenance... we live and learn!

You shouldn't need to, with anything after 7.4, if your vacuum regimen
is right. There's something of a black art to it, though.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
		--Dennis Ritchie
Andrew Sullivan wrote:
> On Thu, Jun 01, 2006 at 01:54:08PM +0200, Antoine wrote:
>> Hi,
>> We just don't seem to be getting much benefit from autovacuum. Running
>> a manual vacuum seems to still be doing a LOT, which suggests to me
>> that I should either run a cron job and disable autovacuum, or just
>> run a cron job on top of autovacuum.

Don't know if this was covered in an earlier thread. Bear with me if so.

I'm working with 7.4.8 and 8.0.3 systems, and pg_autovacuum does have
some glitches ... in part solved by the integrated autovac in 8.1:

- in our env, clients occasionally hit max_connections. This is a known
  and (sort of) desired pushback on load. However, that sometimes knocks
  pg_autovacuum out.

- db server goes down for any reason: same problem. Just restarting
  pg_autovacuum is not good enough; when pg_autovacuum terminates, it
  loses its state, so big tables that change less than 50% between such
  terminations may never get vacuumed (!)

For that reason, it's taken a switch to a Perl script run from cron
every 5 minutes, that persists state in a table. The script is not a
plug-compatible match for pg_autovacuum (hardcoded rates; hardcoded
distinction between user and system tables), but you may find it useful.

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.

#!/usr/bin/env perl
use strict;
use warnings;
use Carp;
use DBI;
use POSIX qw(strftime);

# Hardcoded (aggressive) autovacuum parameters:
my ($VAC_BASE, $VAC_RATE) = (1000, 0.8);
my ($ANA_BASE, $ANA_RATE) = ( 500, 0.4);
my $VERBOSE = 'VERBOSE';
my $start   = time;
my $stamp   = strftime "==== %FT%T autovac: ", localtime;

open STDERR, ">&STDOUT";        # Redirect PG "VERBOSE" output.
my $dbh = DBI->connect("dbi:Pg(PrintError=1,RaiseError=1):");

# REVISIT: move this to schema:
my $oid = $dbh->selectall_arrayref(<<"__SQL__")->[0][0];
SELECT oid FROM pg_class WHERE relname = 'autovac_state';
__SQL__

$dbh->do(<<"__SQL__") if !defined $oid;
CREATE TABLE public.autovac_state(
    relid        oid NOT NULL PRIMARY KEY,
    name         text NOT NULL,
    analyze_time timestamptz,
    vacuum_time  timestamptz,
    analyze_tups bigint, -- (n_tup_ins+n_tup_upd+n_tup_del) at last ANALYZE
    vacuum_tups  bigint  -- (n_tup_upd+n_tup_del) at last VACUUM
);
__SQL__

# Calculate updates/additions to autovac_state:
my $anavac = join ";",
    map {$_->[0]} @{$dbh->selectall_arrayref(<<"__SQL__")};
SELECT * INTO TEMP new_state FROM (
    SELECT  relid,       -- identify tables by ID, so that (re)created
                         -- tables always are treated as fresh tables.
            name,        -- for constructing the vacuum/analyze command
            old_relid,   -- NULL means this will need a new state table entry
            analyze_tups,-- _tups are used to update autovac_state
            vacuum_tups,
            CASE WHEN analyze_tups - prev_analyze_tups
                        NOT BETWEEN 0 AND analyze_point
                    OR old_relid IS NULL
                 THEN now() END AS analyze_time,
            CASE WHEN vacuum_tups - prev_vacuum_tups
                        NOT BETWEEN 0 AND vacuum_point
                 THEN now() END AS vacuum_time
    FROM (
        SELECT  N.nspname || '.'
                || C.relname AS name,
                A.relid AS old_relid,
                C.oid AS relid,
                S.n_tup_ins + S.n_tup_upd + S.n_tup_del AS analyze_tups,
                S.n_tup_upd + S.n_tup_del AS vacuum_tups,
                COALESCE(A.analyze_tups,0) AS prev_analyze_tups,
                COALESCE(A.vacuum_tups,0)  AS prev_vacuum_tups,
                CASE WHEN nspname ~ '^pg_' THEN 1.0 ELSE $ANA_RATE END
                    * C.reltuples + $ANA_BASE AS analyze_point,
                CASE WHEN nspname ~ '^pg_' THEN 2.0 ELSE $VAC_RATE END
                    * C.reltuples + $VAC_BASE AS vacuum_point
        FROM pg_class AS C
        JOIN pg_namespace AS N ON N.oid = C.relnamespace
        JOIN pg_stat_all_tables AS S ON S.relid = C.oid
        LEFT JOIN autovac_state AS A ON A.relid = S.relid
        WHERE N.nspname NOT LIKE 'pg_temp%'
    ) AS X
) AS X
WHERE analyze_time IS NOT NULL OR vacuum_time IS NOT NULL;

SELECT CASE WHEN vacuum_time IS NOT NULL
            THEN 'VACUUM ANALYZE $VERBOSE ' || name
            ELSE 'ANALYZE $VERBOSE ' || name
       END
FROM new_state;
__SQL__

if ($anavac) {
    print STDERR $stamp."start\n";
    $dbh->do(<<"__SQL__");
$anavac;
UPDATE  autovac_state
SET     analyze_tups = N.analyze_tups,
        vacuum_tups  = CASE WHEN N.vacuum_time IS NULL
                            THEN autovac_state.vacuum_tups
                            ELSE N.vacuum_tups END,
        analyze_time = COALESCE(N.analyze_time, autovac_state.analyze_time),
        vacuum_time  = COALESCE(N.vacuum_time,  autovac_state.vacuum_time)
FROM    new_state AS N
WHERE   N.relid = autovac_state.relid;

INSERT INTO autovac_state
SELECT  relid, name, analyze_time, vacuum_time, analyze_tups, vacuum_tups
FROM    new_state
WHERE   old_relid IS NULL;

DELETE FROM autovac_state
WHERE   analyze_time < now() - '1 day'::INTERVAL
        AND relid NOT IN (SELECT oid FROM pg_class);
__SQL__
    print STDERR $stamp.(time - $start)." secs\n";
}

$dbh->do("DROP TABLE new_state");
1;
__END__

=head1 SYNOPSIS

autovac - autovacuum with persistent state.

=head1 DESCRIPTION

C<autovac> is a replacement for C<pg_autovacuum>. C<autovac> does a
single C<pg_autovacuum> step, then saves its state in the
C<public.autovac_state> table. It should be run from cron, say, every
5 minutes.
C<autovac> runs more aggressively for user tables (pg_autovacuum -V 0.8
-v 1000) than for system tables. When pg_stat_user_tables.(ins/upd/del)
counts have been zeroed, C<autovac> vacuums all tables.

To check when C<autovac> last analyzed/vacuumed a given table:

    SELECT vacuum_time, analyze_time
    FROM autovac_state
    WHERE name = 'public.message'

=head1 OUTPUT

If any action is taken, C<autovac> prints:

    ==== yyyy-mm-dd hh:mm:ss autovac start
    ... VACUUM/ANALYZE VERBOSE output ...
    ==== yyyy-mm-dd hh:mm:ss autovac <n> secs

=head1 PG_AUTOVACUUM

=cut
Mischa Sandberg wrote:
> Andrew Sullivan wrote:
>> On Thu, Jun 01, 2006 at 01:54:08PM +0200, Antoine wrote:
>>> Hi,
>>> We just don't seem to be getting much benefit from autovacuum. Running
>>> a manual vacuum seems to still be doing a LOT, which suggests to me
>>> that I should either run a cron job and disable autovacuum, or just
>>> run a cron job on top of autovacuum.
>
> Don't know if this was covered in an earlier thread. Bear with me if so.
>
> I'm working with 7.4.8 and 8.0.3 systems, and pg_autovacuum does have
> some glitches ... in part solved by the integrated autovac in 8.1:
>
> - in our env, clients occasionally hit max_connections. This is a known
> and (sort of) desired pushback on load. However, that sometimes knocks
> pg_autovacuum out.

That is when you use:

	superuser_reserved_connections

In the postgresql.conf

> - db server goes down for any reason: same problem.

I believe you can use

	stats_reset_on_server_start = on

For that little problem.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
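[Both settings Joshua names are postgresql.conf parameters on 7.4/8.0. A
sketch of how they might be combined for a cron-driven vacuum setup --
values are illustrative, and the stats_* switches shown are the ones
pg_autovacuum needs to see row-level activity at all:]

```ini
# postgresql.conf -- sketch, values illustrative
superuser_reserved_connections = 2   # keep slots free for superuser
                                     # jobs (e.g. pg_autovacuum) even
                                     # when max_connections is hit

stats_start_collector = on           # required for pg_autovacuum
stats_row_level = on                 # per-table ins/upd/del counters

stats_reset_on_server_start = on     # zero the counters at startup, so
                                     # a freshly restarted pg_autovacuum
                                     # (which also starts from zero)
                                     # stays consistent with them
```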
Joshua D. Drake wrote:
>> - in our env, clients occasionally hit max_connections. This is a
>> known and (sort of) desired pushback on load. However, that sometimes
>> knocks pg_autovacuum out.
>
> That is when you use:
>
> superuser_reserved_connections

Blush. Good point. Though, when we hit max_connections on 7.4.8
systems, it's been a lemonade-from-lemons plus that vacuuming didn't
fire up on top of everything else :-)

>> - db server goes down for any reason: same problem.
>
> I believe you can use
> stats_reset_on_server_start = on

We do. The problem is not the loss of pg_stat_user_tables.(n_tup_ins,...)
It's the loss of pg_autovacuum's CountAtLastVacuum (and ...Analyze)
numbers, which are kept in process memory.

Never considered patching pg_autovacuum to just sleep and try again,
rather than exit, on a failed db connection.

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.
Hi all, and thanks for your responses.

I haven't yet had a chance to tweak the autovac settings, but I really
don't think that things can be maxing out even the default settings.

We have about 4 machines that are connected 24/7 - they were doing
constant reads/inserts (24/7), but that was because the code was
rubbish. I managed to whinge enough to get the programme changed to
read, do the work, then insert, which means they are accessing
(connected but idle) for less than 5% of the day. We have about another
10 machines that access (reads and updates) from 8-5. It is running on
a P4 with 256 or 512 meg of ram and I simply refuse to believe this
load is anything significant... :-(.

There are only two tables that see any action, and the smaller one is
almost exclusively inserts.

Much as I believe it shouldn't be possible, the ratio of 5:1 for the db
vs a fresh copy has given me a taste for a copy/drop scenario... I will
try increasing the settings and keep you posted.

Cheers
Antoine

--
This is where I should put some witty comment.
Bloat doesn't depend on your update/delete rate; it depends on how many
updates/deletes occur between vacuums. Long-running transactions also
come into play.

As for performance, a P4 with 512M of ram is pretty much a toy in the
database world; it wouldn't be very hard to swamp it. But without
actual details there's no way to know.

On Wed, Jun 07, 2006 at 10:52:26PM +0200, Antoine wrote:
> Hi all and thanks for your responses. I haven't yet had a chance to
> tweak the autovac settings but I really don't think that things can be
> maxing out even the default settings.
> We have about 4 machines that are connected 24/7 - they were doing
> constant read/inserts (24/7) but that was because the code was
> rubbish. I managed to whinge enough to get the programme to read, do
> the work, then insert, and that means they are accessing (connected
> but idle) for less than 5% of the day. We have about another 10
> machines that access (reads and updates) from 8-5. It is running on a
> P4 with 256 or 512meg of ram and I simply refuse to believe this load
> is anything significant... :-(.
> There are only two tables that see any action, and the smaller one is
> almost exclusively inserts.
> Much as I believe it shouldn't be possible the ratio of 5:1 for the db
> vs fresh copy has given me a taste for a copy/drop scenario...
> I will try and increase settings and keep you posted.
> Cheers
> Antoine
>
> --
> This is where I should put some witty comment.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
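[One way to put a number on "updates/deletes between vacuums" on 7.4/8.0
(which record no last-vacuum timestamp) is to compare the row-level stats
counters against the planner's row estimate. A rough sketch -- the counters
are cumulative since the last stats reset, so this only approximates
per-vacuum churn, and a large ratio merely suggests bloat:]

```sql
-- Rough churn check: rows updated/deleted since the stats were last
-- reset, compared with the planner's row estimate from the last
-- VACUUM/ANALYZE. Big ratios point at tables worth vacuuming harder.
SELECT n.nspname || '.' || c.relname                    AS tablename,
       s.n_tup_upd + s.n_tup_del                        AS churn,
       c.reltuples                                      AS est_rows,
       round(((s.n_tup_upd + s.n_tup_del)
              / GREATEST(c.reltuples, 1))::numeric, 2)  AS churn_ratio
FROM pg_class c
JOIN pg_namespace n       ON n.oid = c.relnamespace
JOIN pg_stat_all_tables s ON s.relid = c.oid
WHERE c.relkind = 'r'
ORDER BY churn_ratio DESC
LIMIT 10;
```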