Обсуждение: vacuuming problems continued

Поиск
Список
Период
Сортировка

vacuuming problems continued

От
Antoine
Дата:
Hi,
We just don't seem to be getting much benefit from autovacuum. Running
a manual vacuum seems to still be doing a LOT, which suggests to me
that I should either run a cron job and disable autovacuum, or just
run a cron job on top of autovacuum.
The problem is that if I run the same query (an update query) on the
db it takes 4 - 6 times longer than on a fresh copy (dumped then
restored to a different name on the same machine/postgres). There is
clearly an issue here...
I have been thinking about strategies and am still a bit lost. Our
apps are up 24/7 and we didn't code for the eventuality of having the
db going offline for maintenance... we live and learn!
Would it be wise to, every week or so, dump then restore the db
(closing all our apps and then restarting them)? The dump is only
about 270MB, and restore is about 10mins (quite a few large indexes).
It seems that we have no real need for vacuum full (I am clutching at
straws here...), so in theory I could just vacuum/analyse/reindex and
things would be OK. Will a fresh restore be much more performant than
a fully vacuumed/analysed/reindexed db? Probably? Possibly?
I believe I understand the autovacuum docs but...
Help!
8-]
Cheers
Antoine

--
This is where I should put some witty comment.

Re: vacuuming problems continued

От
Tom Lane
Дата:
Antoine <melser.anton@gmail.com> writes:
> We just don't seem to be getting much benefit from autovacuum. Running
> a manual vacuum seems to still be doing a LOT, which suggests to me
> that I should either run a cron job and disable autovacuum, or just
> run a cron job on top of autovacuum.

The default autovac parameters are very unaggressive --- have you
experimented with changing them?  Do you have the FSM set large enough?

            regards, tom lane

Re: vacuuming problems continued

От
"Jim C. Nasby"
Дата:
On Thu, Jun 01, 2006 at 11:05:55AM -0400, Tom Lane wrote:
> Antoine <melser.anton@gmail.com> writes:
> > We just don't seem to be getting much benefit from autovacuum. Running
> > a manual vacuum seems to still be doing a LOT, which suggests to me
> > that I should either run a cron job and disable autovacuum, or just
> > run a cron job on top of autovacuum.
>
> The default autovac parameters are very unaggressive --- have you
> experimented with changing them?  Do you have the FSM set large enough?

Do any of the tables have a high 'churn rate' (a lot of updates) but are
supposed to be small? In cases like that autovacuum may not be enough.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: vacuuming problems continued

От
Andrew Sullivan
Дата:
On Thu, Jun 01, 2006 at 01:54:08PM +0200, Antoine wrote:
> Hi,
> We just don't seem to be getting much benefit from autovacuum. Running
> a manual vacuum seems to still be doing a LOT, which suggests to me
> that I should either run a cron job and disable autovacuum, or just
> run a cron job on top of autovacuum.

What the others said; but also, which version of autovacuum (== which
version of the database) is this?  Because the early versions had a
number of missing bits to them that tended to mean the whole thing
didn't hang together very well.

> I have been thinking about strategies and am still a bit lost. Our
> apps are up 24/7 and we didn't code for the eventuality of having the
> db going offline for maintenance... we live and learn!

You shouldn't need to, with anything after 7.4, if your vacuum
regimen is right.  There's something of a black art to it, though.

A
--
Andrew Sullivan  | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
        --Dennis Ritchie

Re: vacuuming problems continued

От
Mischa Sandberg
Дата:
Andrew Sullivan wrote:
> On Thu, Jun 01, 2006 at 01:54:08PM +0200, Antoine wrote:
>> Hi,
>> We just don't seem to be getting much benefit from autovacuum. Running
>> a manual vacuum seems to still be doing a LOT, which suggests to me
>> that I should either run a cron job and disable autovacuum, or just
>> run a cron job on top of autovacuum.

Don't know if this was covered in an earlier thread. Bear with me if so.

I'm working with 7.4.8 and 8.0.3 systems, and pg_autovacuum does have some
glitches ... in part solved by the integrated autovac in 8.1:

- in our env, clients occasionally hit max_connections. This is a known and
(sort of) desired pushback on load. However, that sometimes knocks pg_autovacuum
out.

- db server goes down for any reason: same problem.


Just restarting pg_autovacuum is not good enough; when pg_autovacuum terminates,
it loses its state, so big tables that change less than 50% between such
terminations may never get vacuumed (!)

For that reason, it's taken a switch to a Perl script run from cron every 5
minutes, that persists state in a table. The script is not a plug-compatible
match for pg_autovacuum (hardcoded rates; hardcoded distinction between user and
system tables), but you may find it useful.

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.
#!/usr/bin/env perl

use strict;
use warnings;
use Carp;
use DBI;
use POSIX        qw(strftime);

# Hardcoded (aggressive) autovacuum parameters:
my ($VAC_BASE, $VAC_RATE)  = (1000, 0.8);
my ($ANA_BASE, $ANA_RATE)  = ( 500, 0.4);

my $VERBOSE = 'VERBOSE';
my $start = time;
my $stamp = strftime "==== %FT%T autovac: ", localtime;

open STDERR, ">&STDOUT"; # Redirect PG "VERBOSE" output.
my $dbh = DBI->connect("dbi:Pg(PrintError=1,RaiseError=1):");

# REVISIT: move this to schema:

my $oid = $dbh->selectall_arrayref(<<"__SQL__")->[0][0];
    SELECT oid FROM pg_class WHERE relname = 'autovac_state';
__SQL__

$dbh->do(<<"__SQL__") if !defined $oid;
    CREATE TABLE public.autovac_state(
        relid           oid NOT NULL PRIMARY KEY,
        name            text NOT NULL,
        analyze_time    timestamptz,
        vacuum_time     timestamptz,
        analyze_tups bigint, -- (n_tup_ins+n_tup_upd+n_tup_del) at last ANALYZE
        vacuum_tups  bigint  -- (n_tup_upd+n_tup_del) at last VACUUM
    );
__SQL__

# Calculate updates/additions to autovac_state:

my $anavac = join ";", map {$_->[0]} @{$dbh->selectall_arrayref(<<"__SQL__")};

    SELECT * INTO TEMP new_state FROM (
    SELECT
        relid,      -- identify tables by ID, so that (re)created tables always
                --      are treated as fresh tables.
                name,       -- for constructing the vacuum/analyze command
        old_relid,  -- NULL means this will need a new state table entry
        analyze_tups,-- _tups are used to update autovac_state
        vacuum_tups,

        CASE WHEN analyze_tups - prev_analyze_tups NOT BETWEEN 0 AND analyze_point
            OR old_relid IS NULL
            THEN now()
        END AS analyze_time,

        CASE WHEN vacuum_tups - prev_vacuum_tups NOT BETWEEN 0 AND vacuum_point
            THEN now()
        END AS vacuum_time

    FROM (
        SELECT  N.nspname || '.' || C.relname               AS name,
            A.relid                                     AS old_relid,
            C.oid                                       AS relid,
            S.n_tup_ins + S.n_tup_upd + S.n_tup_del     AS analyze_tups,
            S.n_tup_upd + S.n_tup_del                   AS vacuum_tups,
            COALESCE(A.analyze_tups,0)                   AS prev_analyze_tups,
            COALESCE(A.vacuum_tups,0)                    AS prev_vacuum_tups,
            CASE WHEN nspname ~ '^pg_' THEN 1.0 ELSE $ANA_RATE END
            * C.reltuples + $ANA_BASE        AS analyze_point,
            CASE WHEN nspname ~ '^pg_' THEN 2.0 ELSE $VAC_RATE END
            * C.reltuples + $VAC_BASE        AS vacuum_point
        FROM    pg_class                AS C
        JOIN    pg_namespace            AS N ON N.oid = C.relnamespace
        JOIN    pg_stat_all_tables      AS S ON S.relid = C.oid
        LEFT JOIN autovac_state         AS A ON A.relid = S.relid
        WHERE   N.nspname NOT LIKE 'pg_temp%'
    ) AS X
    ) AS X
    WHERE   analyze_time IS NOT NULL OR vacuum_time IS NOT NULL;

    SELECT CASE WHEN vacuum_time IS NOT NULL
                THEN 'VACUUM ANALYZE $VERBOSE ' || name
                ELSE 'ANALYZE $VERBOSE ' || name
            END
    FROM new_state;
__SQL__

if ($anavac) {
    print STDERR $stamp."start\n";

    $dbh->do(<<"__SQL__");

        $anavac;

        UPDATE  autovac_state
        SET     analyze_tups = N.analyze_tups,
                vacuum_tups     = CASE WHEN N.vacuum_time IS NULL
                                       THEN autovac_state.vacuum_tups
                                       ELSE N.vacuum_tups
                                  END,
                analyze_time    = COALESCE(N.analyze_time, autovac_state.analyze_time),
                vacuum_time     = COALESCE(N.vacuum_time, autovac_state.vacuum_time)
        FROM    new_state AS N
        WHERE   N.relid = autovac_state.relid;

        INSERT INTO autovac_state
        SELECT  relid, name, analyze_time, vacuum_time, analyze_tups, vacuum_tups
        FROM    new_state
    WHERE     old_relid IS NULL;

    DELETE FROM autovac_state
    WHERE    analyze_time < now() - '1 day'::INTERVAL
    AND    relid NOT IN (SELECT oid FROM pg_class);
__SQL__

    print STDERR $stamp.(time - $start)." secs\n";
}

$dbh->do("DROP TABLE new_state");

1;
__END__

=head1 SYNOPSIS

autovac - autovacuum with persistent state.

=head1 DESCRIPTION

C<autovac> is a replacement for C<pg_autovacuum>.
C<autovac> does a single C<pg_autovacuum> step, then saves its state in
the C<public.autovac_state> table. It should be run from cron, say,
every 5 minutes.


C<autovac> runs more aggressively for user tables
(pg_autovacuum -V 0.8 -v 1000) than for system tables.

When pg_stats_user_tables.(ins/upd/del) counts have been zeroed,
C<autovac> vacuums all tables.

To check when C<autovac> last analyzed/vacuumed a given table:

    SELECT vacuum_time, analyze_time FROM autovac_state
    WHERE name = 'public.message'

=head1 OUTPUT

If any action is taken, C<autovac> prints:

    ==== yyyy-mm-dd hh:mm:ss autovac start
    ... VACUUM/ANALYZE VERBOSE output ...
    ==== yyyy-mm-dd hh:mm:ss autovac <n> secs

=head1 PG_AUTOVACUUM

=cut

Re: vacuuming problems continued

От
"Joshua D. Drake"
Дата:
Mischa Sandberg wrote:
> Andrew Sullivan wrote:
>> On Thu, Jun 01, 2006 at 01:54:08PM +0200, Antoine wrote:
>>> Hi,
>>> We just don't seem to be getting much benefit from autovacuum. Running
>>> a manual vacuum seems to still be doing a LOT, which suggests to me
>>> that I should either run a cron job and disable autovacuum, or just
>>> run a cron job on top of autovacuum.
>
> Don't know if this was covered in an earlier thread. Bear with me if so.
>
> I'm working with 7.4.8 and 8.0.3 systems, and pg_autovacuum does have
> some glitches ... in part solved by the integrated autovac in 8.1:
>
> - in our env, clients occasionally hit max_connections. This is a known
> and (sort of) desired pushback on load. However, that sometimes knocks
> pg_autovacuum out.

That is when you use:

superuser_reserved_connections

In the postgresql.conf

> - db server goes down for any reason: same problem.

I believe you can use

stats_reset_on_server_start = on

For that little problem.

Joshua D. Drake

--

    === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/



Re: vacuuming problems continued

От
Mischa Sandberg
Дата:
Joshua D. Drake wrote:
>> - in our env, clients occasionally hit max_connections. This is a
>> known and (sort of) desired pushback on load. However, that sometimes
>> knocks pg_autovacuum out.
>
> That is when you use:
>
> superuser_reserved_connections

Blush. Good point. Though, when we hit max_connections on 7.4.8 systems,
it's been a lemonade-from-lemons plus that vacuuming didn't fire up on top of
everything else :-)

>> - db server goes down for any reason: same problem.
>
> I believe you can use
> stats_reset_on_server_start = on

We do. The problem is not the loss of pg_stat_user_tables.(n_tup_ins,...)
It's the loss of pg_autovacuum's CountAtLastVacuum (and ...Analyze)
numbers, which are kept in process memory. Never considered patching
pg_autovacuum to just sleep and try again, rather than exit, on a failed
db connection.

--
Engineers think that equations approximate reality.
Physicists think that reality approximates the equations.
Mathematicians never make the connection.

Re: vacuuming problems continued

От
Antoine
Дата:
Hi all and thanks for your responses. I haven't yet had a chance to
tweak the autovac settings but I really don't think that things can be
maxing out even the default settings.
We have about 4 machines that are connected 24/7 - they were doing
constant read/inserts (24/7) but that was because the code was
rubbish. I managed to whinge enough to get the programme to read, do
the work, then insert, and that means they are accessing (connected
but idle) for less than 5% of the day. We have about another 10
machines that access (reads and updates) from 8-5. It is running on a
P4 with 256 or 512meg of ram and I simply refuse to believe this load
is anything significant... :-(.
There are only two tables that see any action, and the smaller one is
almost exclusively inserts.
Much as I believe it shouldn't be possible the ratio of 5:1 for the db
vs fresh copy has given me a taste for a copy/drop scenario...
I will try and increase settings and keep you posted.
Cheers
Antoine


--
This is where I should put some witty comment.

Re: vacuuming problems continued

От
"Jim C. Nasby"
Дата:
Bloat doesn't depend on your update/delete rate; it depends on how many
update/deletes occur between vacuums. Long running transactions also
come into play.

As for performance, a P4 with 512M of ram is pretty much a toy in the
database world; it wouldn't be very hard to swamp it.

But without actual details there's no way to know.

On Wed, Jun 07, 2006 at 10:52:26PM +0200, Antoine wrote:
> Hi all and thanks for your responses. I haven't yet had a chance to
> tweak the autovac settings but I really don't think that things can be
> maxing out even the default settings.
> We have about 4 machines that are connected 24/7 - they were doing
> constant read/inserts (24/7) but that was because the code was
> rubbish. I managed to whinge enough to get the programme to read, do
> the work, then insert, and that means they are accessing (connected
> but idle) for less than 5% of the day. We have about another 10
> machines that access (reads and updates) from 8-5. It is running on a
> P4 with 256 or 512meg of ram and I simply refuse to believe this load
> is anything significant... :-(.
> There are only two tables that see any action, and the smaller one is
> almost exclusively inserts.
> Much as I believe it shouldn't be possible the ratio of 5:1 for the db
> vs fresh copy has given me a taste for a copy/drop scenario...
> I will try and increase settings and keep you posted.
> Cheers
> Antoine
>
>
> --
> This is where I should put some witty comment.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461