Обсуждение: 8.1.8 autovacuum missing databases

Поиск
Список
Период
Сортировка

8.1.8 autovacuum missing databases

От
Ian Westmacott
Дата:
I have a Postgres 8.1.8 system with three databases.  Although
autovacuum is enabled (I'm using all default autovacuum configuration
settings) and pg_autovacuum is empty, the logs indicate that only
template1 and one of the three databases are being processed by
autovacuum.  Are there other reasons why the other two databases might
be skipped, or some way I could diagnose the issue?

Thanks,

    --Ian



Re: 8.1.8 autovacuum missing databases

От
Alvaro Herrera
Дата:
Ian Westmacott wrote:
> I have a Postgres 8.1.8 system with three databases.  Although
> autovacuum is enabled (I'm using all default autovacuum configuration
> settings) and pg_autovacuum is empty, the logs indicate that only
> template1 and one of the three databases are being processed by
> autovacuum.  Are there other reasons why the other two databases might
> be skipped, or some way I could diagnose the issue?

Is autovacuum dying before being able to finish the vacuuming of
template1 or the other database?

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: 8.1.8 autovacuum missing databases

От
Ian Westmacott
Дата:
On Tue, 2008-04-29 at 16:33 -0400, Alvaro Herrera wrote:
> Is autovacuum dying before being able to finish the vacuuming of
> template1 or the other database?

Not as far as I can tell.  There are no indications of any crash or
error in the log file (I just bumped the log level up to debug1).  Just
autovacuum processing template1 or the one other database every minute.
I ran VACUUM ANALYZE on both template1 and the one database manually
just to make sure there wasn't a lock or something preventing
completion, but they both completed without error.

    --Ian



Re: 8.1.8 autovacuum missing databases

От
Alvaro Herrera
Дата:
Ian Westmacott wrote:
> On Tue, 2008-04-29 at 16:33 -0400, Alvaro Herrera wrote:
> > Is autovacuum dying before being able to finish the vacuuming of
> > template1 or the other database?
>
> Not as far as I can tell.  There are no indications of any crash or
> error in the log file (I just bumped the log level up to debug1).  Just
> autovacuum processing template1 or the one other database every minute.
> I ran VACUUM ANALYZE on both template1 and the one database manually
> just to make sure there wasn't a lock or something preventing
> completion, but they both completed without error.

Okay.

On 8.2 you could have the problem that leftover temp tables are messing
up the frozenxid calculations, but I don't think that can happen on 8.1
at all.

Can you verify whether age(pg_database.datfrozenxid) is shrinking after
vacuuming?  If the age() of a database is higher than the applicable
max_freeze_age, then it will always be chosen.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: 8.1.8 autovacuum missing databases

От
Ian Westmacott
Дата:
On Tue, 2008-04-29 at 17:12 -0400, Alvaro Herrera wrote:
> Can you verify whether age(pg_database.datfrozenxid) is shrinking after
> vacuuming?  If the age() of a database is higher than the applicable
> max_freeze_age, then it will always be chosen.

One of the databases is about 1.5TB, so that could take a while.
Anything else I could look at in the meantime?  The age of these
databases right now is about 290039577.  Also, autovacuum doesn't always
choose the same DB, it alternates between template1 and one of mine --
going back through a couple of months of logs I don't see it ever
choosing the same one twice in a row.

I'll note that I have 5 of these essentially identical systems (same
hardware/software platform, databases, configuration, and load).  Two of
them have this issue (autovacuum processing multiple databases, but not
all) and the other three don't.  A commonality between the two that have
this issue is that template1 is processed but postgres is not.  One of
the two is processing the 1.5TB DB but the other is not.

Thanks,

    --Ian



Re: 8.1.8 autovacuum missing databases

От
Alvaro Herrera
Дата:
Ian Westmacott wrote:
> On Tue, 2008-04-29 at 17:12 -0400, Alvaro Herrera wrote:
> > Can you verify whether age(pg_database.datfrozenxid) is shrinking after
> > vacuuming?  If the age() of a database is higher than the applicable
> > max_freeze_age, then it will always be chosen.
>
> One of the databases is about 1.5TB, so that could take a while.
> Anything else I could look at in the meantime?  The age of these
> databases right now is about 290039577.  Also, autovacuum doesn't always
> choose the same DB, it alternates between template1 and one of mine --
> going back through a couple of months of logs I don't see it ever
> choosing the same one twice in a row.

Right -- if you have two databases over the wraparound edge, it will
alternate between them.

I still think that the autovac is dying before completing the task.  Did
you investigate whether there are "ERROR" messages coming from
autovacuum?  No PG crashes would happen.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: 8.1.8 autovacuum missing databases

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Right -- if you have two databases over the wraparound edge, it will
> alternate between them.

> I still think that the autovac is dying before completing the task.  Did
> you investigate whether there are "ERROR" messages coming from
> autovacuum?  No PG crashes would happen.

AFAICS, if it's triggering a whole_db run and the run fails to get to
the point of updating pg_database, then the next run will pick the very
same db.

I'm noticing though that the 8.1 logic pays attention to both
datvacuumxid and datfrozenxid.  Could we see the age() of both of
those columns for all the databases?

            regards, tom lane

Re: 8.1.8 autovacuum missing databases

От
Ian Westmacott
Дата:
On Wed, 2008-04-30 at 12:27 -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I still think that the autovac is dying before completing the task.  Did
> > you investigate whether there are "ERROR" messages coming from
> > autovacuum?  No PG crashes would happen.

There are no ERROR messages at all.  Sample log snippet:

2008-04-30 12:03:36 EDT LOG:  autovacuum: processing database
"template1"
2008-04-30 12:04:36 EDT LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-04-30 12:05:36 EDT LOG:  autovacuum: processing database
"template1"
2008-04-30 12:06:36 EDT LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-04-30 12:07:36 EDT LOG:  autovacuum: processing database
"template1"
2008-04-30 12:08:36 EDT LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-04-30 12:09:36 EDT LOG:  autovacuum: processing database
"template1"
2008-04-30 12:10:36 EDT LOG:  autovacuum: processing database
"itvtrackdatapos"


> I'm noticing though that the 8.1 logic pays attention to both
> datvacuumxid and datfrozenxid.  Could we see the age() of both of
> those columns for all the databases.

The are all identical:

itvtrackdata=> select datname,age(datfrozenxid),age(datvacuumxid) from
pg_database;
     datname      |    age    |    age
------------------+-----------+-----------
 postgres         | 295995059 | 295995059
 itvtrackdata     | 295995059 | 295995059
 itvtrackdatauser | 295995059 | 295995059
 itvtrackdatapos  | 295995059 | 295995059
 template1        | 295995059 | 295995059
 template0        | 295995059 | 295995059
(6 rows)

itvtrackdata=>


    --Ian



Re: 8.1.8 autovacuum missing databases

От
Tom Lane
Дата:
Ian Westmacott <ianw@intellivid.com> writes:
>> I'm noticing though that the 8.1 logic pays attention to both
>> datvacuumxid and datfrozenxid.  Could we see the age() of both of
>> those columns for all the databases.

> The are all identical:

Oh, that's really strange.  Could we see

select ctid,xmin,* from pg_database

            regards, tom lane

Re: 8.1.8 autovacuum missing databases

От
Ian Westmacott
Дата:
On Wed, 2008-04-30 at 13:07 -0400, Tom Lane wrote:
> Oh, that's really strange.  Could we see
>
> select ctid,xmin,* from pg_database

itvtrackdata=> select ctid,xmin,* from pg_database;
 ctid  |   xmin   |     datname      | datdba | encoding | datistemplate | datallowconn | datconnlimit | datlastsysoid
|datvacuumxid | datfrozenxid | dattablespace | datconfig |         datacl 

-------+----------+------------------+--------+----------+---------------+--------------+--------------+---------------+--------------+--------------+---------------+-----------+------------------------
 (0,1) |      564 | postgres         |     10 |        6 | f             | t            |           -1 |         10792
|         499 |          499 |          1663 |           | 
 (0,2) |      577 | itvtrackdata     |  16384 |        6 | f             | t            |           -1 |         10792
|         499 |          499 |          1663 |           | 
 (0,3) |      605 | itvtrackdatauser |  16384 |        6 | f             | t            |           -1 |         10792
|         499 |          499 |          1663 |           | 
 (0,5) | 18350176 | itvtrackdatapos  |  16384 |        6 | f             | t            |           -1 |         10792
|         499 |          499 |          1663 |           | 
 (0,6) |      557 | template1        |     10 |        6 | t             | t            |           -1 |         10792
|         499 |          499 |          1663 |           | {postgres=CT/postgres} 
 (0,7) |      558 | template0        |     10 |        6 | t             | f            |           -1 |         10792
|         499 |          499 |          1663 |           | {postgres=CT/postgres} 
(6 rows)

itvtrackdata=>

    --Ian



Re: 8.1.8 autovacuum missing databases

От
Tom Lane
Дата:
Ian Westmacott <ianw@intellivid.com> writes:
> itvtrackdata=> select ctid,xmin,* from pg_database;
>  ctid  |   xmin   |     datname      | datdba | encoding | datistemplate | datallowconn | datconnlimit |
datlastsysoid| datvacuumxid | datfrozenxid | dattablespace | datconfig |         datacl 
>
-------+----------+------------------+--------+----------+---------------+--------------+--------------+---------------+--------------+--------------+---------------+-----------+------------------------
>  (0,1) |      564 | postgres         |     10 |        6 | f             | t            |           -1 |
10792|          499 |          499 |          1663 |           | 
>  (0,2) |      577 | itvtrackdata     |  16384 |        6 | f             | t            |           -1 |
10792|          499 |          499 |          1663 |           | 
>  (0,3) |      605 | itvtrackdatauser |  16384 |        6 | f             | t            |           -1 |
10792|          499 |          499 |          1663 |           | 
>  (0,5) | 18350176 | itvtrackdatapos  |  16384 |        6 | f             | t            |           -1 |
10792|          499 |          499 |          1663 |           | 
>  (0,6) |      557 | template1        |     10 |        6 | t             | t            |           -1 |
10792|          499 |          499 |          1663 |           | {postgres=CT/postgres} 
>  (0,7) |      558 | template0        |     10 |        6 | t             | f            |           -1 |
10792|          499 |          499 |          1663 |           | {postgres=CT/postgres} 
> (6 rows)

499 is the value that those columns would have immediately after initdb,
in an 8.1 database.  So what this says is that vacuum has never
succeeded in updating the values at all, in any of your databases.
It definitely *should* be doing that given the size of the age() values
you're reporting.  Moreover, after a look through the 8.1.8 source code
I cannot see how it would not update the values without throwing an
ERROR or at least a WARNING into the postmaster log.  (What have you got
log_min_messages set to, anyway?  Maybe the complaint is getting
suppressed?)

            regards, tom lane

Re: 8.1.8 autovacuum missing databases

От
Ian Westmacott
Дата:
On Wed, 2008-04-30 at 16:43 -0400, Tom Lane wrote:
> Ian Westmacott <ianw@intellivid.com> writes:
> 499 is the value that those columns would have immediately after initdb,
> in an 8.1 database.  So what this says is that vacuum has never
> succeeded in updating the values at all, in any of your databases.
> It definitely *should* be doing that given the size of the age() values
> you're reporting.  Moreover, after a look through the 8.1.8 source code
> I cannot see how it would not update the values without throwing an
> ERROR or at least a WARNING into the postmaster log.  (What have you got
> log_min_messages set to, anyway?  Maybe the complaint is getting
> suppressed?)

log_min_messages is set to the default (notice), until I reset it to
debug1 a couple of days ago.  I combed back through the logs to initial
installation in Feb.  There are no ERRORs or WARNINGs that are obviously
related.  However, what I did find was a couple of small log files with
timestamps in the future (Jan 2009).

Baffled, I did some digging around and it looks like the user initially
installed this system with the system clock set in Jan 2009.  The system
ran for about 20 minutes, during which time two of our application
databases were created and processed by autovacuum.  Then the user
corrected the time problem.  During that 20 minutes in the future,
autovacuum processed postgres, template1, and two of our databases
(itvtrackdata and itvtrackdatauser).  autovacuum hasn't touched our two
since time was corrected.

I imagine if a database's last process time is in the future, that would
mess up autovacuum's choice algorithm (at the least).  Can I confirm
this is the problem?  Is there a way to fix it short of dump/restore?
Anything else I should worry about in this installation?

I mentioned earlier that I had two installations like this.  The second
one doesn't seem to have any time issues.  But in that case the only
database being skipped is postgres.  Do I need to worry about that?

Thanks,

    --Ian



Re: 8.1.8 autovacuum missing databases

От
Tom Lane
Дата:
Ian Westmacott <ianw@intellivid.com> writes:
> I imagine if a database's last process time is in the future, that would
> mess up autovacuum's choice algorithm (at the least).  Can I confirm
> this is the problem?  Is there a way to fix it short of dump/restore?

Hmm, that would explain a lot wouldn't it ... except that it looks to
me like your databases have old enough datvacuumxid to force vacuuming
regardless of last_autovac_time.  So I'm not fully convinced.

AFAICS there is no convenient way in 8.1 to examine the per-database
last_autovac_time entries, which'd be needed to confirm this theory.
We should check that just to be sure.  Please do the following:

1. Stop the postmaster.
2. Move the file $PGDATA/global/pgstat.stat somewhere else.
3. Restart the postmaster.
4. Send pgstat.stat as a binary attachment to me or Alvaro.

This will reset all your statistics counters and also the
last_autovac_time settings.  If autovac starts to behave more
normally then we'll know that was it.

> I mentioned earlier that I had two installations like this.  The second
> one doesn't seem to have any time issues.  But in that case the only
> database being skipped is postgres.  Do I need to worry about that?

Is the age() getting unreasonably large for postgres?  It might be
skipping it because there's been no activity.

            regards, tom lane

Re: 8.1.8 autovacuum missing databases

От
Ian Westmacott
Дата:
On Thu, 2008-05-01 at 09:57 -0400, Tom Lane wrote:
> AFAICS there is no convenient way in 8.1 to examine the per-database
> last_autovac_time entries, which'd be needed to confirm this theory.
> We should check that just to be sure.  Please do the following:
>
> 1. Stop the postmaster.
> 2. Move the file $PGDATA/global/pgstat.stat somewhere else.
> 3. Restart the postmaster.
> 4. Send pgstat.stat as a binary attachment to me or Alvaro.

Done; I'll send pgstat.stat separately.

> This will reset all your statistics counters and also the
> last_autovac_time settings.  If autovac starts to behave more
> normally then we'll know that was it.

I don't know if it is normal, but it is different.  Now its just going
back and forth between itvtrackdata and itvtrackdatapos:

2008-05-01 13:30:37 EDT 10482 LOG:  transaction ID wrap limit is
2147484146, limited by database "postgres"
2008-05-01 13:32:03 EDT 11295 LOG:  autovacuum: processing database
"itvtrackdata"
2008-05-01 13:34:03 EDT 12384 LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-05-01 13:35:03 EDT 12897 LOG:  autovacuum: processing database
"itvtrackdata"
2008-05-01 13:36:33 EDT 13769 LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-05-01 13:37:33 EDT 14292 LOG:  autovacuum: processing database
"itvtrackdata"
2008-05-01 13:39:03 EDT 15124 LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-05-01 13:40:03 EDT 15713 LOG:  autovacuum: processing database
"itvtrackdata"
2008-05-01 13:41:03 EDT 16387 LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-05-01 13:42:03 EDT 16925 LOG:  autovacuum: processing database
"itvtrackdata"
2008-05-01 13:43:03 EDT 17510 LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-05-01 13:44:03 EDT 18085 LOG:  autovacuum: processing database
"itvtrackdata"
2008-05-01 13:45:03 EDT 18722 LOG:  autovacuum: processing database
"itvtrackdatapos"
2008-05-01 13:46:03 EDT 19447 LOG:  autovacuum: processing database
"itvtrackdata"

> > I mentioned earlier that I had two installations like this.  The second
> > one doesn't seem to have any time issues.  But in that case the only
> > database being skipped is postgres.  Do I need to worry about that?
>
> Is the age() getting unreasonably large for postgres?  It might be
> skipping it because there's been no activity.

itvtrackdata=> select datname,age(datfrozenxid),age(datvacuumxid) from
pg_database;
     datname      |    age    |    age
------------------+-----------+-----------
 postgres         | 157900061 | 157900061
 itvtrackdata     | 157900061 | 157900061
 itvtrackdatauser | 157900061 | 157900061
 itvtrackdatapos  | 157900061 | 157900061
 template1        | 157900061 | 157900061
 template0        | 157900061 | 157900061
(6 rows)

itvtrackdata=>


(they're all 499 too).  Thanks,

    --Ian



Re: 8.1.8 autovacuum missing databases

От
Tom Lane
Дата:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> I'm wondering if there could be a hidden bug in the age arithmetic
> somehow -- say because it's using unsigned comparison of signed
> variables, or the Max() macro is flawed, or something weird like that.

Umm ... actually, there's a bug in the hand calculations I did at the
start of this thread.  The threshold that 8.1 uses is

                         (int32) ((MaxTransactionId >> 3) * 3 - 100000)

which works out to 1610512733, which is considerably more than the 300M
or so age() values that Ian is showing.  I had somehow convinced myself
that it was less and he should be getting whole-db vacuums, but in
reality he shouldn't yet.  It's only in 8.2 and up that we have a much
smaller threshold for trying to advance datfrozenxid.

So actually, what we are looking at is the not-whole-db logic, and
it's working exactly as designed.  The only unexpected behavior was
the choice not to vacuum some of the DBs, which we now know is explained
by their having a last_autovac_time in the future.

Perhaps it is worth adding a defense to autovac to not believe last
autovac times that are in the future, since the system clock is not
under our control and could go backwards.  Other than that I don't
think there's any bug here.

            regards, tom lane

Re: 8.1.8 autovacuum missing databases

От
Tom Lane
Дата:
I wrote:
> Perhaps it is worth adding a defense to autovac to not believe last
> autovac times that are in the future, since the system clock is not
> under our control and could go backwards.  Other than that I don't
> think there's any bug here.

Oh, the other thing that comes to mind is that it's a bad move not
to have exposed last_autovac_time anywhere in the pgstat views.
Obviously we cannot back-patch that, but I think it should be fixed
going forward.  This whole thing might've been much less of a mystery
if that state had been visible.

            regards, tom lane

Re: 8.1.8 autovacuum missing databases

От
Andrew Sullivan
Дата:
On Fri, May 02, 2008 at 12:45:07PM -0400, Tom Lane wrote:

> to have exposed last_autovac_time anywhere in the pgstat views.
> Obviously we cannot back-patch that, but I think it should be fixed
> going forward.

That's already fixed in 8.2 and forward (see
pg_stat_get_last_autovacuum_time() and the pg_stat%tables views).
This feature is actually something I think makes upgrading from 8.1 to
8.2 worth the effort.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: 8.1.8 autovacuum missing databases

От
Tom Lane
Дата:
Andrew Sullivan <ajs@commandprompt.com> writes:
> On Fri, May 02, 2008 at 12:45:07PM -0400, Tom Lane wrote:
>> to have exposed last_autovac_time anywhere in the pgstat views.
>> Obviously we cannot back-patch that, but I think it should be fixed
>> going forward.

> That's already fixed in 8.2 and forward (see
> pg_stat_get_last_autovacuum_time() and the pg_stat%tables views).

Uh, no, what that function returns is the per-table
autovac_vacuum_timestamp.  A grep through the source code shows that
the per-database last_autovac_time is exposed nowhere.

            regards, tom lane

Re: 8.1.8 autovacuum missing databases

От
Andrew Sullivan
Дата:
On Fri, May 02, 2008 at 04:18:17PM -0400, Tom Lane wrote:
> Uh, no, what that function returns is the per-table
> autovac_vacuum_timestamp.  A grep through the source code shows that
> the per-database last_autovac_time is exposed nowhere.

Oh, sorry, I completely misunderstood you.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/