Обсуждение: autovacuum was not vacuuming

Поиск
Список
Период
Сортировка

autovacuum was not vacuuming

От
Charles Sprickman
Дата:
I’m having some trouble figuring out what was going on that led up to this, but after neglecting maintenance for some
time,I noticed that even though autovacuum was not disabled, it never seemed to be getting around to vacuuming. 

Some basics - PG 9.2, about 150 databases, autovacuum default settings

After an initial read of some vacuuming tips, I came up with the following settings:

autovacuum = on                # Enable autovacuum subprocess?  'on'
log_autovacuum_min_duration = 100    # -1 disables, 0 logs all actions and
autovacuum_max_workers = 8        # max number of autovacuum subprocesses
autovacuum_naptime = 40min        # time between autovacuum runs
autovacuum_vacuum_scale_factor = 0.1    # fraction of table size before vacuum
autovacuum_analyze_scale_factor = 0.1    # fraction of table size before analyze
autovacuum_vacuum_cost_delay = 10ms    # default vacuum cost delay for
                    # autovacuum, in milliseconds;
autovacuum_vacuum_cost_limit = 1000    # default vacuum cost limit for
                    # autovacuum, -1 means use

My main interest there actually was getting logs, but something obviously kicked in as I started seeing a number of
databases/tablesgetting vacuumed that had not been vacuumed in over a year.  Some of these tables have at least a few
thousandinsert/delete/updates per day, some much more than that. 

Was the combination of the large(?) number of databases and default settings effectively preventing autovacuum from
doingits thing? 

Thanks,

Charles

Re: autovacuum was not vacuuming

От
Charles Sprickman
Дата:
On May 27, 2015, at 1:37 PM, Jorge Torralba <jorge.torralba@gmail.com> wrote:

> run this for validation
>
> select relname, last_autovacuum from pg_stat_user_tables order by last_autovacuum;

Hi - thanks, I’d run some similar query before to see what wasn’t being vacuumed.

While some tables seem to be now, here’s an example of an active db that’s still got some blanks in the last_autovacuum
column:

       relname       |        last_autovacuum
---------------------+-------------------------------
 optin               | 2014-07-21 06:31:47.66319-04
 log                 | 2014-10-18 04:55:22.227628-04
 complaint           | 2014-10-18 10:43:18.224247-04
 arch_subscriber     | 2014-11-19 08:27:32.895688-05
 subscriber          | 2014-12-05 10:12:48.340305-05
 transactions        | 2015-02-18 12:47:42.406487-05
 list_sub            | 2015-05-22 10:12:27.755863-04
 post                | 2015-05-23 01:46:47.039179-04
 post_job            | 2015-05-23 21:49:03.25371-04
 post_actions        | 2015-05-26 10:03:43.259077-04
 bounce              | 2015-05-26 10:29:34.888937-04
 bounce_reason       | 2015-05-26 10:29:34.918845-04
 list_data           | 2015-05-27 09:50:21.202492-04
 limit_tracking      | 2015-05-27 13:10:51.673256-04
 post_stats          | 2015-05-27 13:10:52.921319-04
 email               |
 sending_rule_fields |
 sending_rule_files  |
 post_threads        |
 forwards            |
 list                |
 sub_active          |
 subscriber_db_field |
 slice_fields        |
 sending_rule_esps   |
 client              |
 slices              |
 fields              |
 arch_list_sub       |
 sending_rules       |
 subscriber_db_query |
 engagement          |
 reject_rule         |
 list_criterion      |
 repltest            |
 post_attachment     |
 filters             |
 messages            |
 post_domain         |
 track_url           |
 subscriber_data     |
 message_attachments |
 migrations          |
 subscriptions       |
 mailer              |
 subscriber_db_file  |
 track_hit           |
 engagement_rules    |
(48 rows)

Any insights based on that?

Thanks,

Charles


> On Tue, May 26, 2015 at 10:02 PM, Charles Sprickman <spork@biglist.com> wrote:
> I’m having some trouble figuring out what was going on that led up to this, but after neglecting maintenance for some
time,I noticed that even though autovacuum was not disabled, it never seemed to be getting around to vacuuming. 
>
> Some basics - PG 9.2, about 150 databases, autovacuum default settings
>
> After an initial read of some vacuuming tips, I came up with the following settings:
>
> autovacuum = on                         # Enable autovacuum subprocess?  'on'
> log_autovacuum_min_duration = 100       # -1 disables, 0 logs all actions and
> autovacuum_max_workers = 8              # max number of autovacuum subprocesses
> autovacuum_naptime = 40min              # time between autovacuum runs
> autovacuum_vacuum_scale_factor = 0.1    # fraction of table size before vacuum
> autovacuum_analyze_scale_factor = 0.1   # fraction of table size before analyze
> autovacuum_vacuum_cost_delay = 10ms     # default vacuum cost delay for
>                                         # autovacuum, in milliseconds;
> autovacuum_vacuum_cost_limit = 1000     # default vacuum cost limit for
>                                         # autovacuum, -1 means use
>
> My main interest there actually was getting logs, but something obviously kicked in as I started seeing a number of
databases/tablesgetting vacuumed that had not been vacuumed in over a year.  Some of these tables have at least a few
thousandinsert/delete/updates per day, some much more than that. 
>
> Was the combination of the large(?) number of databases and default settings effectively preventing autovacuum from
doingits thing? 
>
> Thanks,
>
> Charles
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>
>
>
> --
> Thanks,
>
> Jorge Torralba
> ----------------------------
>
> Note: This communication may contain privileged or other confidential information. If you are not the intended
recipient,please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the
senderthat you have received this email in error and delete the copy you received. Thank You. 



Re: autovacuum was not vacuuming

От
Alvaro Herrera
Дата:
Charles Sprickman wrote:
> On May 27, 2015, at 1:37 PM, Jorge Torralba <jorge.torralba@gmail.com> wrote:
>
> > run this for validation
> >
> > select relname, last_autovacuum from pg_stat_user_tables order by last_autovacuum;
>
> Hi - thanks, I’d run some similar query before to see what wasn’t being vacuumed.
>
> While some tables seem to be now, here’s an example of an active db that’s still got some blanks in the
last_autovacuumcolumn: 

Does your system crash frequently?  autovacuum would not process files
unless they appear to cross the threshold values; and the statistics are
reset whenever there's a crash recovery.  Perhaps those tables don't
have enough recent activity.  I see your oldest autovac dates are
2014-07 so perhaps you had crashes sometime before that.

How many workers are currently processing tables?  (See
pg_stat_activity)  The more workers there are, the slower they become.
Perhaps they are all busy processing large tables and they never finish
because of the vacuum_delay.  On the other hand, naptime=40min means
that from one worker start to the next one there's a 40min wait, which
seems a bit too long.  Normally the default of 1min is appropriate; why
did you change that?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: autovacuum was not vacuuming

От
Charles Sprickman
Дата:
On May 27, 2015, at 2:00 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Charles Sprickman wrote:
>> On May 27, 2015, at 1:37 PM, Jorge Torralba <jorge.torralba@gmail.com> wrote:
>>
>>> run this for validation
>>>
>>> select relname, last_autovacuum from pg_stat_user_tables order by last_autovacuum;
>>
>> Hi - thanks, I’d run some similar query before to see what wasn’t being vacuumed.
>>
>> While some tables seem to be now, here’s an example of an active db that’s still got some blanks in the
last_autovacuumcolumn: 
>
> Does your system crash frequently?  autovacuum would not process files
> unless they appear to cross the threshold values; and the statistics are
> reset whenever there's a crash recovery.

No frequent crashes.  Not any that I know of even since moving to
this server (up 1046 days).  So thats one thing to rule out.  I also
know that no one has manually reset any of the stats counters.

> Perhaps those tables don't
> have enough recent activity.  I see your oldest autovac dates are
> 2014-07 so perhaps you had crashes sometime before that.

This is starting to make sense.

On the same database, this data is kind of interesting:

       relname       | n_tup_ins | n_tup_upd | n_tup_del |        last_autovacuum        | autovacuum_count
---------------------+-----------+-----------+-----------+-------------------------------+------------------
 optin               |   7252274 |         0 |   1975631 | 2014-07-21 06:31:47.66319-04  |                3
 log                 |  43045279 |         0 |  31641637 | 2014-10-18 04:55:22.227628-04 |                8
 complaint           |   3002663 |      6154 |   3002973 | 2014-10-18 10:43:18.224247-04 |               31
 arch_subscriber     |   8472659 |   8184029 |     96798 | 2014-11-19 08:27:32.895688-05 |                2
 subscriber          |  13625052 |  28592779 |   8473154 | 2014-12-05 10:12:48.340305-05 |               14
 transactions        |   2514676 |      5182 |    257117 | 2015-02-18 12:47:42.406487-05 |                3
 post                |     32521 |     50722 |      1557 | 2015-05-23 01:46:47.039179-04 |                1
 post_actions        |  98395891 |     43836 |  44881522 | 2015-05-26 10:03:43.259077-04 |                3
 bounce              |    631024 |        81 |    595087 | 2015-05-26 10:29:34.888937-04 |               13
 bounce_reason       |   7961281 |     32736 |   7953428 | 2015-05-26 10:29:34.918845-04 |                6
 list_sub            |  14562995 |  69515258 |   9009496 | 2015-05-28 15:58:13.754569-04 |               30
 post_job            |      8127 |     15908 |      8070 | 2015-05-28 21:52:54.923614-04 |              263
 list_data           |      7350 |    756705 |      6063 | 2015-05-29 12:34:42.323142-04 |              425
 post_stats          |     90155 |  40750025 |       108 | 2015-05-29 13:14:43.307775-04 |              569
 limit_tracking      |     30248 |  12689580 |     30192 | 2015-05-29 13:54:42.06234-04  |            50780
 email               |         0 |         0 |         0 |                               |                0
 sending_rule_fields |         0 |         0 |         0 |                               |                0
 sending_rule_files  |         0 |         0 |         0 |                               |                0
 post_threads        |         0 |         0 |         0 |                               |                0
 forwards            |         0 |         0 |         0 |                               |                0
 list                |         9 |        28 |         7 |                               |                0
 sub_active          |    840019 |         0 |         0 |                               |                0
 subscriber_db_field |         0 |         0 |         0 |                               |                0
 slice_fields        |         0 |         0 |         0 |                               |                0
 sending_rule_esps   |         0 |         0 |         0 |                               |                0
 client              |         1 |         0 |         0 |                               |                0
 slices              |         0 |         0 |         0 |                               |                0
 fields              |        11 |         0 |         0 |                               |                0
 arch_list_sub       |   8909636 |         0 |    101140 |                               |                0
 sending_rules       |         0 |         0 |         0 |                               |                0
 subscriber_db_query |         0 |         0 |         0 |                               |                0
 engagement          |         0 |         0 |         0 |                               |                0
 reject_rule         |       608 |         0 |         0 |                               |                0
 list_criterion      |         0 |         0 |         0 |                               |                0
 repltest            |         0 |         0 |         0 |                               |                0
 post_attachment     |         1 |         0 |         0 |                               |                0
 filters             |         0 |         0 |         0 |                               |                0
 messages            |         0 |         0 |         0 |                               |                0
 post_domain         |    262398 |      1177 |       290 |                               |                0
 track_url           |       786 |      1394 |         1 |                               |                0
 subscriber_data     |         0 |         0 |         0 |                               |                0
 message_attachments |         0 |         0 |         0 |                               |                0
 migrations          |        11 |         0 |         0 |                               |                0
 subscriptions       |   8606373 |    463446 |         0 |                               |                0
 mailer              |         0 |         0 |         0 |                               |                0
 subscriber_db_file  |         0 |         0 |         0 |                               |                0
 track_hit           |  29733544 |     86038 |   6383080 |                               |                0
 engagement_rules    |         0 |         0 |         0 |                               |                0
(48 rows)

Of all the tables that have NOT been autovacuumed, only a handful
have any real write activity.  And on those I cant really get any
data as to WHEN the last activity was (unless I reset stats and
wait) although looking a bit closer I see some timestamped columns
and none of these tables have been touched in years.  Good, I think.

Should I be worried about limit_tracking which has been vacuumed
about 50,000 times in the last few days?  :) That seems a little
extreme.

Now if we assume the tables that werent vacuumed are not seeing any
new activity, my next problem is how to satisfy my nagios
check_postgres checks that look for tables that have not been
vacuumed in X days.

> How many workers are currently processing tables?  (See
> pg_stat_activity)  The more workers there are, the slower they become.

By workers, I assume you just mean pg processes?  It varies, but
anywhere between a few dozen and a hundred.

> Perhaps they are all busy processing large tables and they never finish
> because of the vacuum_delay.  On the other hand, naptime=40min means
> that from one worker start to the next one there's a 40min wait, which
> seems a bit too long.  Normally the default of 1min is appropriate; why
> did you change that?

I think because I dont fully grasp how all these variables interact.
IIRC, when I set this it had something to do with compensating for
the number of databases and the number of workers.

Im looking at pgbadger’s analysis of vacuums and I think Im mostly OK
with whats happening.

Im still stumped on why changes from defaults were necessary to get
autovacuum to kick in though

Thanks,

Charles

>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin