Обсуждение: autovacuum was not vacuuming
I’m having some trouble figuring out what was going on that led up to this, but after neglecting maintenance for some time,I noticed that even though autovacuum was not disabled, it never seemed to be getting around to vacuuming. Some basics - PG 9.2, about 150 databases, autovacuum default settings After an initial read of some vacuuming tips, I came up with the following settings: autovacuum = on # Enable autovacuum subprocess? 'on' log_autovacuum_min_duration = 100 # -1 disables, 0 logs all actions and autovacuum_max_workers = 8 # max number of autovacuum subprocesses autovacuum_naptime = 40min # time between autovacuum runs autovacuum_vacuum_scale_factor = 0.1 # fraction of table size before vacuum autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze autovacuum_vacuum_cost_delay = 10ms # default vacuum cost delay for # autovacuum, in milliseconds; autovacuum_vacuum_cost_limit = 1000 # default vacuum cost limit for # autovacuum, -1 means use My main interest there actually was getting logs, but something obviously kicked in as I started seeing a number of databases/tablesgetting vacuumed that had not been vacuumed in over a year. Some of these tables have at least a few thousandinsert/delete/updates per day, some much more than that. Was the combination of the large(?) number of databases and default settings effectively preventing autovacuum from doingits thing? Thanks, Charles
On May 27, 2015, at 1:37 PM, Jorge Torralba <jorge.torralba@gmail.com> wrote: > run this for validation > > select relname, last_autovacuum from pg_stat_user_tables order by last_autovacuum; Hi - thanks, I’d run some similar query before to see what wasn’t being vacuumed. While some tables seem to be now, here’s an example of an active db that’s still got some blanks in the last_autovacuum column: relname | last_autovacuum ---------------------+------------------------------- optin | 2014-07-21 06:31:47.66319-04 log | 2014-10-18 04:55:22.227628-04 complaint | 2014-10-18 10:43:18.224247-04 arch_subscriber | 2014-11-19 08:27:32.895688-05 subscriber | 2014-12-05 10:12:48.340305-05 transactions | 2015-02-18 12:47:42.406487-05 list_sub | 2015-05-22 10:12:27.755863-04 post | 2015-05-23 01:46:47.039179-04 post_job | 2015-05-23 21:49:03.25371-04 post_actions | 2015-05-26 10:03:43.259077-04 bounce | 2015-05-26 10:29:34.888937-04 bounce_reason | 2015-05-26 10:29:34.918845-04 list_data | 2015-05-27 09:50:21.202492-04 limit_tracking | 2015-05-27 13:10:51.673256-04 post_stats | 2015-05-27 13:10:52.921319-04 email | sending_rule_fields | sending_rule_files | post_threads | forwards | list | sub_active | subscriber_db_field | slice_fields | sending_rule_esps | client | slices | fields | arch_list_sub | sending_rules | subscriber_db_query | engagement | reject_rule | list_criterion | repltest | post_attachment | filters | messages | post_domain | track_url | subscriber_data | message_attachments | migrations | subscriptions | mailer | subscriber_db_file | track_hit | engagement_rules | (48 rows) Any insights based on that? Thanks, Charles > On Tue, May 26, 2015 at 10:02 PM, Charles Sprickman <spork@biglist.com> wrote: > I’m having some trouble figuring out what was going on that led up to this, but after neglecting maintenance for some time,I noticed that even though autovacuum was not disabled, it never seemed to be getting around to vacuuming. > > Some basics - PG 9.2, about 150 databases, autovacuum default settings > > After an initial read of some vacuuming tips, I came up with the following settings: > > autovacuum = on # Enable autovacuum subprocess? 'on' > log_autovacuum_min_duration = 100 # -1 disables, 0 logs all actions and > autovacuum_max_workers = 8 # max number of autovacuum subprocesses > autovacuum_naptime = 40min # time between autovacuum runs > autovacuum_vacuum_scale_factor = 0.1 # fraction of table size before vacuum > autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze > autovacuum_vacuum_cost_delay = 10ms # default vacuum cost delay for > # autovacuum, in milliseconds; > autovacuum_vacuum_cost_limit = 1000 # default vacuum cost limit for > # autovacuum, -1 means use > > My main interest there actually was getting logs, but something obviously kicked in as I started seeing a number of databases/tablesgetting vacuumed that had not been vacuumed in over a year. Some of these tables have at least a few thousandinsert/delete/updates per day, some much more than that. > > Was the combination of the large(?) number of databases and default settings effectively preventing autovacuum from doingits thing? > > Thanks, > > Charles > > -- > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin > > > > -- > Thanks, > > Jorge Torralba > ---------------------------- > > Note: This communication may contain privileged or other confidential information. If you are not the intended recipient,please do not print, copy, retransmit, disseminate or otherwise use the information. Please indicate to the senderthat you have received this email in error and delete the copy you received. Thank You.
Charles Sprickman wrote: > On May 27, 2015, at 1:37 PM, Jorge Torralba <jorge.torralba@gmail.com> wrote: > > > run this for validation > > > > select relname, last_autovacuum from pg_stat_user_tables order by last_autovacuum; > > Hi - thanks, I’d run some similar query before to see what wasn’t being vacuumed. > > While some tables seem to be now, here’s an example of an active db that’s still got some blanks in the last_autovacuumcolumn: Does your system crash frequently? autovacuum would not process files unless they appear to cross the threshold values; and the statistics are reset whenever there's a crash recovery. Perhaps those tables don't have enough recent activity. I see your oldest autovac dates are 2014-07 so perhaps you had crashes sometime before that. How many workers are currently processing tables? (See pg_stat_activity) The more workers there are, the slower they become. Perhaps they are all busy processing large tables and they never finish because of the vacuum_delay. On the other hand, naptime=40min means that from one worker start to the next one there's a 40min wait, which seems a bit too long. Normally the default of 1min is appropriate; why did you change that? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On May 27, 2015, at 2:00 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Charles Sprickman wrote: >> On May 27, 2015, at 1:37 PM, Jorge Torralba <jorge.torralba@gmail.com> wrote: >> >>> run this for validation >>> >>> select relname, last_autovacuum from pg_stat_user_tables order by last_autovacuum; >> >> Hi - thanks, I’d run some similar query before to see what wasn’t being vacuumed. >> >> While some tables seem to be now, here’s an example of an active db that’s still got some blanks in the last_autovacuumcolumn: > > Does your system crash frequently? autovacuum would not process files > unless they appear to cross the threshold values; and the statistics are > reset whenever there's a crash recovery. No frequent crashes. Not any that I know of even since moving to this server (up 1046 days). So thats one thing to rule out. I also know that no one has manually reset any of the stats counters. > Perhaps those tables don't > have enough recent activity. I see your oldest autovac dates are > 2014-07 so perhaps you had crashes sometime before that. This is starting to make sense. On the same database, this data is kind of interesting: relname | n_tup_ins | n_tup_upd | n_tup_del | last_autovacuum | autovacuum_count ---------------------+-----------+-----------+-----------+-------------------------------+------------------ optin | 7252274 | 0 | 1975631 | 2014-07-21 06:31:47.66319-04 | 3 log | 43045279 | 0 | 31641637 | 2014-10-18 04:55:22.227628-04 | 8 complaint | 3002663 | 6154 | 3002973 | 2014-10-18 10:43:18.224247-04 | 31 arch_subscriber | 8472659 | 8184029 | 96798 | 2014-11-19 08:27:32.895688-05 | 2 subscriber | 13625052 | 28592779 | 8473154 | 2014-12-05 10:12:48.340305-05 | 14 transactions | 2514676 | 5182 | 257117 | 2015-02-18 12:47:42.406487-05 | 3 post | 32521 | 50722 | 1557 | 2015-05-23 01:46:47.039179-04 | 1 post_actions | 98395891 | 43836 | 44881522 | 2015-05-26 10:03:43.259077-04 | 3 bounce | 631024 | 81 | 595087 | 2015-05-26 10:29:34.888937-04 | 13 bounce_reason | 7961281 | 32736 | 7953428 | 2015-05-26 10:29:34.918845-04 | 6 list_sub | 14562995 | 69515258 | 9009496 | 2015-05-28 15:58:13.754569-04 | 30 post_job | 8127 | 15908 | 8070 | 2015-05-28 21:52:54.923614-04 | 263 list_data | 7350 | 756705 | 6063 | 2015-05-29 12:34:42.323142-04 | 425 post_stats | 90155 | 40750025 | 108 | 2015-05-29 13:14:43.307775-04 | 569 limit_tracking | 30248 | 12689580 | 30192 | 2015-05-29 13:54:42.06234-04 | 50780 email | 0 | 0 | 0 | | 0 sending_rule_fields | 0 | 0 | 0 | | 0 sending_rule_files | 0 | 0 | 0 | | 0 post_threads | 0 | 0 | 0 | | 0 forwards | 0 | 0 | 0 | | 0 list | 9 | 28 | 7 | | 0 sub_active | 840019 | 0 | 0 | | 0 subscriber_db_field | 0 | 0 | 0 | | 0 slice_fields | 0 | 0 | 0 | | 0 sending_rule_esps | 0 | 0 | 0 | | 0 client | 1 | 0 | 0 | | 0 slices | 0 | 0 | 0 | | 0 fields | 11 | 0 | 0 | | 0 arch_list_sub | 8909636 | 0 | 101140 | | 0 sending_rules | 0 | 0 | 0 | | 0 subscriber_db_query | 0 | 0 | 0 | | 0 engagement | 0 | 0 | 0 | | 0 reject_rule | 608 | 0 | 0 | | 0 list_criterion | 0 | 0 | 0 | | 0 repltest | 0 | 0 | 0 | | 0 post_attachment | 1 | 0 | 0 | | 0 filters | 0 | 0 | 0 | | 0 messages | 0 | 0 | 0 | | 0 post_domain | 262398 | 1177 | 290 | | 0 track_url | 786 | 1394 | 1 | | 0 subscriber_data | 0 | 0 | 0 | | 0 message_attachments | 0 | 0 | 0 | | 0 migrations | 11 | 0 | 0 | | 0 subscriptions | 8606373 | 463446 | 0 | | 0 mailer | 0 | 0 | 0 | | 0 subscriber_db_file | 0 | 0 | 0 | | 0 track_hit | 29733544 | 86038 | 6383080 | | 0 engagement_rules | 0 | 0 | 0 | | 0 (48 rows) Of all the tables that have NOT been autovacuumed, only a handful have any real write activity. And on those I cant really get any data as to WHEN the last activity was (unless I reset stats and wait) although looking a bit closer I see some timestamped columns and none of these tables have been touched in years. Good, I think. Should I be worried about limit_tracking which has been vacuumed about 50,000 times in the last few days? :) That seems a little extreme. Now if we assume the tables that werent vacuumed are not seeing any new activity, my next problem is how to satisfy my nagios check_postgres checks that look for tables that have not been vacuumed in X days. > How many workers are currently processing tables? (See > pg_stat_activity) The more workers there are, the slower they become. By workers, I assume you just mean pg processes? It varies, but anywhere between a few dozen and a hundred. > Perhaps they are all busy processing large tables and they never finish > because of the vacuum_delay. On the other hand, naptime=40min means > that from one worker start to the next one there's a 40min wait, which > seems a bit too long. Normally the default of 1min is appropriate; why > did you change that? I think because I dont fully grasp how all these variables interact. IIRC, when I set this it had something to do with compensating for the number of databases and the number of workers. Im looking at pgbadger’s analysis of vacuums and I think Im mostly OK with whats happening. Im still stumped on why changes from defaults were necessary to get autovacuum to kick in though Thanks, Charles > > -- > Álvaro Herrera http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > > > -- > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-admin