Обсуждение: [ADMIN] Why is my table not autovacuuming?

Поиск
Список
Период
Сортировка

[ADMIN] Why is my table not autovacuuming?

От
Don Seiler
Дата:
PostgreSQL 9.2.18. I have a table with 71M rows (via select count). In pg_stat_activity, n_live_tups is 170K, n_dead_tups is slightly more than that (203K). The autovacuum parameter is on, the scale factor is 0.2 (default) and threshold is 50 (default), autovacuum_max_workers is 3. There are no custom autovacuum thresholds set for this table. The last_autovacuum field is null.

Obviously I would expect autovacuum/autoanalyze to kick in for this table so the stats aren't getting so far out of whack from reality. When I query pg_stat_activity, I never see any autovacuum jobs running, so it isn't a matter of too much work from what I can tell.

Previously I had tables not being autovacuumed because of index corruption (which would also cause manual vacuum to fail), but I was just able to successfully vacuum this table manually and now pg_stat_activity show the accurate info.

Wondering why my autovacuum daemon appears to be acting lazily?

Don.

--
Don Seiler
www.seiler.us

Re: [ADMIN] Why is my table not autovacuuming?

От
Phil Frost
Дата:
With a scale factor of 0.2, autovacuum won't kick in until there are an estimated 0.2 * 71M = 14.2M dead tuples. For such large tables, it may be prudent to adjust the autovacuum parameters to have a lower scale factor. You can also set the scale factor to zero, and then the threshold alone determines when autovacuum runs: you'd set the threshold to some number much larger than the default 50.


On Mon, Aug 28, 2017 at 11:45 AM Don Seiler <don@seiler.us> wrote:
PostgreSQL 9.2.18. I have a table with 71M rows (via select count). In pg_stat_activity, n_live_tups is 170K, n_dead_tups is slightly more than that (203K). The autovacuum parameter is on, the scale factor is 0.2 (default) and threshold is 50 (default), autovacuum_max_workers is 3. There are no custom autovacuum thresholds set for this table. The last_autovacuum field is null.

Obviously I would expect autovacuum/autoanalyze to kick in for this table so the stats aren't getting so far out of whack from reality. When I query pg_stat_activity, I never see any autovacuum jobs running, so it isn't a matter of too much work from what I can tell.

Previously I had tables not being autovacuumed because of index corruption (which would also cause manual vacuum to fail), but I was just able to successfully vacuum this table manually and now pg_stat_activity show the accurate info.

Wondering why my autovacuum daemon appears to be acting lazily?

Don.

--
Don Seiler
www.seiler.us

Re: [ADMIN] Why is my table not autovacuuming?

От
Don Seiler
Дата:
So the autovacuum daemon knows that there are 71M rows, even though pg_stat_all_tables said only 170K? Is it using pg_class.reltuples? Is pg_stat_all_tables not really a reliable source of stats information?

Don.

On Mon, Aug 28, 2017 at 10:55 AM, Phil Frost <phil@postmates.com> wrote:
With a scale factor of 0.2, autovacuum won't kick in until there are an estimated 0.2 * 71M = 14.2M dead tuples. For such large tables, it may be prudent to adjust the autovacuum parameters to have a lower scale factor. You can also set the scale factor to zero, and then the threshold alone determines when autovacuum runs: you'd set the threshold to some number much larger than the default 50.


On Mon, Aug 28, 2017 at 11:45 AM Don Seiler <don@seiler.us> wrote:
PostgreSQL 9.2.18. I have a table with 71M rows (via select count). In pg_stat_activity, n_live_tups is 170K, n_dead_tups is slightly more than that (203K). The autovacuum parameter is on, the scale factor is 0.2 (default) and threshold is 50 (default), autovacuum_max_workers is 3. There are no custom autovacuum thresholds set for this table. The last_autovacuum field is null.

Obviously I would expect autovacuum/autoanalyze to kick in for this table so the stats aren't getting so far out of whack from reality. When I query pg_stat_activity, I never see any autovacuum jobs running, so it isn't a matter of too much work from what I can tell.

Previously I had tables not being autovacuumed because of index corruption (which would also cause manual vacuum to fail), but I was just able to successfully vacuum this table manually and now pg_stat_activity show the accurate info.

Wondering why my autovacuum daemon appears to be acting lazily?

Don.

--
Don Seiler
www.seiler.us



--
Don Seiler
www.seiler.us

Re: [ADMIN] Why is my table not autovacuuming?

От
Joe Conway
Дата:
On 08/28/2017 08:43 AM, Don Seiler wrote:
> PostgreSQL 9.2.18. I have a table with 71M rows (via select count). In
> pg_stat_activity, n_live_tups is 170K, n_dead_tups is slightly more than
> that (203K). The autovacuum parameter is on, the scale factor is 0.2
> (default) and threshold is 50 (default), autovacuum_max_workers is 3.
> There are no custom autovacuum thresholds set for this table. The
> last_autovacuum field is null.

<snip>

> Wondering why my autovacuum daemon appears to be acting lazily?

Autovac will kick in at (threshold + (scale_factor x 71M)), i.e. about
14.2 million dead rows. You are currently nowhere close to that.

That said, the default scale_factor is not good to such a large table.
Reduce it significantly. I often will find it necessary to use custom
values for large tables. One strategy I have used is scale_factor = 0
and threshold = <num_dead_rows_max_desired>.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Вложения

Re: [ADMIN] Why is my table not autovacuuming?

От
Don Seiler
Дата:
Same concerns as Phil's reply. Are the numbers in pg_stat_all_tables not part of what autovacuum looks at? I assume those numbers would be updated by autoanalyze but if that only kicks in at 10% scale factor then that wouldn't happen for a while either? Seems like there's a huge disconnect here, or I just don't have my head around this properly. Or both.

Don.

On Mon, Aug 28, 2017 at 10:59 AM, Joe Conway <mail@joeconway.com> wrote:
On 08/28/2017 08:43 AM, Don Seiler wrote:
> PostgreSQL 9.2.18. I have a table with 71M rows (via select count). In
> pg_stat_activity, n_live_tups is 170K, n_dead_tups is slightly more than
> that (203K). The autovacuum parameter is on, the scale factor is 0.2
> (default) and threshold is 50 (default), autovacuum_max_workers is 3.
> There are no custom autovacuum thresholds set for this table. The
> last_autovacuum field is null.

<snip>

> Wondering why my autovacuum daemon appears to be acting lazily?

Autovac will kick in at (threshold + (scale_factor x 71M)), i.e. about
14.2 million dead rows. You are currently nowhere close to that.

That said, the default scale_factor is not good to such a large table.
Reduce it significantly. I often will find it necessary to use custom
values for large tables. One strategy I have used is scale_factor = 0
and threshold = <num_dead_rows_max_desired>.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




--
Don Seiler
www.seiler.us

Re: [ADMIN] Why is my table not autovacuuming?

От
Joe Conway
Дата:
On 08/28/2017 08:59 AM, Don Seiler wrote:
> So the autovacuum daemon knows that there are 71M rows, even though
> pg_stat_all_tables said only 170K? Is it using pg_class.reltuples? Is
> pg_stat_all_tables not really a reliable source of stats information?

See:
file:///opt/src/pgsql-git/master/src/backend/postmaster/autovacuum.c

 * threshold = vac_base_thresh + vac_scale_factor * reltuples

pg_stat_all_tables only shows information known by the stats collector,
which only knows about activity since the last stats reset (which can
happen, for example on "immediate" shutdown or server crash), not
absolute numbers (although reltuples is only an estimate too).

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Вложения

Re: [ADMIN] Why is my table not autovacuuming?

От
Don Seiler
Дата:
Good to know, thanks very much!

Don.

On Mon, Aug 28, 2017 at 11:13 AM, Joe Conway <mail@joeconway.com> wrote:
On 08/28/2017 08:59 AM, Don Seiler wrote:
> So the autovacuum daemon knows that there are 71M rows, even though
> pg_stat_all_tables said only 170K? Is it using pg_class.reltuples? Is
> pg_stat_all_tables not really a reliable source of stats information?

See:
file:///opt/src/pgsql-git/master/src/backend/postmaster/autovacuum.c

 * threshold = vac_base_thresh + vac_scale_factor * reltuples

pg_stat_all_tables only shows information known by the stats collector,
which only knows about activity since the last stats reset (which can
happen, for example on "immediate" shutdown or server crash), not
absolute numbers (although reltuples is only an estimate too).

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development




--
Don Seiler
www.seiler.us

Re: [ADMIN] Why is my table not autovacuuming?

От
Joe Conway
Дата:
On 08/28/2017 09:12 AM, Don Seiler wrote:
> Same concerns as Phil's reply. Are the numbers in pg_stat_all_tables not
> part of what autovacuum looks at? I assume those numbers would be
> updated by autoanalyze but if that only kicks in at 10% scale factor
> then that wouldn't happen for a while either? Seems like there's a huge
> disconnect here, or I just don't have my head around this properly. Or both.

Well some values are maintained by analyze, but with a 0.1 scale factor
for analyze that still means 7 million rows or so threshold for that to
happen. E.g. see:

src/backend/commands/analyze.c
-------
/*
 * Report ANALYZE to the stats collector, too.  However, if doing
 * inherited stats we shouldn't report, because the stats collector only
 * tracks per-table stats.  Reset the changes_since_analyze counter only
 * if we analyzed all columns; otherwise, there is still work for
 * auto-analyze to do.
 */
-------

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Вложения

Re: [ADMIN] Why is my table not autovacuuming?

От
Don Seiler
Дата:
On Mon, Aug 28, 2017 at 11:25 AM, Joe Conway <mail@joeconway.com> wrote:
Well some values are maintained by analyze, but with a 0.1 scale factor
for analyze that still means 7 million rows or so threshold for that to
happen. 

I think what I'm failing to understand is how it got so big without an autoanalyze kicking in at some point to keep relatively in sync. To my knowledge, autovacuum was never disabled here (I'm new, so I can't say with any certainty that it never was). I would expect as the inserted rows grow past a certain threshold, it would kick in an occasional analyze so it wouldn't get over 2 orders of magnitude off from 170K to 71M.

Don.

--
Don Seiler
www.seiler.us

Re: [ADMIN] Why is my table not autovacuuming?

От
Joe Conway
Дата:
On 08/28/2017 09:31 AM, Don Seiler wrote:
> I think what I'm failing to understand is how it got so big without an
> autoanalyze kicking in at some point to keep relatively in sync. To my
> knowledge, autovacuum was never disabled here (I'm new, so I can't say
> with any certainty that it never was). I would expect as the inserted
> rows grow past a certain threshold, it would kick in an occasional
> analyze so it wouldn't get over 2 orders of magnitude off from 170K to 71M.

I can't say without knowing the history of this table and your server.

Autoanalyze with default settings should kick in at 0.1 (10%) of the
table changing. Lets say your table exists with 71 million rows. Then at
some point your server lost power. The info in the stats collector gets
reset on recovery. Now you modify 170k rows. That is now the change that
autoanalyze will base its decision on. It is looking for 0.1 x 70
million rows to change which is 7 million. You currently only have 170k
that have changed.

Bottom line is as I said earlier -- the default values for autovac and
autoanalyze scaling factor are ridiculously high for a table this large.
Do a manual VACUUM ANALYZE on it, and modify the settings for this table.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Вложения

Re: [ADMIN] Why is my table not autovacuuming?

От
Don Seiler
Дата:
On Mon, Aug 28, 2017 at 11:44 AM, Joe Conway <mail@joeconway.com> wrote:
Bottom line is as I said earlier -- the default values for autovac and
autoanalyze scaling factor are ridiculously high for a table this large.
Do a manual VACUUM ANALYZE on it, and modify the settings for this table.

Yeah I definitely plan to customize the thresholds for the very large tables like this.

I just find it very curious how pg_stat_all_tables.n_live_tups can be so very very far off in this case unless stats tracking was somehow disabled during a large bulk load or something. I'll keep an eye on things.

Thanks again for all your help!

Don.
 
--
Don Seiler
www.seiler.us

Re: [ADMIN] Why is my table not autovacuuming?

От
"David G. Johnston"
Дата:
On Mon, Aug 28, 2017 at 8:43 AM, Don Seiler <don@seiler.us> wrote:
PostgreSQL 9.2.18. I have a table with 71M rows (via select count). In pg_stat_activity, n_live_tups is 170K, n_dead_tups is slightly more than that (203K). The autovacuum parameter is on, the scale factor is 0.2 (default) and threshold is 50 (default), autovacuum_max_workers is 3. There are no custom autovacuum thresholds set for this table. The last_autovacuum field is null.

Obviously I would expect autovacuum/autoanalyze to kick in for this table so the stats aren't getting so far out of whack from reality. When I query pg_stat_activity, I never see any autovacuum jobs running, so it isn't a matter of too much work from what I can tell.


​autovacuum <> autoanalyze - even though the same underlying executable performs both tasks.  If you are questioning why autoanalyze is not running you need to inspect the autoanalyze fields, not the autovacuum ones (both the indicators and the source data elements). Seeing an auto-analyze in pg_stat_activity would be pretty unlikely given how quickly they tend to run.

In addition:


"​The autovacuum daemon, if enabled, will automatically issue ANALYZE commands whenever the content of a table has changed sufficiently."
...
"The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated;"

Depending on your findings in those columns you may want to ensure that you haven't accidentally disabled the autovacuum daemon process.


The question I have is how did this table get to be 71M records large?  One bulk load or 1M records inserted a day for 2 months?  How often do those records change once inserted?

I'm done for now but will leave with my own thought.  The statistical counts can be reset, and I assume analyze must reset the counts that it uses as input once it has completed.  But I can imagine a well-meaning but misguided activity monitoring script that wants to see (e.g. daily) volume choosing to issue a "pg_stat_reset()" daily just after capturing the count from the previous day.  For sufficiently large tables its likely auto-analyze would never be run again because the magnitude of the counts never gets high enough.

David J.

Re: [ADMIN] Why is my table not autovacuuming?

От
Joe Conway
Дата:
On 08/28/2017 09:49 AM, Don Seiler wrote:
> I just find it very curious how pg_stat_all_tables.n_live_tups can be so
> very very far off in this case unless stats tracking was somehow
> disabled during a large bulk load or something. I'll keep an eye on things.

If you have a dev machine where you can run an experiment, try the
following:

(warning: sorry but this is quite long...)

8<--------------
createdb test
psql test
8<--------------

create table test(id int);
insert into test select g.i from generate_series(1,1000000) as g(i);

-- wait a while for autovacuum to run
select pg_sleep(90);

\x
select reltuples from pg_class where relname = 'test';
-[ RECORD 1 ]----
reltuples | 1e+06

select * from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]-------+------------------------------
relid               | 16385
schemaname          | public
relname             | test
seq_scan            | 0
seq_tup_read        | 0
idx_scan            |
idx_tup_fetch       |
n_tup_ins           | 1000000
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 1000000
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    | 2017-08-28 10:10:55.046554-07
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 1

8<--------------
-- quit postgres
-- do an *immediate* shutdown
-- can be done a few ways, but for example see
-- https://www.postgresql.org/docs/9.6/static/app-pg-ctl.html

-- restart postgres
8<--------------
psql test
8<--------------

\x
select reltuples from pg_class where relname = 'test';
-[ RECORD 1 ]----
reltuples | 1e+06

select * from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]-------+-------
relid               | 16385
schemaname          | public
relname             | test
seq_scan            | 0
seq_tup_read        | 0
idx_scan            |
idx_tup_fetch       |
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

update test set id = id + 10000000 where id < 50000;
-- wait a while for autovacuum to run
select pg_sleep(90);
select * from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]-------+--------
relid               | 16385
schemaname          | public
relname             | test
seq_scan            | 1
seq_tup_read        | 1000000
idx_scan            |
idx_tup_fetch       |
n_tup_ins           | 0
n_tup_upd           | 49999
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 49999
n_mod_since_analyze | 49999
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

update test set id = id + 10000000 where id < 100000;
-- wait a while for autovacuum to run
select pg_sleep(90);
-- will still not trigger autoanalyze because defaults are
-- 50 + .1 x 1000000 = 100050
-- but we have only changed 100000
select * from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]-------+--------
relid               | 16385
schemaname          | public
relname             | test
seq_scan            | 2
seq_tup_read        | 2000000
idx_scan            |
idx_tup_fetch       |
n_tup_ins           | 0
n_tup_upd           | 99999
n_tup_del           | 0
n_tup_hot_upd       | 48
n_live_tup          | 0
n_dead_tup          | 99999
n_mod_since_analyze | 99999
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

-- one more time should trigger autoanalyze
-- however not enough to trigger autovac which will be
-- 50 + .2 x 1000000 = 200050
update test set id = id + 10000000 where id < 150000;
-- wait a while for autovacuum to run
select pg_sleep(90);
-- no we have changed 150000
select * from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]-------+------------------------------
relid               | 16385
schemaname          | public
relname             | test
seq_scan            | 3
seq_tup_read        | 3000000
idx_scan            |
idx_tup_fetch       |
n_tup_ins           | 0
n_tup_upd           | 149999
n_tup_del           | 0
n_tup_hot_upd       | 113
n_live_tup          | 1000000
n_dead_tup          | 149951
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    | 2017-08-28 10:26:28.691209-07
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 1

-- one more time should trigger autovac
-- 50 + .2 x 1000000 = 200050
update test set id = id + 10000000 where id < 250000;
-- wait a while for autovacuum to run
select pg_sleep(90);
-- no we have changed 150000
select * from pg_stat_all_tables where relname = 'test';
-[ RECORD 1 ]-------+------------------------------
relid               | 16385
schemaname          | public
relname             | test
seq_scan            | 4
seq_tup_read        | 4000000
idx_scan            |
idx_tup_fetch       |
n_tup_ins           | 0
n_tup_upd           | 249999
n_tup_del           | 0
n_tup_hot_upd       | 178
n_live_tup          | 1000000
n_dead_tup          | 0
n_mod_since_analyze | 100000
last_vacuum         |
last_autovacuum     | 2017-08-28 10:29:37.164542-07
last_analyze        |
last_autoanalyze    | 2017-08-28 10:26:28.691209-07
vacuum_count        | 0
autovacuum_count    | 1
analyze_count       | 0
autoanalyze_count   | 1


--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development


Вложения

Re: [ADMIN] Why is my table not autovacuuming?

От
"David G. Johnston"
Дата:
On Mon, Aug 28, 2017 at 9:44 AM, Joe Conway <mail@joeconway.com> wrote:
Then at
​ ​
some point your server lost power. The info in the stats collector gets
reset on recovery.

Haven't pondered the vacuum side of the equation but ISTM if the auto-analyze daemon is confronted with total amnesia that it should go ahead and schedule an analyze on the entire database.  ANALYZE is relatively inexpensive.

And maybe learn some techniques to compensate.  Like copying the most recent snapshot of its input data to some place that survives a crash-restart.​

Also:

(formula) "is compared to the total number of tuples inserted, updated, or deleted since the last ANALYZE."


Since there are other ways for the stats to reset beside running ANALYZE this sentence hides a critical and should be amended to be more accurate.

David J.