Обсуждение: [HACKERS] Queuing all tables for analyze after recovery

Поиск
Список
Период
Сортировка

[HACKERS] Queuing all tables for analyze after recovery

От
Tomasz Ostrowski
Дата:
Hi.

Some (maybe all) row statistics are lost after the database has 
recovered after a failover. So it's recommended to ANALYZE all databases 
in a cluster after recovery.

Amazon's AWS RDS (their managed SQL databases service) even sends an 
email "consider running analyze if your database is slow" after a 
failover of so called MultiAZ  databases (with fast automatic failover 
for double price). Funny that they send it for both PostgreSQL and 
Oracle databases, which, I suppose, confuses Oracle DBA's greatly.

And in AWS RDS MultiAZ a failover is pretty common. Minor version 
upgrade - failover. A storage hiccup - failover. Out of memory - failover.

Shouldn't this analyze be queued and all tables analyzed automatically 
after failover by autovacuum daemon? With up to autovacuum_max_workers 
in parallel?

It might save some DBA's from a couple of lost sleeping hours for sure. 
What do you think? A GUC option? On by dafault? Maybe even backported, 
but off by default in released versions?

-- 
Tomasz "Tometzky" Ostrowski


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Queuing all tables for analyze after recovery

От
Tom Lane
Дата:
Tomasz Ostrowski <tometzky+pg@ato.waw.pl> writes:
> Some (maybe all) row statistics are lost after the database has 
> recovered after a failover. So it's recommended to ANALYZE all databases 
> in a cluster after recovery.

Uh ... recommended by whom?  pg_statistic has exactly the same reliability
guarantees as the rest of the system catalogs.

I don't deny that there might be cases where this is worth doing, but
it does not seem so likely that it should be part of one's standard
checklist.  Much less something that we should expend a great deal
of effort to automate.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Queuing all tables for analyze after recovery

От
Vik Fearing
Дата:
On 10/19/2017 10:54 PM, Tom Lane wrote:
> Tomasz Ostrowski <tometzky+pg@ato.waw.pl> writes:
>> Some (maybe all) row statistics are lost after the database has 
>> recovered after a failover. So it's recommended to ANALYZE all databases 
>> in a cluster after recovery.
> 
> Uh ... recommended by whom?  pg_statistic has exactly the same reliability
> guarantees as the rest of the system catalogs.
> 
> I don't deny that there might be cases where this is worth doing, but
> it does not seem so likely that it should be part of one's standard
> checklist.  Much less something that we should expend a great deal
> of effort to automate.

For data statistics, sure.  One thing I'm unhappy about is that
pg_stat_all_tables is blank.

An idea I've been throwing around in my head is to have autovacuum work
on tables that have vacuum_count and autovacuum_count both zero (and
likewise for analyze).

This will cause a flurry of activity after failover or crash, but the
alternative is autovacuum not knowing anything about the state of the
tables and allowing massive bloat to potentially occur.

For example, if you have a 1 billion row table, and crash when there are
199,999,999 dead tuples, you currently get to wait for another 200
million to die before anything gets cleaned up.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Queuing all tables for analyze after recovery

От
Tomasz Ostrowski
Дата:
On 10/19/2017 10:54 PM, Tom Lane wrote:
> Uh ... recommended by whom?  pg_statistic has exactly the same reliability
> guarantees as the rest of the system catalogs.

Actually I'm not exactly sure what is lost and what is preserved. I'm 
pretty sure that pg_stat_all_tables and similar views turn out with no 
data after a failover.

Also I have some experience with badly performing databases after a 
failover, which went back to normal performance after whole cluster 
analyze. This email from AWS suggests that it's not only me.

> I don't deny that there might be cases where this is worth doing, but
> it does not seem so likely that it should be part of one's standard
> checklist.  Much less something that we should expend a great deal
> of effort to automate.

I assumed that the effort here shouldn't be that large. I imagined a 
simple check if the statistics are missing when considering tables for 
analyze by autovacuum. But I'm not a programmer, so I might misestimate 
this effort badly.

-- 
Regards,
Tomasz "Tometzky" Ostrowski


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Queuing all tables for analyze after recovery

От
Tom Lane
Дата:
Vik Fearing <vik.fearing@2ndquadrant.com> writes:
> On 10/19/2017 10:54 PM, Tom Lane wrote:
>> Uh ... recommended by whom?  pg_statistic has exactly the same reliability
>> guarantees as the rest of the system catalogs.

> For data statistics, sure.  One thing I'm unhappy about is that
> pg_stat_all_tables is blank.

Well, that's because we throw away the stats collector's stats after a
crash -- or, in the failover case, because the promoted slave has its own
counters and not the master's.  ANALYZE isn't going to help that at all.

The fact that we drop those stats in a crash cycle is probably mostly
an overabundance of caution.  We could likely quit doing that, maybe
with a bit more validation effort when reading the files.

Not sure whether we ought to change anything about the failover case.
It's certainly reasonable for a standby server to have its own stats.

The one case where it might make sense to explicitly discard the counters
is when we do PITR to a previous system state.  That's not too common
though ...
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Queuing all tables for analyze after recovery

От
Vik Fearing
Дата:
On 10/19/2017 11:26 PM, Tom Lane wrote:
> Vik Fearing <vik.fearing@2ndquadrant.com> writes:
>> On 10/19/2017 10:54 PM, Tom Lane wrote:
>>> Uh ... recommended by whom?  pg_statistic has exactly the same reliability
>>> guarantees as the rest of the system catalogs.
> 
>> For data statistics, sure.  One thing I'm unhappy about is that
>> pg_stat_all_tables is blank.
> 
> Well, that's because we throw away the stats collector's stats after a
> crash -- or, in the failover case, because the promoted slave has its own
> counters and not the master's.  ANALYZE isn't going to help that at all.

Sure it will.  ANALYZE estimates n_dead_tup, which often accurate enough
for autovacuum to figure out what to do.
-- 
Vik Fearing                                          +33 6 46 75 15 36
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers