Обсуждение: When manual analyze is needed

Поиск
Список
Период
Сортировка

When manual analyze is needed

От
veem v
Дата:
Hi,
We see in one of the RDS postgres instances, from pg_stat_user_tables , the auto vacuum and auto analyze happening on the tables without our manual intervention. 

So is auto vacuum analyze is sufficient  to make sure optimal stats and unbloated table structure in place or should we do it manually based on the type of tables like 
for e.g. 
if a table is having just insert only types and getting similar amount of data throughout the day 24*7 
VS 
a table which is getting bulk data load once a day only 
VS  
a volatile table with truncate load kind of operation 
VS 
a table with heavy Update/deletes also happening along with inserts throughout the day 24*7. 

Will auto vacuum/analyze take care of all such tables, or we need to do it manually in certain scenarios?
And if any suboptimal plan is taken by the optimizer for a specific query suddenly, because of the missing stats , how to catch that. Basically, how to get the old plan hash and get it compared with the current plan hash?

Regards
Veem

Re: When manual analyze is needed

От
Laurenz Albe
Дата:
On Mon, 2024-03-04 at 01:33 +0530, veem v wrote:
> We see in one of the RDS postgres instances, from pg_stat_user_tables ,
> the auto vacuum and auto analyze happening on the tables without our
> manual intervention.

That's exactly the idea behind autovacuum.


> So is auto vacuum analyze is sufficient  to make sure optimal stats
> and unbloated table structure in place

Yes, it is sufficient.  If you have a busy database, you may have to
tune autovacuum to keep up.

The only things that require manual ANALYZE are

1. partitioned tables (autoanalyze will collect statistics on the
   partitions, but not the partitioned table itself)

2. after you create an index on an expression (otherwise you have to
   wait until autoanalyze runs to get statistics on the indexed
   expression)

Yours,
Laurenz Albe



Re: When manual analyze is needed

От
Tom Lane
Дата:
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> The only things that require manual ANALYZE are
> ...

You missed one important exception: autovacuum/autoanalyze cannot
process temporary tables, because those are not accessible outside
the owning session.  So you need to do those manually if it's
important for performance.

            regards, tom lane



Re: When manual analyze is needed

От
veem v
Дата:
On Mon, 4 Mar 2024 at 09:42, Laurenz Albe <laurenz.albe@cybertec.at> wrote:

The only things that require manual ANALYZE are

1. partitioned tables (autoanalyze will collect statistics on the
   partitions, but not the partitioned table itself)


So the partitioned table stats is nothing but the rolledover stats of all the partitions. As you mentioned, autoanalyze only works for child partitions but not the partition tables, so does it mean we have to schedule some jobs manually(through some scheduletr like pg_cron) to analyze these partitioned tables at certain time intervals to keep those up to date for partitioned tables? And won't that scan all the child partitions again to have the stats aggregated/rolled over from all the child partitions level to the table level?

Re: When manual analyze is needed

От
veem v
Дата:
Additionally if a query was working fine but suddenly takes a suboptimal plan because of missing stats , do we have any hash value column on any performance view associated with the queryid which we can refer to see past vs current plans difference and identify such issues quickly and fix it? 
I am not seeing any such column in pg_stat_activity or pg_stat_statements to hold hash value of the plan and also the query column is showing "<insufficient privilege>" for many of the entries, why so?

Re: When manual analyze is needed

От
Laurenz Albe
Дата:
On Mon, 2024-03-04 at 10:16 +0530, veem v wrote:
> So the partitioned table stats is nothing but the rolledover stats of all the partitions.
> As you mentioned, autoanalyze only works for child partitions but not the partition tables,
> so does it mean we have to schedule some jobs manually(through some scheduletr like pg_cron)
> to analyze these partitioned tables at certain time intervals to keep those up to date for
> partitioned tables?

Something like that, yes.

> And won't that scan all the child partitions again to have the stats aggregated/rolled
> over from all the child partitions level to the table level?

Yes.

> Additionally if a query was working fine but suddenly takes a suboptimal plan because
> of missing stats , do we have any hash value column on any performance view associated
> with the queryid which we can refer to see past vs current plans difference and identify
> such issues quickly and fix it?

Not that I know of.

Yours,
Laurenz Albe



Re: When manual analyze is needed

От
Greg Sabino Mullane
Дата:
On Mon, Mar 4, 2024 at 12:23 AM veem v <veema0000@gmail.com> wrote:
Additionally if a query was working fine but suddenly takes a suboptimal plan because of missing stats , do we have any hash value column on any performance view associated with the queryid which we can refer to see past vs current plans difference and identify such issues quickly and fix it?

You can use auto_explain; nothing else tracks things at that fine a level. You can use pg_stat_statements to track the average and max time for each query. Save and reset periodically to make it more useful.


 
I am not seeing any such column in pg_stat_activity or pg_stat_statements to hold hash value of the plan and also the query column is showing "<insufficient privilege>" for many of the entries, why so?

Ordinary users are not allowed to see what other people are running. You can add a user to the pg_read_all_stats role to allow this:

GRANT pg_read_all_stats TO alice;

Oftentimes someone needing access to the stats also needs a little more access, so consider the pg_monitor role as well. Both are documented here:


Cheers,
Greg

Re: When manual analyze is needed

От
veem v
Дата:

On Mon, 4 Mar 2024 at 21:46, Greg Sabino Mullane <htamfids@gmail.com> wrote:
On Mon, Mar 4, 2024 at 12:23 AM veem v <veema0000@gmail.com> wrote:
Additionally if a query was working fine but suddenly takes a suboptimal plan because of missing stats , do we have any hash value column on any performance view associated with the queryid which we can refer to see past vs current plans difference and identify such issues quickly and fix it?

You can use auto_explain; nothing else tracks things at that fine a level. You can use pg_stat_statements to track the average and max time for each query. Save and reset periodically to make it more useful.


 

Thank you so much Greg. That helps.

We were planning to have the auto_explain extension added and set the log_min_duration to ~5 seconds and log_analyze to true. So that all the queries going above that time period will be logged and provide detailed information on the exact point of bottleneck. Will it be a good idea to set it on production DB which is a highly active database? or should we only have the extension added but only set the parameters while we debug some performance issue and then reset it back after we are done.

 

Re: When manual analyze is needed

От
Greg Sabino Mullane
Дата:
We were planning to have the auto_explain extension added and set the log_min_duration to ~5 seconds and log_analyze to true. So that all the queries going above that time period will be logged and provide detailed information on the exact point of bottleneck. Will it be a good idea to set it on production DB which is a highly active database? or should we only have the extension added but only set the parameters while we debug some performance issue and then reset it back after we are done.

I would not use log_analyze on a highly active production db. Even on a dev system, use it carefully as it has some downsides. The log_min_duration of 5s should be fine everywhere, however.

Cheers,
Greg