Обсуждение: vacuumdb --analyze-in-stages

Поиск
Список
Период
Сортировка

vacuumdb --analyze-in-stages

От
"nikolai.berkoff"
Дата:
Running --analyze-in-stages on a database with statistics causes the statistics to become significantly worse before
theyare improved.  This could be made clearer in the documentation. 
Вложения

Re: vacuumdb --analyze-in-stages

От
"Euler Taveira"
Дата:
On Fri, Oct 8, 2021, at 6:33 AM, nikolai.berkoff wrote:
Running --analyze-in-stages on a database with statistics causes the statistics to become significantly worse before they are improved.  This could be made clearer in the documentation.
I think the paragraph you modified *already* stated that usefulness of
--analyze-in-stages (newly populated from a restored dump or by pg_upgrade). I
don't think your change is an improvement for --analyze-in-stages description.
If you read the previous paragraph, it explains _how_ (3 stages of ANALYZE with
different settings) and _why_ (produce usable statistics faster).


--
Euler Taveira

Re: vacuumdb --analyze-in-stages

От
Alvaro Herrera
Дата:
On 2021-Oct-18, Euler Taveira wrote:

> On Fri, Oct 8, 2021, at 6:33 AM, nikolai.berkoff wrote:
> > Running --analyze-in-stages on a database with statistics causes the
> > statistics to become significantly worse before they are improved.
> > This could be made clearer in the documentation.
>
> I think the paragraph you modified *already* stated that usefulness of
> --analyze-in-stages (newly populated from a restored dump or by pg_upgrade). I
> don't think your change is an improvement for --analyze-in-stages description.
> If you read the previous paragraph, it explains _how_ (3 stages of ANALYZE with
> different settings) and _why_ (produce usable statistics faster).

Yeah, but it doesn't say that the initial stats target (1) is the worst
value you could possibly have.

I wonder if it isn't better to *remove* the second phrase from this
paragraph, and leave just this text:

        This option is useful to analyze a database that was newly populated
        from a restored dump or by <command>pg_upgrade</command>.

to avoid giving the impression that it'll improve things if you run it
in a database with existing statistics.  Or we could proactively warn
more explicitly about the problem:

        This option is useful to analyze a database that was newly populated
        from a restored dump or by <command>pg_upgrade</command>.
    Beware that running with this option in a database with existing
    statistics may cause query optimizer choices to become
    transiently worse, because of the very low statistics target
    that is used in the early stages.

Or maybe just

        This option is useful to analyze a database that was newly populated
        from a restored dump or by <command>pg_upgrade</command>.
    Running with this option in a database with existing statistics
    is not recommended.

Given that the first stage uses statistic target=1, running this option
in a database with any stats at all is probably a bad idea.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/



Re: vacuumdb --analyze-in-stages

От
"David G. Johnston"
Дата:
On Mon, Oct 18, 2021 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2021-Oct-18, Euler Taveira wrote:

> On Fri, Oct 8, 2021, at 6:33 AM, nikolai.berkoff wrote:
> > Running --analyze-in-stages on a database with statistics causes the
> > statistics to become significantly worse before they are improved.
> > This could be made clearer in the documentation.
>
> I think the paragraph you modified *already* stated that usefulness of
> --analyze-in-stages (newly populated from a restored dump or by pg_upgrade). I
> don't think your change is an improvement for --analyze-in-stages description.
> If you read the previous paragraph, it explains _how_ (3 stages of ANALYZE with
> different settings) and _why_ (produce usable statistics faster).

Yeah, but it doesn't say that the initial stats target (1) is the worst
value you could possibly have.
[...]
Given that the first stage uses statistic target=1, running this option
in a database with any stats at all is probably a bad idea.


Add the word "only"?

This option is only useful to analyze a database...

Beyond that maybe adjust the procedure description to include a comment that we don't actually skip tables that already have a higher statistics target than the current pass would apply. (can we do this?)

"Run several (currently three) stages of analyze with different configuration settings, to produce usable statistics faster.  The first of these stages will remove any existing statistics even if they use a larger statistic target configuration."

David J.

Re: vacuumdb --analyze-in-stages

От
Alvaro Herrera
Дата:
On 2021-Oct-18, David G. Johnston wrote:

> On Mon, Oct 18, 2021 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
> wrote:

> > Given that the first stage uses statistic target=1, running this option
> > in a database with any stats at all is probably a bad idea.
>
> Add the word "only"?
> 
> This option is only useful to analyze a database...

Maybe this is sufficient, since it would drive people away from trying
to do anything else than help upgrades with it.

> Beyond that maybe adjust the procedure description to include a comment
> that we don't actually skip tables that already have a higher statistics
> target than the current pass would apply. (can we do this?)

Hmm, vacuumdb can certainly query the catalogs to see what we have and
skip tables for which we have more than that number, perhaps with a
query based on this number

select tablename,
    max(coalesce(cardinality(histogram_bounds), cardinality(most_common_freqs))) - 1
from pg_stats
group by tablename;

(and of course *don't* skip tables in the final stage, since the stored
stats could be obsolete.)

> "Run several (currently three) stages of analyze with different
> configuration settings, to produce usable statistics faster.  The first of
> these stages will remove any existing statistics even if they use a larger
> statistic target configuration."

.. yeah, this is another option.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/



Re: vacuumdb --analyze-in-stages

От
"Euler Taveira"
Дата:
On Mon, Oct 18, 2021, at 8:53 PM, Alvaro Herrera wrote:
On 2021-Oct-18, David G. Johnston wrote:

> On Mon, Oct 18, 2021 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
> wrote:

> > Given that the first stage uses statistic target=1, running this option
> > in a database with any stats at all is probably a bad idea.
>
> Add the word "only"?

> This option is only useful to analyze a database...

Maybe this is sufficient, since it would drive people away from trying
to do anything else than help upgrades with it.
+1. I like your 2nd suggestion.

"This option is only useful to analyze a database that was newly populated from
a restored dump or by <command>pg_upgrade</command>.  Beware that running with
this option in a database with existing statistics may cause query optimizer
choices to become transiently worse, because of the very low statistics target
that is used in the early stages."


> "Run several (currently three) stages of analyze with different
> configuration settings, to produce usable statistics faster.  The first of
> these stages will remove any existing statistics even if they use a larger
> statistic target configuration."

.. yeah, this is another option.
We might include it too but I would suggest "replace" instead of "remove"
because it seems there won't be statistics after the first stage.


--
Euler Taveira

Re: vacuumdb --analyze-in-stages

От
"nikolai.berkoff"
Дата:
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Tuesday, October 19th, 2021 at 01:02, Euler Taveira <euler@eulerto.com> wrote:

> On Mon, Oct 18, 2021, at 8:53 PM, Alvaro Herrera wrote:
>

> > On 2021-Oct-18, David G. Johnston wrote:
> >

> > > On Mon, Oct 18, 2021 at 4:02 PM Alvaro Herrera <alvherre@alvh.no-ip.org>
> > > wrote:
> >

> > > > Given that the first stage uses statistic target=1, running this option
> > > > in a database with any stats at all is probably a bad idea.
> > >
> > > Add the word "only"?
> > > 
> > > This option is only useful to analyze a database...
> >

> > Maybe this is sufficient, since it would drive people away from trying
> > to do anything else than help upgrades with it.
>

> +1. I like your 2nd suggestion.
>

> "This option is only useful to analyze a database that was newly populated from
> a restored dump or by <command>pg_upgrade</command>.  Beware that running with
> this option in a database with existing statistics may cause query optimizer
> choices to become transiently worse, because of the very low statistics target
> that is used in the early stages."
>

> > > "Run several (currently three) stages of analyze with different
> > > configuration settings, to produce usable statistics faster.  The first of
> > > these stages will remove any existing statistics even if they use a larger
> > > statistic target configuration."
> >

> > .. yeah, this is another option.
>

> We might include it too but I would suggest "replace" instead of "remove"
> because it seems there won't be statistics after the first stage.

Given all the suggestions I've tried to combine them into one patch again.

Regards,

Nikolai
Вложения

Re: vacuumdb --analyze-in-stages

От
Alvaro Herrera
Дата:
On 2021-Oct-26, nikolai.berkoff wrote:

> Given all the suggestions I've tried to combine them into one patch again.

OK, I pushed something very similar but not exactly your patch ...
please have a look if you care to suggest improvements.

... oh, I just noticed I made a typo -- omitted "with" in "running this
option".  I'll wait before pushing a fix in case there are other
rewording suggestions.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/