Re: Add important info about ANALYZE after create Functional Index

Поиск
Список
Период
Сортировка
От David G. Johnston
Тема Re: Add important info about ANALYZE after create Functional Index
Дата
Msg-id CAKFQuwZWFTBxFfsdtfy09YzYUznN5c-u=1vWjVkfpE8A=bm0ag@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Add important info about ANALYZE after create Functional Index  (Nikolay Samokhvalov <samokhvalov@gmail.com>)
Список pgsql-hackers
On Mon, Oct 26, 2020 at 9:44 PM Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
On Mon, Oct 26, 2020 at 7:03 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Monday, October 26, 2020, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote:
Although, this triggers a question – should ANALYZE be automated in, say, pg_restore as well?

Independent concern.

It's the same class of issues – after we created some objects, we lack statistics and willing to automate its collection. If the approach is automated in one case, it should be automated in the others, for consistency.

I don't see a need to force consistency between something that will affect, at most, one table, and something that will affect an entire database or cluster.  The other material difference is that the previous state of a restore is "nothing" while in the create/reindex cases we are going from live, populated, state to another.

I do observe that while the create/reindex analyze would run automatically during the restore on object creation there would be no data present so it would be close to a no-op in practice.
 

And another question: how ANALYZE needs to be run? If it's under the user's control, there is an option to use vacuumdb --analyze and benefit from using -j to parallelize the work (and, in some cases, benefit from using --analyze-in-stages). If we had ANALYZE as a part of building indexes on expressions, should it be parallelized to the same extent as index creation (controlled by max_parallel_maintenance_workers)?

None of that seems relevant here.  The only relevant parameter I see is what to specify for “table_and_columns”.

I'm not sure I follow.

Describe how parallelism within the session that is auto-analyzing is supposed to work.  vaccuumdb opens up multiple connections which shouldn't happen here.

I suppose having the auto-analyze run three times with different targets would work but I'm doubting that is a win.  I may just be underestimating how long an analyze on an extremely large table with high statistics takes.

David J.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: list of extended statistics on psql
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Deduplicate aggregates and transition functions in planner