Обсуждение: progress reporting for partitioned REINDEX

Поиск
Список
Период
Сортировка

progress reporting for partitioned REINDEX

От
Justin Pryzby
Дата:
It looks like we missed this in a6642b3ae.

I think it's an odd behavior of pg_stat_progress_create_index to simultaneously
show the global progress as well as the progress for the current partition ...

It seems like for partitioned reindex, reindex_index() should set the AM, which
is used in the view:

src/backend/catalog/system_views.sql-                       WHEN 2 THEN 'building index' ||
src/backend/catalog/system_views.sql:                           COALESCE((': ' ||
pg_indexam_progress_phasename(S.param9::oid,S.param11)),
 

Maybe it needs a new flag, like:
params->options & REINDEXOPT_REPORT_PROGRESS_AM

I don't understand why e66bcfb4c added multiple calls to
pgstat_progress_start_command().

-- 
Justin

Вложения

Re: progress reporting for partitioned REINDEX

От
Matthias van de Meent
Дата:
On Tue, 16 Feb 2021, 07:42 Justin Pryzby, <pryzby@telsasoft.com> wrote:
>
> It looks like we missed this in a6642b3ae.
>
> I think it's an odd behavior of pg_stat_progress_create_index to simultaneously
> show the global progress as well as the progress for the current partition ...
>
> It seems like for partitioned reindex, reindex_index() should set the AM, which
> is used in the view:
>
> src/backend/catalog/system_views.sql-                       WHEN 2 THEN 'building index' ||
> src/backend/catalog/system_views.sql:                           COALESCE((': ' ||
pg_indexam_progress_phasename(S.param9::oid,S.param11)),
 
>
> Maybe it needs a new flag, like:
> params->options & REINDEXOPT_REPORT_PROGRESS_AM
>
> I don't understand why e66bcfb4c added multiple calls to
> pgstat_progress_start_command().


These were added to report the index and table that are currently
being worked on in concurrent reindexes of tables, schemas and
databases. Before that commit, it would only report up to the last
index being prepared in phase 1, leaving the user with no info on
which index is being rebuilt.

Why pgstat_progress_start_command specifically was chosen? That is
because there is no method to update the
beentry->st_progress_command_target other than through
stat_progress_start_command, and according to the docs that field
should contain the tableId of the index that is currently being worked
on. This field needs a pgstat_progress_start_command because CIC / RiC
reindexes all indexes concurrently at the same time (and not grouped
by e.g. table), so we must re-start reporting for each index in each
new phase in which we report data to get the heapId reported correctly
for that index.


With regards,

Matthias van de Meent



Re: progress reporting for partitioned REINDEX

От
Michael Paquier
Дата:
On Tue, Feb 16, 2021 at 12:39:08PM +0100, Matthias van de Meent wrote:
> These were added to report the index and table that are currently
> being worked on in concurrent reindexes of tables, schemas and
> databases. Before that commit, it would only report up to the last
> index being prepared in phase 1, leaving the user with no info on
> which index is being rebuilt.

Nothing much to add on top of what Matthias is saying here.  REINDEX
for partitioned relations builds first the full list of partitions to
work on, and then processes each one of them in a separate
transaction.  This is consistent with what we do for other commands
that need to handle an object different than a non-partitioned table
or a non-partitioned index.  The progress reporting has to report the
index whose storage is manipulated and its parent table.

> Why pgstat_progress_start_command specifically was chosen? That is
> because there is no method to update the
> beentry->st_progress_command_target other than through
> stat_progress_start_command, and according to the docs that field
> should contain the tableId of the index that is currently being worked
> on. This field needs a pgstat_progress_start_command because CIC / RiC
> reindexes all indexes concurrently at the same time (and not grouped
> by e.g. table), so we must re-start reporting for each index in each
> new phase in which we report data to get the heapId reported correctly
> for that index.

Using pgstat_progress_start_command() for this purpose is fine IMO.
This provides enough information for the user without complicating
more this API layer.

-       if (progress)
-               pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
-                                                                        iRel->rd_rel->relam);
+       // Do this unconditionally?
+       pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
+                                                                iRel->rd_rel->relam);
You cannot do that, this would clobber the progress information of any
upper layer that already reports something to the progress infra in
the backend's MyProc.  CLUSTER is one example calling reindex_relation()
that does *not* want progress reporting to happen in REINDEX.

+       /* progress reporting for partitioned indexes */
+       if (relkind == RELKIND_PARTITIONED_INDEX)
+       {
+               const int       progress_index[3] = {
+                       PROGRESS_CREATEIDX_COMMAND,
+                       PROGRESS_CREATEIDX_INDEX_OID,
+                       PROGRESS_CREATEIDX_PARTITIONS_TOTAL
+               };
This does not make sense in ReindexPartitions() IMO because this
relation is not reindexed as it has no storage, and you would lose the
context of each partition.

Something that we may want to study instead is whether we'd like to
report to the user the set of relations a REINDEX command is working
on and on which relation the work is currently done.  But I am not
really sure that we need that as long a we have a VERBOSE option that
lets us know via the logs what already happened in a single command.

I see no bug here.
--
Michael

Вложения

Re: progress reporting for partitioned REINDEX

От
Justin Pryzby
Дата:
On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote:
> I see no bug here.

pg_stat_progress_create_index includes partitions_{done,total} for
CREATE INDEX p, so isn't it strange if it wouldn't do likewise for
REINDEX INDEX p ?

-- 
Justin



Re: progress reporting for partitioned REINDEX

От
Michael Paquier
Дата:
On Wed, Feb 17, 2021 at 12:10:43AM -0600, Justin Pryzby wrote:
> On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote:
>> I see no bug here.
>
> pg_stat_progress_create_index includes partitions_{done,total} for
> CREATE INDEX p, so isn't it strange if it wouldn't do likewise for
> REINDEX INDEX p ?

There is always room for improvement.  This stuff applies now only
when creating an index in the non-concurrent case because an index
cannot be created on a partitioned table concurrently, and this
behavior is documented as such.  If we are going to improve this area,
it seems to me that we may want to consider more cases than just the
case of partitions, as it could also help the monitoring of REINDEX on
schemas and databases.

I don't think that this fits as an open item.  That's just a different
feature.
--
Michael

Вложения