Обсуждение: progress reporting for partitioned REINDEX
It looks like we missed this in a6642b3ae. I think it's an odd behavior of pg_stat_progress_create_index to simultaneously show the global progress as well as the progress for the current partition ... It seems like for partitioned reindex, reindex_index() should set the AM, which is used in the view: src/backend/catalog/system_views.sql- WHEN 2 THEN 'building index' || src/backend/catalog/system_views.sql: COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid,S.param11)), Maybe it needs a new flag, like: params->options & REINDEXOPT_REPORT_PROGRESS_AM I don't understand why e66bcfb4c added multiple calls to pgstat_progress_start_command(). -- Justin
Вложения
On Tue, 16 Feb 2021, 07:42 Justin Pryzby, <pryzby@telsasoft.com> wrote: > > It looks like we missed this in a6642b3ae. > > I think it's an odd behavior of pg_stat_progress_create_index to simultaneously > show the global progress as well as the progress for the current partition ... > > It seems like for partitioned reindex, reindex_index() should set the AM, which > is used in the view: > > src/backend/catalog/system_views.sql- WHEN 2 THEN 'building index' || > src/backend/catalog/system_views.sql: COALESCE((': ' || pg_indexam_progress_phasename(S.param9::oid,S.param11)), > > Maybe it needs a new flag, like: > params->options & REINDEXOPT_REPORT_PROGRESS_AM > > I don't understand why e66bcfb4c added multiple calls to > pgstat_progress_start_command(). These were added to report the index and table that are currently being worked on in concurrent reindexes of tables, schemas and databases. Before that commit, it would only report up to the last index being prepared in phase 1, leaving the user with no info on which index is being rebuilt. Why pgstat_progress_start_command specifically was chosen? That is because there is no method to update the beentry->st_progress_command_target other than through stat_progress_start_command, and according to the docs that field should contain the tableId of the index that is currently being worked on. This field needs a pgstat_progress_start_command because CIC / RiC reindexes all indexes concurrently at the same time (and not grouped by e.g. table), so we must re-start reporting for each index in each new phase in which we report data to get the heapId reported correctly for that index. With regards, Matthias van de Meent
On Tue, Feb 16, 2021 at 12:39:08PM +0100, Matthias van de Meent wrote: > These were added to report the index and table that are currently > being worked on in concurrent reindexes of tables, schemas and > databases. Before that commit, it would only report up to the last > index being prepared in phase 1, leaving the user with no info on > which index is being rebuilt. Nothing much to add on top of what Matthias is saying here. REINDEX for partitioned relations builds first the full list of partitions to work on, and then processes each one of them in a separate transaction. This is consistent with what we do for other commands that need to handle an object different than a non-partitioned table or a non-partitioned index. The progress reporting has to report the index whose storage is manipulated and its parent table. > Why pgstat_progress_start_command specifically was chosen? That is > because there is no method to update the > beentry->st_progress_command_target other than through > stat_progress_start_command, and according to the docs that field > should contain the tableId of the index that is currently being worked > on. This field needs a pgstat_progress_start_command because CIC / RiC > reindexes all indexes concurrently at the same time (and not grouped > by e.g. table), so we must re-start reporting for each index in each > new phase in which we report data to get the heapId reported correctly > for that index. Using pgstat_progress_start_command() for this purpose is fine IMO. This provides enough information for the user without complicating more this API layer. - if (progress) - pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID, - iRel->rd_rel->relam); + // Do this unconditionally? + pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID, + iRel->rd_rel->relam); You cannot do that, this would clobber the progress information of any upper layer that already reports something to the progress infra in the backend's MyProc. CLUSTER is one example calling reindex_relation() that does *not* want progress reporting to happen in REINDEX. + /* progress reporting for partitioned indexes */ + if (relkind == RELKIND_PARTITIONED_INDEX) + { + const int progress_index[3] = { + PROGRESS_CREATEIDX_COMMAND, + PROGRESS_CREATEIDX_INDEX_OID, + PROGRESS_CREATEIDX_PARTITIONS_TOTAL + }; This does not make sense in ReindexPartitions() IMO because this relation is not reindexed as it has no storage, and you would lose the context of each partition. Something that we may want to study instead is whether we'd like to report to the user the set of relations a REINDEX command is working on and on which relation the work is currently done. But I am not really sure that we need that as long a we have a VERBOSE option that lets us know via the logs what already happened in a single command. I see no bug here. -- Michael
Вложения
On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote: > I see no bug here. pg_stat_progress_create_index includes partitions_{done,total} for CREATE INDEX p, so isn't it strange if it wouldn't do likewise for REINDEX INDEX p ? -- Justin
On Wed, Feb 17, 2021 at 12:10:43AM -0600, Justin Pryzby wrote: > On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote: >> I see no bug here. > > pg_stat_progress_create_index includes partitions_{done,total} for > CREATE INDEX p, so isn't it strange if it wouldn't do likewise for > REINDEX INDEX p ? There is always room for improvement. This stuff applies now only when creating an index in the non-concurrent case because an index cannot be created on a partitioned table concurrently, and this behavior is documented as such. If we are going to improve this area, it seems to me that we may want to consider more cases than just the case of partitions, as it could also help the monitoring of REINDEX on schemas and databases. I don't think that this fits as an open item. That's just a different feature. -- Michael