Re: progress reporting for partitioned REINDEX

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: progress reporting for partitioned REINDEX
Дата
Msg-id YCyvuBoo8PMHyf1g@paquier.xyz
обсуждение исходный текст
Ответ на Re: progress reporting for partitioned REINDEX  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы Re: progress reporting for partitioned REINDEX
Список pgsql-hackers
On Tue, Feb 16, 2021 at 12:39:08PM +0100, Matthias van de Meent wrote:
> These were added to report the index and table that are currently
> being worked on in concurrent reindexes of tables, schemas and
> databases. Before that commit, it would only report up to the last
> index being prepared in phase 1, leaving the user with no info on
> which index is being rebuilt.

Nothing much to add on top of what Matthias is saying here.  REINDEX
for partitioned relations builds first the full list of partitions to
work on, and then processes each one of them in a separate
transaction.  This is consistent with what we do for other commands
that need to handle an object different than a non-partitioned table
or a non-partitioned index.  The progress reporting has to report the
index whose storage is manipulated and its parent table.

> Why pgstat_progress_start_command specifically was chosen? That is
> because there is no method to update the
> beentry->st_progress_command_target other than through
> stat_progress_start_command, and according to the docs that field
> should contain the tableId of the index that is currently being worked
> on. This field needs a pgstat_progress_start_command because CIC / RiC
> reindexes all indexes concurrently at the same time (and not grouped
> by e.g. table), so we must re-start reporting for each index in each
> new phase in which we report data to get the heapId reported correctly
> for that index.

Using pgstat_progress_start_command() for this purpose is fine IMO.
This provides enough information for the user without complicating
more this API layer.

-       if (progress)
-               pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
-                                                                        iRel->rd_rel->relam);
+       // Do this unconditionally?
+       pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
+                                                                iRel->rd_rel->relam);
You cannot do that, this would clobber the progress information of any
upper layer that already reports something to the progress infra in
the backend's MyProc.  CLUSTER is one example calling reindex_relation()
that does *not* want progress reporting to happen in REINDEX.

+       /* progress reporting for partitioned indexes */
+       if (relkind == RELKIND_PARTITIONED_INDEX)
+       {
+               const int       progress_index[3] = {
+                       PROGRESS_CREATEIDX_COMMAND,
+                       PROGRESS_CREATEIDX_INDEX_OID,
+                       PROGRESS_CREATEIDX_PARTITIONS_TOTAL
+               };
This does not make sense in ReindexPartitions() IMO because this
relation is not reindexed as it has no storage, and you would lose the
context of each partition.

Something that we may want to study instead is whether we'd like to
report to the user the set of relations a REINDEX command is working
on and on which relation the work is currently done.  But I am not
really sure that we need that as long a we have a VERBOSE option that
lets us know via the logs what already happened in a single command.

I see no bug here.
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: [POC] verifying UTF-8 using SIMD instructions
Следующее
От: Justin Pryzby
Дата:
Сообщение: Re: progress reporting for partitioned REINDEX