Re: [HACKERS] CLUSTER command progress monitor

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [HACKERS] CLUSTER command progress monitor
Дата
Msg-id 20190909073159.GA25390@paquier.xyz
обсуждение исходный текст
Ответ на Re: [HACKERS] CLUSTER command progress monitor  (Alvaro Herrera from 2ndQuadrant <alvherre@alvh.no-ip.org>)
Список pgsql-hackers
On Fri, Sep 06, 2019 at 10:27:02AM -0400, Alvaro Herrera from 2ndQuadrant wrote:
> That said, I did spend some time on this type of issue when doing CREATE
> INDEX support; you can tell because I defined the columns for block
> numbers in a scan separately from CREATE INDEX specific fields,
> precisely to avoid multiple commands running concurrently from
> clobbering unrelated columns:
>
> /* Block numbers in a generic relation scan */
> #define PROGRESS_SCAN_BLOCKS_TOTAL                15
> #define PROGRESS_SCAN_BLOCKS_DONE                16

Hm.  It is not really clear what is the intention by looking at the
contents progress.h.

> I would say that it's fairly useful to have CLUSTER report progress on
> indexes being created underneath, but I understand that it might be too
> late to be designing the CLUSTER report to take advantage of the CREATE
> INDEX metrics.

The same can be said about the reporting done in reindex_relation for
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT.  I think that it should be
removed for now.

> I think a workable, not terribly invasive approach is to have REINDEX
> process its commands conditionally: have the caller indicate whether
> progress is to be reported, and skip the calls if not.  That would
> (should) prevent it from clobbering the state set up by CLUSTER.

So, you would basically add an extra flag in the options of
reindex_index() to decide if a progress report should be started or
not?  I am not a fan of that because it does not take care of the root
issue which is that the start of the progress reports is too much
internal.  I think that it would be actually less error prone to move
the start of the progress reporting for REINDEX out of reindex_index()
and start it at a higher level.  Looking again at the code, I would
recommend that we should start the progress in ReindexIndex() before
calling reindex_index(), ReindexMultipleTables() before calling
reindex_relation() and ReindexRelationConcurrently(), and
ReindexTable() before calling reindex_relation().  That will avoid
each command to clobber each other's in-progress reports.

It would be also very good to document clearly how the overlaps for
the progress parameter values are not happening.  For example for
CREATE INDEX, we don't know why 1, 2 and 7 are not used.

Note that there is an ID collision with PROGRESS_CREATEIDX_INDEX_OID
updated in reindex_index() and the CLUSTER part
PROGRESS_CLUSTER_HEAP_BLKS_SCANNED..  There could be an argument to
use a completely different range of IDs for each command phase to
avoid any overlap, but it is scary to consider that we may not have
found all the issues with one command cloberring another one's
state..
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: Re: [HACKERS] PATCH: Batch/pipelining support for libpq
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: BUG #15977: Inconsistent behavior in chained transactions