Re: pg_stat_progress_create_index vs. parallel index builds

From Matthias van de Meent
Subject Re: pg_stat_progress_create_index vs. parallel index builds
Date
Msg-id CAEze2WhhGStdJ0=3nXGz9eUYNDVD0UEt0s04SDaumtOLYetstA@mail.gmail.com
In response to pg_stat_progress_create_index vs. parallel index builds  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: pg_stat_progress_create_index vs. parallel index builds  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
On Wed, 2 Jun 2021 at 13:57, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> Hi,
>
> While experimenting with parallel index builds, I've noticed a somewhat
> strange behavior of pg_stat_progress_create_index when a btree index is
> built with parallel workers - some of the phases seem to be missing.
>
> In serial (no parallelism) mode, the progress is roughly this (it's
> always the first/last timestamp of each phase):
>
>               |   command    |             phase
> -------------+--------------+----------------------------------------
>   12:56:01 AM | CREATE INDEX | building index: scanning table
>           ...
>   01:06:22 AM | CREATE INDEX | building index: scanning table
>   01:06:23 AM | CREATE INDEX | building index: sorting live tuples
>           ...
>   01:13:10 AM | CREATE INDEX | building index: sorting live tuples
>   01:13:11 AM | CREATE INDEX | building index: loading tuples in tree
>           ...
>   01:24:02 AM | CREATE INDEX | building index: loading tuples in tree
>
> So it goes through three phases:
>
> 1) scanning tuples
> 2) sorting live tuples
> 3) loading tuples in tree
>
> But with a parallel index build, it changes to:
>
>               |   command    |             phase
> -------------+--------------+----------------------------------------
>   11:40:48 AM | CREATE INDEX | building index: scanning table
>           ...
>   11:47:24 AM | CREATE INDEX | building index: scanning table (scan complete)
>   11:56:22 AM | CREATE INDEX | building index: scanning table
>   11:56:23 AM | CREATE INDEX | building index: loading tuples in tree
>           ...
>   12:05:33 PM | CREATE INDEX | building index: loading tuples in tree
>
> That is, the "sorting live tuples" phase disappeared, and instead it
> seems to be counted in the "scanning table" one, as if there was an
> update of the phase missing.

> I've only tried this on master, but I assume it behaves like this in the
> older releases too. I wonder if this is intentional - it sure is a bit
> misleading.

This was a surprise to me as well. According to the documentation in
sortsupport.h (lines 125-129), the parallel workers produce pre-sorted
segments during the scanning phase, which are subsequently merged by
the leader. This might mean that the 'sorting' phase is already
finished during the 'scanning' phase, while the leader waits for the
parallel workers. I haven't looked further into whether this is the
case, or whether it could be changed to also report the sorting
metrics; but seeing as it is part of the parallel workers API of
tuplesort, I think fixing it in current releases is going to be
difficult.
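
To illustrate what I mean, here is a toy model in Python of "workers
produce pre-sorted segments, leader merges them". It is only a sketch
of the general technique, not the tuplesort code: the function names
and the sample data are made up, and the workers run one after another
here instead of in parallel. The point is that all real sorting work
happens inside the workers, so the leader has no separate sort step
left to report.

```python
import heapq


def worker_scan_and_sort(rows):
    """Hypothetical worker: scan its share of the table, return a sorted run."""
    return sorted(rows)


def leader_merge(runs):
    """Hypothetical leader: merge the pre-sorted runs into one ordered stream."""
    return list(heapq.merge(*runs))


# Made-up table contents, split round-robin among three workers.
table = [42, 7, 19, 3, 88, 61, 5, 27, 14]
shares = [table[i::3] for i in range(3)]

# In this model the sorting is done here, inside each worker's scan.
runs = [worker_scan_and_sort(share) for share in shares]

# The leader only merges already-sorted runs; it never sorts from scratch.
index_order = leader_merge(runs)
```

If the real code behaves like this model, the time spent sorting would
be attributed to whatever phase is current while the workers run, which
would match what you are seeing.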

With regards,

Matthias van de Meent


