Re: Parallel CREATE INDEX for GIN indexes

From: Tomas Vondra
Subject: Re: Parallel CREATE INDEX for GIN indexes
Msg-id 08b57e98-2fd8-4372-bd1f-b15a010f171b@vondra.me
In reply to: Re: Parallel CREATE INDEX for GIN indexes (Tomas Vondra <tomas@vondra.me>)
Replies: Re: Parallel CREATE INDEX for GIN indexes
List: pgsql-hackers

One more patch version / rebase. I had been planning to get 0001
committed, but I realized there's one more loose end: progress reporting.

I guess I could have committed it without that, but Matthias mentioned
this a couple of days ago, so I took a stab at it. The build goes
through these 5 phases (in addition to "INITIALIZE"):

  PROGRESS_GIN_PHASE_INDEXBUILD_TABLESCAN
  PROGRESS_GIN_PHASE_PERFORMSORT_1
  PROGRESS_GIN_PHASE_MERGE_1
  PROGRESS_GIN_PHASE_PERFORMSORT_2
  PROGRESS_GIN_PHASE_MERGE_2

The phases up to and including PROGRESS_GIN_PHASE_MERGE_1 happen in the
workers, i.e. this part ends with the workers feeding the sorted/merged
data into the shared tuplesort. The last two phases happen in the
leader, which merges the data and actually inserts it into the GIN index.
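To make the ordering and the worker/leader split explicit, here is a small standalone C model of the phases. The enum layout, its numeric values, and the GIN_PHASE_INITIALIZE name are illustrative assumptions, not the constants defined by the patch; gin_phase_in_worker() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Build phases in the order listed above. The numeric values and the
 * GIN_PHASE_INITIALIZE name are assumptions made for this illustration.
 */
typedef enum GinBuildPhase
{
    GIN_PHASE_INITIALIZE = 0,                /* hypothetical "INITIALIZE" */
    PROGRESS_GIN_PHASE_INDEXBUILD_TABLESCAN, /* worker: parallel scan */
    PROGRESS_GIN_PHASE_PERFORMSORT_1,        /* worker: private tuplesort */
    PROGRESS_GIN_PHASE_MERGE_1,              /* worker: feed shared tuplesort */
    PROGRESS_GIN_PHASE_PERFORMSORT_2,        /* leader */
    PROGRESS_GIN_PHASE_MERGE_2               /* leader: insert into the index */
} GinBuildPhase;

/* True for the phases up to and including MERGE_1, which run in workers. */
static bool
gin_phase_in_worker(GinBuildPhase phase)
{
    assert(phase >= GIN_PHASE_INITIALIZE && phase <= PROGRESS_GIN_PHASE_MERGE_2);
    return phase >= PROGRESS_GIN_PHASE_INDEXBUILD_TABLESCAN &&
           phase <= PROGRESS_GIN_PHASE_MERGE_1;
}
```

For example, gin_phase_in_worker(PROGRESS_GIN_PHASE_MERGE_1) is true, while gin_phase_in_worker(PROGRESS_GIN_PHASE_PERFORMSORT_2) is false.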

The "parallel" part uses blocks_done/blocks_total to show progress,
based on the parallel scan. The "leader" phases use tuples_done/tuples_total,
where a "tuple" is a GIN tuple produced by the workers (each worker reports
the number of "tuples" it writes into the shared tuplesort, and the leader
then tracks how many of them it has processed).
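A minimal sketch of how the two pairs of counters relate, assuming a plain struct stands in for the backend's progress slots. In PostgreSQL proper the values would be reported through pgstat_progress_update_param(); that part is not modeled here. GinBuildProgress and the progress_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the progress counters of one CREATE INDEX. */
typedef struct GinBuildProgress
{
    int64_t blocks_total;  /* table blocks covered by the parallel scan */
    int64_t blocks_done;   /* blocks scanned so far, by all participants */
    int64_t tuples_total;  /* GIN tuples written to the shared tuplesort */
    int64_t tuples_done;   /* GIN tuples the leader has processed so far */
} GinBuildProgress;

/* Parallel scan: each scanned block advances blocks_done. */
static void
progress_block_scanned(GinBuildProgress *p)
{
    assert(p->blocks_done < p->blocks_total);
    p->blocks_done++;
}

/* A worker reports how many GIN tuples it wrote into the shared tuplesort. */
static void
progress_worker_done(GinBuildProgress *p, int64_t ntuples)
{
    p->tuples_total += ntuples;
}

/* The leader processed one GIN tuple read from the shared tuplesort. */
static void
progress_leader_tuple(GinBuildProgress *p)
{
    assert(p->tuples_done < p->tuples_total);
    p->tuples_done++;
}

/* Percent done: blocks in the parallel part, tuples in the leader phases. */
static double
progress_percent(const GinBuildProgress *p, bool leader_phase)
{
    int64_t done = leader_phase ? p->tuples_done : p->blocks_done;
    int64_t total = leader_phase ? p->tuples_total : p->blocks_total;

    return total > 0 ? 100.0 * (double) done / (double) total : 0.0;
}
```

For instance, after all 4 blocks are scanned and two workers report 30 and 10 tuples, tuples_total is 40, so the leader shows 25% once it has processed 10 of them.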

I think this works pretty nicely. I'm not entirely sure we need all the
phases; maybe it'd be fine to have sort+merge as a single phase? Or
maybe there should be one extra "sort" phase, since workers do two sorts:
first on their "private" tuplesort, then on the "shared" one.
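The private sort followed by a merge can be illustrated with a simplified C sketch. Each worker sorts its (key, TID) entries, then collapses each run of equal keys into one "GIN tuple" carrying that key's TID list, and the number of such tuples is what it would report. GinEntry, gin_sort_and_merge(), and the int key / uint32_t TID representation are assumptions for illustration only. The patch itself works with real GIN tuples and tuplesort, not qsort() over an array.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified index entry: one (key, heap TID) pair from the table scan. */
typedef struct GinEntry
{
    int32_t  key;
    uint32_t tid;  /* stand-in for an ItemPointer */
} GinEntry;

/* Order by key, then by TID, so each key's TID list comes out sorted. */
static int
gin_entry_cmp(const void *a, const void *b)
{
    const GinEntry *x = a;
    const GinEntry *y = b;

    if (x->key != y->key)
        return (x->key > y->key) - (x->key < y->key);
    return (x->tid > y->tid) - (x->tid < y->tid);
}

/*
 * Sort the entries, then merge each run of equal keys into one tuple.
 * The TIDs of tuple k are the sorted run entries[start..start+ntids[k]).
 * out_keys/out_ntids must have room for n elements. Returns the number of
 * tuples produced, i.e. the count a worker would report.
 */
static size_t
gin_sort_and_merge(GinEntry *entries, size_t n,
                   int32_t *out_keys, size_t *out_ntids)
{
    size_t ntuples = 0;

    qsort(entries, n, sizeof(GinEntry), gin_entry_cmp);  /* "sort" step */

    for (size_t i = 0; i < n;)                            /* "merge" step */
    {
        size_t j = i;

        while (j < n && entries[j].key == entries[i].key)
            j++;
        out_keys[ntuples] = entries[i].key;
        out_ntids[ntuples] = j - i;
        ntuples++;
        i = j;
    }
    assert(ntuples <= n);
    return ntuples;
}
```

With the entries (3,7), (1,5), (3,2), (1,9), (2,4), this produces three tuples: key 1 with two TIDs, key 2 with one, and key 3 with two.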

What annoys me a little is that we only see those stages if the
leader participates as a worker. With parallel_leader_participation=off,
none of this is visible (although we still see the blocks from the scan).


regards

-- 
Tomas Vondra

