Re: Parallel CREATE INDEX for BRIN indexes

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: Parallel CREATE INDEX for BRIN indexes
Дата	30 декабря 2023 г. 22:42:24
Msg-id	3733d042-71e1-6ae6-5fac-00c12db62db6@enterprisedb.com обсуждение исходный текст
Ответ на	Re: Parallel CREATE INDEX for BRIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы	Re: Parallel CREATE INDEX for BRIN indexes
Список	pgsql-hackers

Дерево обсуждения

Hi,

While cleaning up some unnecessary bits of the code and slightly
inaccurate comments, I ran into a failure when the parallel scan (used
by the parallel build) happened to be a synchronized scan. When the scan
did not start on page 0, the parallel callback failed to correctly
handle tuples after wrapping around to the start of the table.

AFAICS the extensive testing I did during development did not detect
this because strictly speaking the index was "correct" (as in not
returning incorrect results in queries), just less efficient (missing
some ranges, and some ranges being "wider" than necessary). Or perhaps
the tests happened to not trigger synchronized scans.

Should be fixed by 1ccab5038eaf261f. It took me ages to realize what the
problem is, and I initially suspected there's some missing coordination
between the workers/leader, or something.

So I started comparing the code to btree, which is where it originated,
and I realized there's indeed one difference - the BRIN code only does
half the work with the workersdonecv variable. The workers do correctly
update the count and notify the leader, but the leader never waits for
the count to be 0. That is, there's nothing like _bt_parallel_heapscan.

I wonder whether this actually is a problem, considering the differences
between the flow in BRIN and BTREE. In particular, the "leader" does the
work in _brin_end_parallel() after WaitForParallelWorkersToFinish(). So
it's not like there might be a worker still processing data, I think.

But now that I think about it, maybe it's not such a great idea to do
this kind of work in _brin_end_parallel(). Maybe it should do just stuff
related to termination of workers etc. and the merging of results should
happen elsewhere - earlier in brinbuild()? Then it'd make sense to have
something like _bt_parallel_heapscan ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel CREATE INDEX for BRIN indexes