Re: Parallel CREATE INDEX for BRIN indexes
From | Tomas Vondra
---|---
Subject | Re: Parallel CREATE INDEX for BRIN indexes
Date |
Msg-id | 254c4ca8-0644-f32c-d4ed-c4eed304054e@enterprisedb.com
In reply to | Re: Parallel CREATE INDEX for BRIN indexes (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List | pgsql-hackers
On 7/5/23 10:44, Matthias van de Meent wrote:
> On Wed, 5 Jul 2023 at 00:08, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>
>> On 7/4/23 23:53, Matthias van de Meent wrote:
>>> On Thu, 8 Jun 2023 at 14:55, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
>>>> infrastructure (starting workers etc.) is "inspired" by the BTREE code
>>>> (i.e. copied from that and massaged a bit to call brin stuff).
>>>
>>> Nice work.
>>>
>>>> In both cases _brin_end_parallel then reads the summaries from worker
>>>> files, and adds them into the index. In 0001 this is fairly simple,
>>>> although we could do one more improvement and sort the ranges by range
>>>> start to make the index nicer (and possibly a bit more efficient). This
>>>> should be simple, because the per-worker results are already sorted like
>>>> that (so a merge sort in _brin_end_parallel would be enough).
>>>
>>> I see that you manually built the passing and sorting of tuples
>>> between workers, but can't we use the parallel tuplesort
>>> infrastructure for that? It already has similar features in place and
>>> improves code commonality.
>>>
>>
>> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
>> do, and the little I knew I managed to forget since I wrote this patch.
>> Which similar features do you have in mind?
>>
>> The workers are producing the results in "start_block" order, so if they
>> pass that to the leader, it probably can do the usual merge sort.
>>
>>>> For 0002 it's a bit more complicated, because with a single parallel
>>>> scan brinbuildCallbackParallel can't decide if a range is assigned to a
>>>> different worker or empty. And we want to generate summaries for empty
>>>> ranges in the index.
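[The leader-side merge discussed above relies on each worker emitting its results already sorted by start_block. A minimal sketch of that merge step follows; `WorkerRange` and `merge_worker_ranges` are illustrative names, not PostgreSQL code, and a real summary would carry per-column min/max values rather than just the start block.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-range result: only the range's start block is kept here;
 * an actual BRIN summary would also hold the per-column summary values. */
typedef struct WorkerRange
{
    uint32_t    start_block;
} WorkerRange;

/* Merge two per-worker result arrays, each already sorted by start_block,
 * into dst (which must have room for na + nb entries). This is the kind of
 * merge the leader could run in _brin_end_parallel; with more than two
 * workers it generalizes to a k-way merge over all worker files. */
static void
merge_worker_ranges(const WorkerRange *a, size_t na,
                    const WorkerRange *b, size_t nb,
                    WorkerRange *dst)
{
    size_t      i = 0,
                j = 0,
                k = 0;

    while (i < na && j < nb)
        dst[k++] = (a[i].start_block <= b[j].start_block) ? a[i++] : b[j++];
    while (i < na)
        dst[k++] = a[i++];
    while (j < nb)
        dst[k++] = b[j++];
}
```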
>>>> We could either skip such range during index build,
>>>> and then add empty summaries in _brin_end_parallel (if needed), or add
>>>> them and then merge them using "union".
>>>>
>>>> I just realized there's a third option to do this - we could just do
>>>> regular parallel scan (with no particular regard to pagesPerRange), and
>>>> then do "union" when merging results from workers. It doesn't require
>>>> the sequence of TID scans, and the union would also handle the empty
>>>> ranges. The per-worker results might be much larger, though, because
>>>> each worker might produce up to the "full" BRIN index.
>>>
>>> Would it be too much effort to add a 'min_chunk_size' argument to
>>> table_beginscan_parallel (or ParallelTableScanDesc) that defines the
>>> minimum granularity of block ranges to be assigned to each process? I
>>> think that would be the most elegant solution that would require
>>> relatively little effort: table_block_parallelscan_nextpage already
>>> does parallel management of multiple chunk sizes, and I think this
>>> modification would fit quite well in that code.
>>>
>>
>> I'm confused. Isn't that pretty much exactly what 0002 does? I mean,
>> that passes pagesPerRange to table_parallelscan_initialize(), so that
>> each pagesPerRange is assigned to a single worker.
>
> Huh, I overlooked that one... Sorry for that.
>
>> The trouble I described above is that the scan returns tuples, and the
>> consumer has no idea what was the chunk size or how many other workers
>> are there. Imagine you get a tuple from block 1, and then a tuple from
>> block 1000. Does that mean that the blocks in between are empty or that
>> they were processed by some other worker?
>
> If the unit of work for parallel table scans is the index's
> pages_per_range, then I think we can just fill in expected-but-missing
> ranges as 'empty' in the parallel leader during index loading, like
> the first of the two solutions you proposed.

Right, I think that's the right solution.
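[A sketch of that "fill in expected-but-missing ranges" step: the leader walks the expected range starts (multiples of pages_per_range) against the merged, sorted worker output and records the gaps, for which it would then insert empty summaries. The function and its inputs are illustrative, not PostgreSQL's API.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* got[] holds the start blocks of ranges the workers actually summarized,
 * sorted ascending and aligned to pages_per_range. Every expected range
 * start with no matching entry is written to missing[]; the leader would
 * insert an empty summary for each of those. Returns the gap count. */
static size_t
fill_missing_ranges(const uint32_t *got, size_t ngot,
                    uint32_t nblocks, uint32_t pages_per_range,
                    uint32_t *missing)     /* out: capacity >= range count */
{
    size_t      gi = 0,
                mi = 0;

    for (uint32_t start = 0; start < nblocks; start += pages_per_range)
    {
        if (gi < ngot && got[gi] == start)
            gi++;                   /* a worker produced this summary */
        else
            missing[mi++] = start;  /* all-empty range: leader fills it in */
    }
    return mi;
}
```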
Or rather the only solution, because the other idea (generating the empty
ranges in workers) relies on the workers knowing when to generate that. But
I don't think the workers have the necessary information.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
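[The chunking scheme the thread settles on, where each parallel-scan chunk is one whole BRIN range so every range is summarized by exactly one worker, can be sketched like this. This is a deliberately simplified, single-threaded stand-in for what table_block_parallelscan_nextpage does; the real code uses atomics on shared state and variable chunk sizes.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified shared scan state: a single "next block" allocator. In
 * PostgreSQL this would live in shared memory and next_block would be
 * advanced atomically by competing workers. */
typedef struct ParallelScanState
{
    uint32_t    nblocks;            /* blocks in the table */
    uint32_t    pages_per_range;    /* chunk granularity = one BRIN range */
    uint32_t    next_block;         /* next unclaimed block */
} ParallelScanState;

/* Hand the calling worker the next whole range as [start, end); returns
 * false once the table is exhausted. Because chunks align with
 * pages_per_range, no range is ever split between workers, which is what
 * lets the leader treat a missing range as empty. */
static bool
scan_next_range(ParallelScanState *st, uint32_t *start, uint32_t *end)
{
    if (st->next_block >= st->nblocks)
        return false;
    *start = st->next_block;
    st->next_block += st->pages_per_range;
    *end = (st->next_block < st->nblocks) ? st->next_block : st->nblocks;
    return true;
}
```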