Re: Parallel CREATE INDEX for BRIN indexes
От | Tomas Vondra |
---|---|
Тема | Re: Parallel CREATE INDEX for BRIN indexes |
Дата | |
Msg-id | 32e3f645-389d-87f9-2ce7-33d0087f47d4@enterprisedb.com обсуждение исходный текст |
Ответ на | Re: Parallel CREATE INDEX for BRIN indexes (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Ответы |
Re: Parallel CREATE INDEX for BRIN indexes
|
Список | pgsql-hackers |
On 12/4/23 16:00, Matthias van de Meent wrote: > On Sun, 3 Dec 2023 at 17:46, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: >> On 11/30/23 18:47, Matthias van de Meent wrote: >>> ... >>> >>> I just ran some more tests in less favorable environments, and it >>> looks like I hit a bug: >>> >>> % SET max_parallel_workers = 0; >>> % CREATE INDEX ... USING brin (...); >>> ERROR: cannot update tuples during a parallel operation >>> >>> Fix attached in 0002. >> >> Yeah, that's a bug, thanks for the fix. Yeah Just jumping to a "cleanup" >> label seems a bit cleaner (if that can be said about using goto), so I >> tweaked the patch to do that instead. > > Good point, I agree that's cleaner. > >>> In 0003 I add the mentioned backfilling of empty ranges at the end of >>> the table. I added it for both normal and parallel index builds, as >>> normal builds apparently also didn't yet have this yet. >>> >> >> Right. I was thinking about doing that to, but you beat me to it. I >> don't want to bury this in the main patch adding parallel builds, it's >> not really related to parallel CREATE INDEX. And it'd be weird to have >> this for parallel builds first, so I rebased it as 0001. > > OK. > >> As for the backfilling, I think we need to simplify the code a bit. >> >> So 0004 simplifies this - the backfilling is done by a function called >> from all the places. The main complexity is in ensuring all three places >> have the same concept of how to specify the range (of ranges) to fill. > > Good points, +1. However, the simplification in 0005 breaks that with > an underflow: > >> @@ -1669,6 +1672,19 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap, >> state->bs_worker_id = 0; >> state->bs_spool = NULL; >> >> + /* >> + * Calculate the start of the last page range. Page numbers are 0-based, >> + * so to get the index of the last page we need to subtract one. Then the >> + * integer division gives us the proper 0-based range index. >> + */ >> + state->bs_maxRangeStart = ((tablePages - 1) / pagesPerRange) * pagesPerRange; > > When the table is empty, this will try to fill all potential ranges up > to InvalidBlockNo's range, which is obviously invalid. It also breaks > the regression tests, as showin in CFBot. > Whoooops! You're right, ofc. If it's empty, we should use 0 instead. That's what we do now anyway, BRIN will have the first range even for empty tables. >> skipping the last page range? >> ----------------------------- >> >> I noticed you explicitly skipped backfilling empty tuple for the last >> page range. Can you explain? I suspect the idea was that the user >> activity would trigger building the tuple once that page range is >> filled, but we don't really know if the table receives any changes. It >> might easily be just a static table, in which case the last range would >> remain unsummarized. If this is the right thing to do, the serial build >> should do that too probably ... >> >> But I don't think that's the correct thing to do - I think CREATE INDEX >> is expected to always build a complete index, so my version always >> builds an index for all table pages. > > Hmm. My idea here is to create an index that is closer to what you get > when you hit the insertion path with aminsert. This isn't 1:1 how the > index builds ranges during (re)index when there is data for that > range, but I thought it to be a close enough analog. Either way, I > don't mind it adding an empty range for the last range if that's > considered useful. > I understand, but I'm not sure if keeping this consistency with aminsert has any material benefit. It's not like we do that now, I think (for empty tables we already build the first index range). >> BlockNumber overflows >> --------------------- >> >> The one thing that I'm not quite sure is correct is whether this handles >> overflows/underflows correctly. I mean, imagine you have a huge table >> that's almost 0xFFFFFFFF blocks, pages_per_range is prime, and the last >> range ends less than pages_per_range from 0xFFFFFFFF. Then this >> >> blkno += pages_per_range; >> >> can overflow, and might start inserting index tuples again (so we'd end >> up with a duplicate). >> >> I do think the current patch does this correctly, but AFAICS this is a >> pre-existing issue ... > > Yes, I know I've flagged this at least once before. IIRC, the response > back then was that it's a very unlikely issue, as you'd have to extend > the relation to at least the first block of the last range, which > would currently be InvalidBlockNo - 131072 + 1, or just shy of 32TB of > data at 8kB BLCKSZ. That's not exactly a common use case, and BRIN > range ID wraparound is likely the least of your worries at that point. > Probably true, but it seems somewhat careless and untidy ... >> Anyway, while working on this / stress-testing it, I realized there's a >> bug in how we allocate the emptyTuple. It's allocated lazily, but if can >> easily happen in the per-range context we introduced last week. It needs >> to be allocated in the context covering the whole index build. > > Yeah, I hadn't tested with (very) sparse datasets yet. > I haven't actually checked what the failing cases look like, but I don't think it needs to be particularly sparse. AFAIK it's just that the script deletes a chunk of the data somewhere in the table and/or it also creates a partial index. >> I think the best way to do that is per 0006, i.e. allocate it in the >> BrinBuildState, along with the appropriate memory context. > > That fix looks fine to me. > Thanks! regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Peter EisentrautДата:
Сообщение: Re: Clean up some signal usage mainly related to Windows