Re: Parallel CREATE INDEX for BRIN indexes

Поиск

Список

Период

Сортировка

От	Tomas Vondra
Тема	Re: Parallel CREATE INDEX for BRIN indexes
Дата	4 декабря 2023 г. 15:33:09
Msg-id	32e3f645-389d-87f9-2ce7-33d0087f47d4@enterprisedb.com обсуждение
Ответ на	Re: Parallel CREATE INDEX for BRIN indexes (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Ответы	Re: Parallel CREATE INDEX for BRIN indexes
Список	pgsql-hackers

Дерево обсуждения


On 12/4/23 16:00, Matthias van de Meent wrote:
> On Sun, 3 Dec 2023 at 17:46, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>> On 11/30/23 18:47, Matthias van de Meent wrote:
>>> ...
>>>
>>> I just ran some more tests in less favorable environments, and it
>>> looks like I hit a bug:
>>>
>>> % SET max_parallel_workers = 0;
>>> % CREATE INDEX ... USING brin (...);
>>> ERROR:  cannot update tuples during a parallel operation
>>>
>>> Fix attached in 0002.
>>
>> Yeah, that's a bug, thanks for the fix. Yeah Just jumping to a "cleanup"
>> label seems a bit cleaner (if that can be said about using goto), so I
>> tweaked the patch to do that instead.
> 
> Good point, I agree that's cleaner.
> 
>>> In 0003 I add the mentioned backfilling of empty ranges at the end of
>>> the table. I added it for both normal and parallel index builds, as
>>> normal builds apparently also didn't yet have this yet.
>>>
>>
>> Right. I was thinking about doing that to, but you beat me to it. I
>> don't want to bury this in the main patch adding parallel builds, it's
>> not really related to parallel CREATE INDEX. And it'd be weird to have
>> this for parallel builds first, so I rebased it as 0001.
> 
> OK.
> 
>> As for the backfilling, I think we need to simplify the code a bit.
>>
>> So 0004 simplifies this - the backfilling is done by a function called
>> from all the places. The main complexity is in ensuring all three places
>> have the same concept of how to specify the range (of ranges) to fill.
> 
> Good points, +1. However, the simplification in 0005 breaks that with
> an underflow:
> 
>> @@ -1669,6 +1672,19 @@ initialize_brin_buildstate(Relation idxRel, BrinRevmap *revmap,
>>     state->bs_worker_id = 0;
>>     state->bs_spool = NULL;
>>
>> +    /*
>> +     * Calculate the start of the last page range. Page numbers are 0-based,
>> +     * so to get the index of the last page we need to subtract one. Then the
>> +     * integer division gives us the proper 0-based range index.
>> +     */
>> +    state->bs_maxRangeStart = ((tablePages - 1) / pagesPerRange) * pagesPerRange;
> 
> When the table is empty, this will try to fill all potential ranges up
> to InvalidBlockNo's range, which is obviously invalid. It also breaks
> the regression tests, as showin in CFBot.
> 

Whoooops! You're right, ofc. If it's empty, we should use 0 instead.
That's what we do now anyway, BRIN will have the first range even for
empty tables.

>> skipping the last page range?
>> -----------------------------
>>
>> I noticed you explicitly skipped backfilling empty tuple for the last
>> page range. Can you explain? I suspect the idea was that the user
>> activity would trigger building the tuple once that page range is
>> filled, but we don't really know if the table receives any changes. It
>> might easily be just a static table, in which case the last range would
>> remain unsummarized. If this is the right thing to do, the serial build
>> should do that too probably ...
>>
>> But I don't think that's the correct thing to do - I think CREATE INDEX
>> is expected to always build a complete index, so my version always
>> builds an index for all table pages.
> 
> Hmm. My idea here is to create an index that is closer to what you get
> when you hit the insertion path with aminsert. This isn't 1:1 how the
> index builds ranges during (re)index when there is data for that
> range, but I thought it to be a close enough analog. Either way, I
> don't mind it adding an empty range for the last range if that's
> considered useful.
> 

I understand, but I'm not sure if keeping this consistency with aminsert
has any material benefit. It's not like we do that now, I think (for
empty tables we already build the first index range).

>> BlockNumber overflows
>> ---------------------
>>
>> The one thing that I'm not quite sure is correct is whether this handles
>> overflows/underflows correctly. I mean, imagine you have a huge table
>> that's almost 0xFFFFFFFF blocks, pages_per_range is prime, and the last
>> range ends less than pages_per_range from 0xFFFFFFFF. Then this
>>
>>     blkno += pages_per_range;
>>
>> can overflow, and might start inserting index tuples again (so we'd end
>> up with a duplicate).
>>
>> I do think the current patch does this correctly, but AFAICS this is a
>> pre-existing issue ...
> 
> Yes, I know I've flagged this at least once before. IIRC, the response
> back then was that it's a very unlikely issue, as you'd have to extend
> the relation to at least the first block of the last range, which
> would currently be InvalidBlockNo - 131072 + 1, or just shy of 32TB of
> data at 8kB BLCKSZ. That's not exactly a common use case, and BRIN
> range ID wraparound is likely the least of your worries at that point.
> 

Probably true, but it seems somewhat careless and untidy ...

>> Anyway, while working on this / stress-testing it, I realized there's a
>> bug in how we allocate the emptyTuple. It's allocated lazily, but if can
>> easily happen in the per-range context we introduced last week. It needs
>> to be allocated in the context covering the whole index build.
> 
> Yeah, I hadn't tested with (very) sparse datasets yet.
> 

I haven't actually checked what the failing cases look like, but I don't
think it needs to be particularly sparse. AFAIK it's just that the
script deletes a chunk of the data somewhere in the table and/or it also
creates a partial index.

>> I think the best way to do that is per 0006, i.e. allocate it in the
>> BrinBuildState, along with the appropriate memory context.
> 
> That fix looks fine to me.
> 

Thanks!


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Parallel CREATE INDEX for BRIN indexes