Re: Deleting older versions in unique indexes to avoid page splits

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Deleting older versions in unique indexes to avoid page splits
Дата
Msg-id CAH2-Wzn0TGfxhAviNq=wgwoP9FHHhBvyPmiZiDKKRXpEk8GEYQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Deleting older versions in unique indexes to avoid page splits  (Heikki Linnakangas <hlinnaka@iki.fi>)
Список pgsql-hackers
On Mon, Jan 4, 2021 at 4:08 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> You said above that heap_tid_shellsort() is very performance critical,
> and that's why you use the two arrays approach. If it's so performance
> critical that swapping 8 bytes vs 12 byte array elements makes a
> difference, I would guess that comparing TID vs just the block numbers
> would also make a difference.
>
> If you really want to shave cycles, you could also store BlockNumber and
> OffsetNumber in the TM_IndexDelete array, instead of ItemPointerData.
> What's the difference, you might ask? ItemPointerData stores the block
> number as two 16 bit integers that need to be reassembled into a 32 bit
> integer in the ItemPointerGetBlockNumber() macro.
>
> My argument here is two-pronged: If this is performance critical, you
> could do these additional optimizations. If it's not, then you don't
> need the two-arrays trick; one array would be simpler.

That's reasonable. The reason why I haven't been more decisive here is
because the question of whether or not it matters is actually very
complicated, for reasons that have little to do with sorting. The more
effective the mechanism is each time (the more bytes it allows nbtree
to free from each leaf page), the less often it is called, and the
less performance critical the overhead per operation is. On the other
hand there are a couple of other questions about what we do in
heapam.c that aren't quite resolved yet (e.g. exact "favorable
blocks"/prefetching behavior in bottom-up case), that probably affect
how important the heap_tid_shellsort() microoptimisations are.

I think that it makes sense to make a final decision on this at the
last minute, once everything else is resolved, since the implicit
dependencies make any other approach much too complicated. I agree
that this kind of microoptimization is best avoided, but I really
don't want to have to worry about regressions in workloads that I now
understand fully. I think that the sort became less important on
perhaps 2 or 3 occasions during development, even though that was
never really the goal that I had in mind in each case.

I'll do my best to avoid it.

> Hmm, maybe: "... is the best information that we have to go on, it is
> just a guess based on simple heuristics about duplicates in indexes".

I'll add something like that to the next revision.

> I see. Would be good to explain that pattern with multiple indexes in
> the comment.

Will do -- it is the single best example of how heap block locality
can matter with a real workload, so it makes sense to go with it in
explanatory comments.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: set_config() documentation clarification
Следующее
От: Chapman Flack
Дата:
Сообщение: Re: set_config() documentation clarification