Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

Поиск

Список

Период

Сортировка

От	Matthias van de Meent
Тема	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Дата	5 марта 2024 г. 20:08:08
Msg-id	CAEze2Wh3eSAnXFdY_6roNPb3WD-YsKbNLiKf=cPmAGHkPUd22w@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements (Michail Nikolaev <michail.nikolaev@gmail.com>)
Ответы	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Список	pgsql-hackers

Дерево обсуждения

On Wed, 21 Feb 2024 at 12:37, Michail Nikolaev
<michail.nikolaev@gmail.com> wrote:
>
> Hi!
>
> > How do you suppose this would work differently from a long-lived
> > normal snapshot, which is how it works right now?
>
> Difference in the ability to take new visibility snapshot periodically
> during the second phase with rechecking visibility of tuple according
> to the "reference" snapshot (which is taken only once like now).
> It is the approach from (1) but with a workaround for the issues
> caused by heap_page_prune_opt.
>
> > Would it be exclusively for that relation?
> Yes, only for that affected relation. Other relations are unaffected.

I suppose this could work. We'd also need to be very sure that the
toast relation isn't cleaned up either: Even though that's currently
DELETE+INSERT only and can't apply HOT, it would be an issue if we
couldn't find the TOAST data of a deleted for everyone (but visible to
us) tuple.

Note that disabling cleanup for a relation will also disable cleanup
of tuple versions in that table that are not used for the R/CIC
snapshots, and that'd be an issue, too.

> > How would this be integrated with e.g. heap_page_prune_opt?
> Probably by some flag in RelationData, but not sure here yet.
>
> If the idea looks sane, I could try to extend my POC - it should be
> not too hard, likely (I already have tests to make sure it is
> correct).

I'm not a fan of this approach. Changing visibility and cleanup
semantics to only benefit R/CIC sounds like a pain to work with in
essentially all visibility-related code. I'd much rather have to deal
with another index AM, even if it takes more time: the changes in
semantics will be limited to a new plug in the index AM system and a
behaviour change in R/CIC, rather than behaviour that changes in all
visibility-checking code.

But regardless of second scan snapshots, I think we can worry about
that part at a later moment: The first scan phase is usually the most
expensive and takes the most time of all phases that hold snapshots,
and in the above discussion we agreed that we can already reduce the
time that a snapshot is held during that phase significantly. Sure, it
isn't great that we have to scan the table again with only a single
snapshot, but generally phase 2 doesn't have that much to do (except
when BRIN indexes are involved) so this is likely less of an issue.
And even if it is, we would still have reduced the number of
long-lived snapshots by half.

-Matthias

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements