Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

Поиск

Список

Период

Сортировка

От	Michail Nikolaev
Тема	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Дата	9 мая 16:00:00
Msg-id	CANtu0oiT9SPFhs=h8DR4YVPox7TJ6jkfR9JgqT-0L+=uy=Lxng@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements (Michail Nikolaev <michail.nikolaev@gmail.com>)
Ответы	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Список	pgsql-hackers

Дерево обсуждения

Hello, Matthias and others!

Realized new horizon was applied only during validation phase (once index is marked as ready).
Now it applied if index is not marked as valid yet.

Updated version in attach.

--------------------------------------------------

> I think the best way for this to work would be an index method that
> exclusively stores TIDs, and of which we can quickly determine new
> tuples, too. I was thinking about something like GIN's format, but
> using (generation number, tid) instead of ([colno, colvalue], tid) as
> key data for the internal trees, and would be unlogged (because the
> data wouldn't have to survive a crash). Then we could do something
> like this for the second table scan phase:

Regarding that approach to dealing with validation phase and resetting of snapshot:

I was thinking about it and realized: once we go for an additional index - we don't need the second heap scan at all!

We may do it this way:

* create target index, not marked as indisready yet
* create a temporary unlogged index with the same parameters to store tids (optionally with the indexes columns data, see below), marked as indisready (but not indisvalid)
* commit them both in a single transaction
* wait for other transaction to know about them and honor in HOT constraints and new inserts (for temporary index)
* now our temporary index is filled by the tuples inserted to the table
* start building out target index, resetting snapshot every so often (if it is "safe" index)
* finish target index building phase
* mark target index as indisready
* now, start validation of the index:
* take the reference snapshot
* take a visibility snapshot of the target index, sort it (as it done currently)
* take a visibility snapshot of our temporary index, sort it
* start merging loop using two synchronized cursors over both visibility snapshots
* if we encountered tid which is not present in target visibility snapshot
* insert it to target index
* if a temporary index contains the column's data - we may even avoid the tuple fetch
* if temporary index is tid-only - we fetch tuple from the heap, but as plus we are also skipping dead tuples from insertion to the new index (I think it is better option)
* commit everything, release reference snapshot
* wait for transactions older than reference snapshot (as it done currently)
* mark target index as indisvalid, drop temporary index
* done

So, pros:
* just a single heap scan
* snapshot is reset periodically

Cons:
* we need to maintain the additional index during the main building phase
* one more tuplesort

If the temporary index is unlogged, cheap to maintain (just append-only mechanics) this feels like a perfect tradeoff for me.

This approach will work perfectly with low amount of tuple inserts during the building phase. And looks like even in the worst case it still better than the current approach.

What do you think? Have I missed something?

Thanks,
Michail.

Вложения

v4-0001-WIP-fix-d9d076222f5b-VACUUM-ignore-indexing-opera.patch

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Bertrand Drouvot
Дата: 09 мая, 15:20:51
Сообщение: Re: Avoid orphaned objects dependencies, take 3

Следующее

От: Bruce Momjian
Дата: 09 мая, 16:08:52
Сообщение: Re: First draft of PG 17 release notes

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

Вложения

Предыдущее

Следующее