Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements

Поиск

Список

Период

Сортировка

От	Matthias van de Meent
Тема	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Дата	4 января 2024 г. 11:24:04
Msg-id	CAEze2WgeyVnDb_j4gJQYC4+HcSsYQAdeRA1-F0KDnJ=Y0A_TzA@mail.gmail.com обсуждение исходный текст
Ответ на	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements (Michail Nikolaev <michail.nikolaev@gmail.com>)
Ответы	Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Список	pgsql-hackers

Дерево обсуждения

On Mon, 25 Dec 2023 at 15:12, Michail Nikolaev
<michail.nikolaev@gmail.com> wrote:
>
> Hello!
>
> It seems like the idea of "old" snapshot is still a valid one.
>
> > Should this deal with any potential XID wraparound, too?
>
> As far as I understand in our case, we are not affected by this in any way.
> Vacuum in our table is not possible because of locking, so, nothing
> may be frozen (see below).
> In the case of super long index building, transactional limits will
> stop new connections using current
> regular infrastructure because it is based on relation data (but not
> actual xmin of backends).
>
> > How does this behave when the newly inserted tuple's xmin gets frozen?
> > This would be allowed to happen during heap page pruning, afaik - no
> > rules that I know of which are against that - but it would create
> > issues where normal snapshot visibility rules would indicate it
> > visible to both snapshots regardless of whether it actually was
> > visible to the older snapshot when that snapshot was created...
>
> As I can see, heap_page_prune never freezes any tuples.
> In the case of regular vacuum, it used this way: call heap_page_prune
> and then call heap_prepare_freeze_tuple and then
> heap_freeze_execute_prepared.

Correct, but there are changes being discussed where we would freeze
tuples during pruning as well [0], which would invalidate that
implementation detail. And, if I had to choose between improved
opportunistic freezing and improved R/CIC, I'd probably choose
improved freezing over R/CIC.

As an alternative, we _could_ keep track of concurrent index inserts
using a dummy index (with the same predicate) which only holds the
TIDs of the inserted tuples. We'd keep it as an empty index in phase
1, and every time we reset the visibility snapshot we now only need to
scan that index to know what tuples were concurrently inserted. This
should have a significantly lower IO overhead than repeated full index
bulkdelete scans for the new index in the second table scan phase of
R/CIC. However, in a worst case it could still require another
O(tablesize) of storage.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

[0] https://www.postgresql.org/message-id/CAAKRu_a+g2oe6aHJCbibFtNFiy2aib4E31X9QYJ_qKjxZmZQEg@mail.gmail.com

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements